[pgpool-hackers: 4565] Re: PGPool 4.2.15 shows incorrect pool_nodes

Fri Jan 24 12:56:19 JST 2025

Hi,

> Actually, I have two questions here:
> 1) Why does Pgpool report incorrect backend states?
> 2) What is wrong with the follow_primary procedure? Does it make sense to use
> follow_primary with two nodes?

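If you are using only two nodes, you don't need to configure follow_primary.
follow_primary_command is only useful with three or more backends, where the
remaining standbys must be re-pointed to the new primary after a failover.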

Try disabling follow_primary (set follow_primary_command = ''), removing
the /tmp/pgpool_status file, and restarting Pgpool.
Before starting Pgpool, ensure that the PostgreSQL backend nodes
are properly configured for streaming replication.
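
A minimal shell sketch of that procedure, for illustration only. The helper
names check_roles and reset_pgpool_state are hypothetical (they are not part
of Pgpool-II). The sketch assumes GNU sed, that you pass in the pgpool.conf
path and the logdir holding pgpool_status yourself, and that the
pg_is_in_recovery() result of each backend has been collected with psql.
check_roles succeeds only when exactly one backend is a primary and the rest
are standbys; reset_pgpool_state backs up the config, blanks
follow_primary_command, and removes pgpool_status.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Pgpool-II) for resetting a two-node setup.

# check_roles: pass the pg_is_in_recovery() result of every backend
# ("t" = standby, "f" = primary). Succeeds only if exactly one backend
# is a primary and at least one other backend is a standby.
check_roles() {
  primaries=0
  standbys=0
  for r in "$@"; do
    case $r in
      f) primaries=$((primaries + 1)) ;;
      t) standbys=$((standbys + 1)) ;;
      *) return 1 ;;  # unexpected output, e.g. a failed connection
    esac
  done
  [ "$primaries" -eq 1 ] && [ "$standbys" -ge 1 ]
}

# reset_pgpool_state CONF LOGDIR: back up CONF, blank out any active
# follow_primary_command line, and remove LOGDIR/pgpool_status so Pgpool
# does not restore stale node states on its next start.
# Assumes GNU sed (-i without a backup-suffix argument).
reset_pgpool_state() {
  conf=$1
  logdir=$2
  cp "$conf" "$conf.bak" || return 1
  sed -i "s/^[[:space:]]*follow_primary_command[[:space:]]*=.*/follow_primary_command = ''/" "$conf" || return 1
  rm -f "$logdir/pgpool_status"
}
```

Possible usage with the hosts from this thread, run before restarting Pgpool
with whatever service command you normally use (paths assumed):

    r0=$(psql -h 10.65.188.55 -p 5432 -U fabrix -w -Atc 'SELECT pg_is_in_recovery()' postgres)
    r1=$(psql -h 10.65.188.56 -p 5432 -U fabrix -w -Atc 'SELECT pg_is_in_recovery()' postgres)
    check_roles "$r0" "$r1" && reset_pgpool_state /etc/pgpool-II/pgpool.conf /tmp

check_roles fails when both backends report f, which is the state described
in the report below; in that case one node has to be rebuilt as a standby
first.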

On Thu, 23 Jan 2025 17:27:17 +0200
Igor Yurchenko <harry.urcen at gmail.com> wrote:

> Hi guys
> 
> My brain is broken. It looks like I cannot solve this issue without a hint
> from you.
> In my case, Pgpool returns incorrect data for 'show pool_nodes':
> 
> [root@pg-mgrdb1 ~]# psql -U fabrix -w -h 10.65.188.59 -p 9999 postgres -c 'show pool_nodes'
>  node_id |   hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+--------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>  0       | 10.65.188.55 | 5432 | up     | 0.500000  | primary | 5638095    | true              | 0                 | streaming         | async                  | 2025-01-23 16:02:22
>  1       | 10.65.188.56 | 5432 | up     | 0.500000  | standby | 213106     | false             | 0                 |                   |                        | 2025-01-23 16:02:23
> (2 rows)
> 
> [root@pg-mgrdb1 ~]#
> 
> In reality, neither backend is in recovery (both act as primary), so there
> is no replication between the nodes. Obviously, my failover/failback scripts
> work incorrectly and bring the whole cluster into a broken state.
> But why doesn't the pool_nodes report match reality? The only sign that
> something is wrong is that the replication_state and replication_sync_state
> data (streaming, async) are shown on the wrong line: they should be printed
> for the standby node. The streaming replication check is enabled, with
> sr_check_period set to 1.
> It looks like something goes wrong with auto failback. In the Pgpool log I
> see this excerpt:
> 
> 2025-01-23 16:02:23: pid 1135: LOG:  watchdog is informed of failover start by the main process
> 2025-01-23 16:02:23: pid 1128: LOG:  starting fail back. reconnect host 10.65.188.56(5432)
> 2025-01-23 16:02:23: pid 1128: LOG:  Node 0 is not down (status: 2)
> 2025-01-23 16:02:23: pid 1128: LOG:  execute command: /etc/pgpool-II/recovery/failback_node.sh 1 10.65.188.56 5432
> + NODE_ID=1
> + NODE_HOST=10.65.188.56
> + NODE_PORT=5432
> + LOG_FILE=/home/postgres/pg_logs/failback.log
> ++ date
> + echo 'Thu Jan 23 16:02:23 IST 2025: Failback triggered for node 1 at 10.65.188.56:5432'
> + true
> + '[' 0 -eq 0 ']'
> ++ date
> + echo 'Thu Jan 23 16:02:23 IST 2025: Node 1 successfully reattached.'
> 2025-01-23 16:02:23: pid 1128: LOG:  Do not restart children because we are failing back node id 1 host: 10.65.188.56 port: 5432 and we are in streaming replication mode and not all backends were down
> 2025-01-23 16:02:23: pid 1128: LOG:  find_primary_node_repeatedly: follow primary is ongoing. return current primary: 0
> 2025-01-23 16:02:23: pid 1128: LOG:  failover: set new primary node: 0
> 2025-01-23 16:02:23: pid 1128: LOG:  failover: set new main node: 0
> 2025-01-23 16:02:23: pid 1135: LOG:  received the failover indication from Pgpool-II on IPC interface
> 2025-01-23 16:02:23: pid 1135: LOG:  watchdog is informed of failover end by the main process
> 2025-01-23 16:02:23: pid 19776: LOG:  worker process received restart request
> 2025-01-23 16:02:23: pid 1128: LOG:  failback done. reconnect host 10.65.188.56(5432)
> 2025-01-23 16:02:23: pid 19780: LOG:  selecting backend connection
> 2025-01-23 16:02:23: pid 19780: DETAIL:  failover or failback event detected, discarding existing connections
> 
> The log says "follow primary is ongoing", but the last call to
> follow_primary.sh was quite a long time ago, so...
> 
> Actually, I have two questions here:
> 1) Why does Pgpool report incorrect backend states?
> 2) What is wrong with the follow_primary procedure? Does it make sense to use
> follow_primary with two nodes?
> 
> My pgpool.conf, failback/failover scripts, and log are available here for
> 5 days: https://filebin.net/076qbpqicx3rffik
> I would highly appreciate any advice.
> 
> BR
> Igor Yurchenko

-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS K.K.
TEL: 03-5979-2701 FAX: 03-5979-2702
URL: https://www.sraoss.co.jp/