[pgpool-hackers: 4565] Re: PGPool 4.2.15 shows incorrect pool_nodes
Bo Peng
pengbo at sraoss.co.jp
Fri Jan 24 12:56:19 JST 2025
Hi,
> Actually, I have two questions here:
> 1) Why does Pgpool report incorrect states for the backends?
> 2) What is wrong with the follow_primary procedure? Does it make sense to use
> follow_primary with two nodes?
If you are using two nodes, you don't need to configure follow_primary.
Try disabling follow_primary, removing the /tmp/pgpool_status file,
and restarting Pgpool.
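
A rough sketch of those steps, in case it helps. It assumes a systemd unit
named pgpool and the config path /etc/pgpool-II/pgpool.conf (both guessed from
the paths in your log), and that follow_primary is set through
follow_primary_command. run() and reset_pgpool() are hypothetical helpers, and
by default they only print the commands instead of running them:

```shell
#!/bin/sh
# Hypothetical helper: print each command, execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# reset_pgpool CONF STATUS_FILE
# Assumption: the service is managed by systemd under the name "pgpool".
reset_pgpool() {
    conf=$1
    status_file=$2
    # Stop Pgpool first so the status file cannot be rewritten while it runs.
    run systemctl stop pgpool
    # Comment out follow_primary_command (a backup is kept as CONF.bak).
    run sed -i.bak 's/^[[:space:]]*follow_primary_command/#&/' "$conf"
    # Remove the saved backend status so Pgpool re-detects the nodes.
    run rm -f "$status_file"
    run systemctl start pgpool
}
```

Calling reset_pgpool /etc/pgpool-II/pgpool.conf /tmp/pgpool_status prints the
four commands; prefix the call with DRY_RUN=0 to execute them. Starting Pgpool
with the -D option also discards the status file.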
Before starting Pgpool, ensure that the PostgreSQL backend nodes
are properly configured for streaming replication.
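
One way to check this before starting Pgpool is to ask each backend directly
(port 5432, not the Pgpool port) whether it is in recovery. This is a minimal
sketch: the helper names are hypothetical, and the user name is passed in by
the caller. Exactly one node should report "primary":

```shell
#!/bin/sh
# Map the output of "SELECT pg_is_in_recovery()" (psql -At prints t or f)
# to a role name.
role_from_recovery_flag() {
    case "$1" in
        t) echo standby ;;
        f) echo primary ;;
        *) echo unknown ;;
    esac
}

# backend_role HOST USER [PORT]
# Connects to the backend itself and reports its role.
backend_role() {
    flag=$(psql -At -w -U "$2" -h "$1" -p "${3:-5432}" -d postgres \
        -c 'SELECT pg_is_in_recovery()' 2>/dev/null)
    role_from_recovery_flag "$flag"
}

# Count how many of the given role names are "primary".
count_primaries() {
    n=0
    for r in "$@"; do
        if [ "$r" = primary ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, with the addresses and user from your output:
count_primaries "$(backend_role 10.65.188.55 fabrix)" "$(backend_role 10.65.188.56 fabrix)"
should print 1. If it prints 2, both nodes are running as primary and the
standby has to be rebuilt before Pgpool is started.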
On Thu, 23 Jan 2025 17:27:17 +0200
Igor Yurchenko <harry.urcen at gmail.com> wrote:
> Hi guys
>
> My brain is broken. It looks like I cannot handle this issue without a hint
> from you.
> In my case Pgpool reports incorrect data for 'show pool_nodes':
>
> [root@pg-mgrdb1 ~]# psql -U fabrix -w -h 10.65.188.59 -p 9999 postgres -c 'show pool_nodes'
>  node_id |   hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+--------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>  0       | 10.65.188.55 | 5432 | up     | 0.500000  | primary | 5638095    | true              | 0                 | streaming         | async                  | 2025-01-23 16:02:22
>  1       | 10.65.188.56 | 5432 | up     | 0.500000  | standby | 213106     | false             | 0                 |                   |                        | 2025-01-23 16:02:23
> (2 rows)
>
> [root@pg-mgrdb1 ~]#
>
> In reality, neither backend is in recovery (both act as primary), so there
> is no replication between the nodes. Obviously, my failover/failback
> scripts work incorrectly and bring the whole cluster into a broken state.
> But why doesn't the pool_nodes report match reality? The only hint that
> something is wrong is that the replication_state and replication_sync_state
> values (streaming, async) are shown on the wrong line: they should be
> printed for the standby node. The streaming check is enabled, and
> sr_check_period is set to 1.
> It looks like something goes wrong with automatic failback. In the pgpool
> logs I see this piece:
>
> 2025-01-23 16:02:23: pid 1135: LOG: watchdog is informed of failover start by the main process
> 2025-01-23 16:02:23: pid 1128: LOG: starting fail back. reconnect host 10.65.188.56(5432)
> 2025-01-23 16:02:23: pid 1128: LOG: Node 0 is not down (status: 2)
> 2025-01-23 16:02:23: pid 1128: LOG: execute command: /etc/pgpool-II/recovery/failback_node.sh 1 10.65.188.56 5432
> + NODE_ID=1
> + NODE_HOST=10.65.188.56
> + NODE_PORT=5432
> + LOG_FILE=/home/postgres/pg_logs/failback.log
> ++ date
> + echo 'Thu Jan 23 16:02:23 IST 2025: Failback triggered for node 1 at 10.65.188.56:5432'
> + true
> + '[' 0 -eq 0 ']'
> ++ date
> + echo 'Thu Jan 23 16:02:23 IST 2025: Node 1 successfully reattached.'
> 2025-01-23 16:02:23: pid 1128: LOG: Do not restart children because we are failing back node id 1 host: 10.65.188.56 port: 5432 and we are in streaming replication mode and not all backends were down
> 2025-01-23 16:02:23: pid 1128: LOG: find_primary_node_repeatedly: follow primary is ongoing. return current primary: 0
> 2025-01-23 16:02:23: pid 1128: LOG: failover: set new primary node: 0
> 2025-01-23 16:02:23: pid 1128: LOG: failover: set new main node: 0
> 2025-01-23 16:02:23: pid 1135: LOG: received the failover indication from Pgpool-II on IPC interface
> 2025-01-23 16:02:23: pid 1135: LOG: watchdog is informed of failover end by the main process
> 2025-01-23 16:02:23: pid 19776: LOG: worker process received restart request
> 2025-01-23 16:02:23: pid 1128: LOG: failback done. reconnect host 10.65.188.56(5432)
> 2025-01-23 16:02:23: pid 19780: LOG: selecting backend connection
> 2025-01-23 16:02:23: pid 19780: DETAIL: failover or failback event detected, discarding existing connections
>
> The log says "follow primary is ongoing", but the last call to
> follow_primary.sh was quite a long time ago, so...
>
> Actually, I have two questions here:
> 1) Why does Pgpool report incorrect states for the backends?
> 2) What is wrong with the follow_primary procedure? Does it make sense to use
> follow_primary with two nodes?
>
> My pgpool.conf, failback/failover scripts and log are available for 5 days
> here: https://filebin.net/076qbpqicx3rffik
> I would highly appreciate any advice.
>
> BR
> Igor Yurchenko
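
The mismatch described above (replication_state printed on the primary's row
instead of the standby's) can also be detected mechanically. The following is
only a sketch: flag_inconsistent_nodes is a hypothetical helper, and the field
positions assume the 12-column 'show pool_nodes' layout shown in this thread,
read with psql in unaligned mode:

```shell
#!/bin/sh
# Reads 'show pool_nodes' rows in psql -At format (fields separated by "|")
# on stdin and reports rows whose role and replication_state disagree.
# Field 4 = status, field 6 = role, field 10 = replication_state.
flag_inconsistent_nodes() {
    awk -F'|' '
        $6 == "primary" && $10 != "" {
            printf "node %s: role primary but replication_state=%s\n", $1, $10
        }
        $6 == "standby" && $4 == "up" && $10 == "" {
            printf "node %s: standby is up but has no replication_state\n", $1
        }
    '
}
```

It could be fed like this:
psql -At -U fabrix -w -h 10.65.188.59 -p 9999 postgres -c 'show pool_nodes' | flag_inconsistent_nodes
Any output means the roles Pgpool believes in should be compared with what
the backends themselves report.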
--
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS K.K.
TEL: 03-5979-2701 FAX: 03-5979-2702
URL: https://www.sraoss.co.jp/