[pgpool-hackers: 4544] PGPool4.2 changing leader role and healthcheck

Igor Yurchenko harry.urcen at gmail.com
Tue Dec 3 02:26:23 JST 2024


Hi guys

Need your hints on some weird behavior of PGPool 4.2.

1. I have 2 pgpool instances that watch each other and handle the pgpool VIP.
I see that when the current pgpool leader goes down, the role is switched and
the VIP is moved with a significant delay. In the logs I see this picture:

2024-12-02 14:40:12: pid 1286: LOG:  watchdog node state changed from [INITIALIZING] to [LEADER]
2024-12-02 14:40:12: pid 1286: LOG:  Setting failover command timeout to 1
2024-12-02 14:40:12: pid 1286: LOG:  I am announcing my self as leader/coordinator watchdog node
2024-12-02 14:40:16: pid 1286: LOG:  I am the cluster leader node
2024-12-02 14:40:16: pid 1286: DETAIL:  our declare coordinator message is accepted by all nodes
2024-12-02 14:40:16: pid 1286: LOG:  setting the local node "10.65.188.56:9999 Linux pg-mgrdb2" as watchdog cluster leader
2024-12-02 14:40:16: pid 1286: LOG:  signal_user1_to_parent_with_reason(1)
2024-12-02 14:40:16: pid 1286: LOG:  I am the cluster leader node. Starting escalation process
2024-12-02 14:40:16: pid 1281: LOG:  Pgpool-II parent process received SIGUSR1
2024-12-02 14:40:16: pid 1281: LOG:  Pgpool-II parent process received watchdog state change signal from watchdog
2024-12-02 14:40:16: pid 1286: LOG:  escalation process started with PID:4855
2024-12-02 14:40:16: pid 4855: LOG:  watchdog: escalation started
2024-12-02 14:40:20: pid 4855: LOG:  successfully acquired the delegate IP:"10.65.188.59"
2024-12-02 14:40:20: pid 4855: DETAIL:  'if_up_cmd' returned with success
2024-12-02 14:40:20: pid 1286: LOG:  watchdog escalation process with pid: 4855 exit with SUCCESS.

There are significant delays at 14:40:12 (waiting for the leader declaration
to be accepted) and at 14:40:16 (acquiring the VIP). The quorum settings in
pgpool.conf are:

failover_when_quorum_exists=off
failover_require_consensus=on
allow_multiple_failover_requests_from_node=off

So I have no idea why this happens.
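
For completeness, the escalation settings are essentially the stock sample
config; the interface name and command paths below are the 4.2 sample values,
not something I have tuned, so treat them as placeholders rather than my
exact lines:

delegate_ip = '10.65.188.59'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'

The watchdog timing knobs (wd_interval, wd_heartbeat_keepalive,
wd_heartbeat_deadtime) are at their defaults as well.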

2. The second question is about the health check logic. Do I get it right
that when a backend goes to the down state, its health check is stopped?
If yes, how can I make sure that a failed backend which comes back (after a
hardware issue, for example) gets recovered?
Or is that impossible within pgpool, and I should use third-party tooling for
tracking backends and triggering the recovery?
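
For context, the third-party tooling I have in mind is just a periodic script
that reattaches the node once it answers again; a rough sketch (the node id,
backend host, and PCP user are placeholders, and I assume a .pcppass entry so
that the -w flag works):

#!/bin/sh
# Sketch: once the failed backend answers again, ask pgpool to reattach it.
# Node id 0, the backend host, and the PCP user are placeholders.
if pg_isready -h backend1.example -p 5432 -q; then
    pcp_attach_node -h 10.65.188.59 -p 9898 -U pgpool -w -n 0
fi

I also noticed auto_failback in the docs (added in 4.1), which sounds like it
might reattach a recovered streaming replication standby automatically; is
that the intended mechanism here?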

BR
Igor Yurchenko

