[pgpool-hackers: 4548] Re: PGPool4.2 changing leader role and healthcheck
Bo Peng
pengbo at sraoss.co.jp
Thu Dec 5 16:19:00 JST 2024
Hi,
> 1. I have 2 pgpool instances that watch each other and handle the pgpool VIP.
> I see that when the current pgpool leader goes down, the role switch and
> VIP move happen with a significant delay. In the logs I see this picture:
>
> It has significant delays at 14:40:12 and on acquiring the VIP at 14:40:16.
> The quorum settings in pgpool.conf are:
>
> failover_when_quorum_exists=off
> failover_require_consensus=on
> allow_multiple_failover_requests_from_node=off
These parameters are configured for PostgreSQL failover behavior, not for Pgpool-II leader node switchover.
If you want to reduce the time required for a Pgpool-II leader node switchover,
you can decrease the values of the parameters below:
wd_interval = 10
wd_heartbeat_deadtime = 30
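For example, something like the following could be tried (the values below are only illustrative and need to be tuned for your environment; smaller values make a dead leader be detected sooner, but raise the risk of false detections on a loaded network):

wd_interval = 3                # life check interval in seconds (default 10)
wd_heartbeat_deadtime = 10     # seconds without heartbeat before the remote node is considered dead (default 30)

Note that these are watchdog parameters, so Pgpool-II needs to be restarted on each node for the change to take effect.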
> 2. The second question is about the health check logic. Do I understand correctly
> that if a backend goes into the down state, its health check is stopped?
> If yes, how can I ensure that a failed backend that comes back (after a hardware
> issue, for example) gets recovered?
> Or is that impossible within pgpool, so I should use third-party tools for
> tracking backends and triggering the recovery?
Only a failed standby node can be reattached to pgpool automatically by setting "auto_failback = on" when it recovers.
A failed primary node cannot be reattached to pgpool automatically. You need to recover it manually.
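As a minimal sketch, assuming streaming replication mode with sr_check already configured (auto_failback relies on the streaming replication check to notice that the standby is healthy again):

auto_failback = on              # automatically reattach a recovered standby
auto_failback_interval = 60     # minimum interval in seconds between automatic failback attempts

For a failed former primary, the usual pattern is to re-sync it as a standby first (pg_rewind or a fresh base backup) and then reattach it by hand, for example:

pcp_attach_node -h <pgpool_host> -p 9898 -U <pcp_user> -n <node_id>

The host, port, user and node id above are placeholders for your environment; pcp_recovery_node can be used instead if online recovery is set up.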
On Mon, 2 Dec 2024 19:26:23 +0200
Igor Yurchenko <harry.urcen at gmail.com> wrote:
> Hi guys
>
> Need your hints on some weird behaviors of PGPool 4.2.
>
> 1. I have 2 pgpool instances that watch each other and handle the pgpool VIP.
> I see that when the current pgpool leader goes down, the role switch and
> VIP move happen with a significant delay. In the logs I see this picture:
>
> 2024-12-02 14:40:12: pid 1286: LOG: watchdog node state changed from
> [INITIALIZING] to [LEADER]
> 2024-12-02 14:40:12: pid 1286: LOG: Setting failover command timeout to 1
> 2024-12-02 14:40:12: pid 1286: LOG: I am announcing my self as
> leader/coordinator watchdog node
> 2024-12-02 14:40:16: pid 1286: LOG: I am the cluster leader node
> 2024-12-02 14:40:16: pid 1286: DETAIL: our declare coordinator message is
> accepted by all nodes
> 2024-12-02 14:40:16: pid 1286: LOG: setting the local node "
> 10.65.188.56:9999 Linux pg-mgrdb2" as watchdog cluster leader
> 2024-12-02 14:40:16: pid 1286: LOG: signal_user1_to_parent_with_reason(1)
> 2024-12-02 14:40:16: pid 1286: LOG: I am the cluster leader node. Starting
> escalation process
> 2024-12-02 14:40:16: pid 1281: LOG: Pgpool-II parent process received
> SIGUSR1
> 2024-12-02 14:40:16: pid 1281: LOG: Pgpool-II parent process received
> watchdog state change signal from watchdog
> 2024-12-02 14:40:16: pid 1286: LOG: escalation process started with
> PID:4855
> 2024-12-02 14:40:16: pid 4855: LOG: watchdog: escalation started
> 2024-12-02 14:40:20: pid 4855: LOG: successfully acquired the delegate
> IP:"10.65.188.59"
> 2024-12-02 14:40:20: pid 4855: DETAIL: 'if_up_cmd' returned with success
> 2024-12-02 14:40:20: pid 1286: LOG: watchdog escalation process with pid:
> 4855 exit with SUCCESS.
>
> It has significant delays at 14:40:12 and on acquiring the VIP at 14:40:16.
> The quorum settings in pgpool.conf are:
>
> failover_when_quorum_exists=off
> failover_require_consensus=on
> allow_multiple_failover_requests_from_node=off
>
> So I have no idea why it happens.
>
> 2. The second question is about the health check logic. Do I understand correctly
> that if a backend goes into the down state, its health check is stopped?
> If yes, how can I ensure that a failed backend that comes back (after a hardware
> issue, for example) gets recovered?
> Or is that impossible within pgpool, so I should use third-party tools for
> tracking backends and triggering the recovery?
>
> BR
> Igor Yurchenko
--
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS K.K.
TEL: 03-5979-2701 FAX: 03-5979-2702
URL: https://www.sraoss.co.jp/