[pgpool-hackers: 4550] Re: PGPool4.2 changing leader role and healthcheck

Igor Yurchenko harry.urcen at gmail.com
Wed Dec 11 07:47:58 JST 2024


Hi,
Thanks for the update. I've checked the mentioned wd_* parameters. They do
work, thanks.
About heartbeats to dead/down backends: maybe it is worth considering an
option that enables probing of dead/down backends and, once they come back
up, triggers a script for recovery.
Anyway, thanks for your assistance.
BR
Igor Yurchenko


On Thu, 5 Dec 2024 at 09:19, Bo Peng <pengbo at sraoss.co.jp> wrote:

> Hi,
>
> > 1. I have 2 pgpool instances that watch each other and handle the pgpool
> > VIP. I see that when the current pgpool leader goes down, the role is
> > switched and the VIP is moved with a significant delay. In the logs I see
> > this picture:
> >
> > There are significant delays at 14:40:12 and on acquiring the VIP at
> > 14:40:16. The quorum settings in pgpool.conf are:
> >
> > failover_when_quorum_exists=off
> > failover_require_consensus=on
> > allow_multiple_failover_requests_from_node=off
>
> These parameters are configured for PostgreSQL failover behavior, not for
> Pgpool-II leader node switchover.
>
> If you want to reduce the time required for a Pgpool-II leader node
> switchover, you can decrease the values of the parameters below:
>
>  wd_interval = 10
>  wd_heartbeat_deadtime = 30
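>
> For example, you could decrease them to something like the values below
> (these are only illustrative; wd_heartbeat_deadtime should stay comfortably
> larger than wd_heartbeat_keepalive, and how low you can safely go depends on
> your network):
>
>  wd_interval = 3
>  wd_heartbeat_deadtime = 10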
>
> > 2. The second question is about the health check logic. Do I understand
> > correctly that when a backend goes into the down state, its health check
> > is stopped?
> > If yes, how can I detect that a failed backend has come back (after a
> > hardware issue, for example) and should be recovered?
> > Or is that impossible within pgpool, so I should use third-party tools
> > for tracking backends and triggering the recovery?
>
> Only a failed standby node can be reattached to pgpool automatically by
> setting "auto_failback = on" when it recovers.
> A failed primary node cannot be reattached to pgpool automatically. You
> need to recover it manually.
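>
> For example, in pgpool.conf (my understanding is that auto_failback relies
> on the streaming replication check to notice the recovered standby, so
> sr_check_period must be greater than 0 and the sr_check_* settings must be
> valid; values below are only an illustration):
>
>  auto_failback = on
>  sr_check_period = 10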
>
>
> On Mon, 2 Dec 2024 19:26:23 +0200
> Igor Yurchenko <harry.urcen at gmail.com> wrote:
>
> > Hi guys
> >
> > Need your hints on some weird behaviors of PGPool 4.2.
> >
> > 1. I have 2 pgpool instances that watch each other and handle the pgpool
> > VIP. I see that when the current pgpool leader goes down, the role is
> > switched and the VIP is moved with a significant delay. In the logs I see
> > this picture:
> >
> > 2024-12-02 14:40:12: pid 1286: LOG:  watchdog node state changed from [INITIALIZING] to [LEADER]
> > 2024-12-02 14:40:12: pid 1286: LOG:  Setting failover command timeout to 1
> > 2024-12-02 14:40:12: pid 1286: LOG:  I am announcing my self as leader/coordinator watchdog node
> > 2024-12-02 14:40:16: pid 1286: LOG:  I am the cluster leader node
> > 2024-12-02 14:40:16: pid 1286: DETAIL:  our declare coordinator message is accepted by all nodes
> > 2024-12-02 14:40:16: pid 1286: LOG:  setting the local node "10.65.188.56:9999 Linux pg-mgrdb2" as watchdog cluster leader
> > 2024-12-02 14:40:16: pid 1286: LOG:  signal_user1_to_parent_with_reason(1)
> > 2024-12-02 14:40:16: pid 1286: LOG:  I am the cluster leader node. Starting escalation process
> > 2024-12-02 14:40:16: pid 1281: LOG:  Pgpool-II parent process received SIGUSR1
> > 2024-12-02 14:40:16: pid 1281: LOG:  Pgpool-II parent process received watchdog state change signal from watchdog
> > 2024-12-02 14:40:16: pid 1286: LOG:  escalation process started with PID:4855
> > 2024-12-02 14:40:16: pid 4855: LOG:  watchdog: escalation started
> > 2024-12-02 14:40:20: pid 4855: LOG:  successfully acquired the delegate IP:"10.65.188.59"
> > 2024-12-02 14:40:20: pid 4855: DETAIL:  'if_up_cmd' returned with success
> > 2024-12-02 14:40:20: pid 1286: LOG:  watchdog escalation process with pid: 4855 exit with SUCCESS.
> >
> > There are significant delays at 14:40:12 and on acquiring the VIP at
> > 14:40:16. The quorum settings in pgpool.conf are:
> >
> > failover_when_quorum_exists=off
> > failover_require_consensus=on
> > allow_multiple_failover_requests_from_node=off
> >
> > So I have no idea why it happens.
> >
> > 2. The second question is about the health check logic. Do I understand
> > correctly that when a backend goes into the down state, its health check
> > is stopped?
> > If yes, how can I detect that a failed backend has come back (after a
> > hardware issue, for example) and should be recovered?
> > Or is that impossible within pgpool, so I should use third-party tools
> > for tracking backends and triggering the recovery?
> >
> > BR
> > Igor Yurchenko
>
>
> --
> Bo Peng <pengbo at sraoss.co.jp>
> SRA OSS K.K.
> TEL: 03-5979-2701 FAX: 03-5979-2702
> URL: https://www.sraoss.co.jp/
>