<div dir="ltr"><div>Hi,</div><div>Thanks for the update. I've checked the wd_* parameters you mentioned. They do work, thanks. <br></div><div>About heartbeats to dead/down backends: it may be worth considering an option that enables probing of dead/down backends and, if they come back alive, triggers a recovery script. <br></div><div>Anyway, thanks for your assistance. <br clear="all"></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>BR</div><div>Igor Yurchenko<br></div></div></div></div><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, 5 Dec 2024 at 09:19, Bo Peng <<a href="mailto:pengbo@sraoss.co.jp">pengbo@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

<br>

> 1. I have 2 pgpool instances that watch each other and handle the pgpool VIP.<br>

> I see that when the current pgpool leader goes down, the role switches and<br>

> the VIP moves with a significant delay. In the logs I see this picture:<br>

><br>

> There are significant delays at 14:40:12 and when acquiring the VIP at 14:40:16.<br>

> The quorum settings in pgpool.conf are:<br>

> <br>

> failover_when_quorum_exists=off<br>

> failover_require_consensus=on<br>

> allow_multiple_failover_requests_from_node=off<br>

<br>

These parameters are configured for PostgreSQL failover behavior, not for Pgpool-II leader node switchover.<br>

<br>

If you want to reduce the time required for a Pgpool-II leader node switchover,<br>

you can decrease the values of the parameters below:<br>

<br>

 wd_interval = 10<br>

 wd_heartbeat_deadtime = 30<br>
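<br>

As a rough illustration of how these two values bound failure detection, here is a minimal Python sketch.<br>
It assumes a simplified model: the lifecheck runs every wd_interval seconds and declares a peer dead once its last heartbeat is older than wd_heartbeat_deadtime.<br>
The function name is hypothetical, and the real switchover time also includes leader election and VIP escalation.<br>

```python
def worst_case_detection_seconds(wd_interval: int, wd_heartbeat_deadtime: int) -> int:
    """Estimate the worst-case time (seconds) to notice a dead watchdog peer.

    Simplified model (an assumption, not Pgpool-II's exact algorithm):
    the peer dies right after a lifecheck pass, so one full wd_interval
    may elapse on top of wd_heartbeat_deadtime before it is declared dead.
    """
    if not (wd_interval > 0 and wd_heartbeat_deadtime > 0):
        raise ValueError("both parameters must be positive")
    return wd_heartbeat_deadtime + wd_interval


if __name__ == "__main__":
    # Values from the message above, followed by a hypothetical lower setting.
    for interval, deadtime in [(10, 30), (3, 10)]:
        estimate = worst_case_detection_seconds(interval, deadtime)
        print(f"wd_interval={interval} wd_heartbeat_deadtime={deadtime} "
              f"estimated worst case: {estimate}s")
```

Under this model, lowering both values shortens detection, at the cost of a higher risk of false positives on a flaky network.<br>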

<br>

> 2. The second question is about the health check logic. Do I understand correctly that if<br>

> a backend goes into the down state, its health check is stopped?<br>

> If so, how can I detect that a failed backend has come back (after a hardware<br>

> issue, for example) and should be recovered?<br>

> Or is that impossible within pgpool, so that I should use third-party tools for<br>

> tracking backends and triggering the recovery?<br>

<br>

Only a failed standby node can be reattached to Pgpool-II automatically when it recovers, by setting "auto_failback = on".<br>

A failed primary node cannot be reattached to Pgpool-II automatically. You need to recover it manually.<br>
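<br>

For cases that auto_failback does not cover, an external watcher is one possible approach. Below is a minimal Python sketch, not a Pgpool-II feature.<br>
It assumes a plain TCP connect is an acceptable liveness probe (it does not verify that PostgreSQL accepts queries), and that recovery is triggered through the pcp_recovery_node command.<br>
The function names, the example host, and the PCP user are hypothetical; check the PCP options against your installed version.<br>

```python
import socket
from typing import Callable, Dict, Iterable, List, Tuple

Backend = Tuple[int, str, int]  # (node_id, host, port)


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (assumed liveness test)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_once(
    down_backends: Iterable[Backend],
    streaks: Dict[int, int],
    probe: Callable[[str, int], bool] = tcp_probe,
    required: int = 3,
) -> List[int]:
    """Probe each down backend once.

    Returns the node ids that have now answered `required` times in a row,
    so a single lucky probe does not trigger a recovery.
    """
    revived = []
    for node_id, host, port in down_backends:
        if probe(host, port):
            streaks[node_id] = streaks.get(node_id, 0) + 1
        else:
            streaks[node_id] = 0  # any failure resets the streak
        if streaks[node_id] >= required:
            revived.append(node_id)
            streaks[node_id] = 0  # start over after reporting the node
    return revived


def build_recovery_command(
    node_id: int,
    pcp_host: str = "localhost",
    pcp_port: int = 9898,
    pcp_user: str = "pgpool",  # hypothetical PCP user
) -> List[str]:
    """Build (but do not run) a pcp_recovery_node command line for one node."""
    return [
        "pcp_recovery_node",
        "-h", pcp_host,
        "-p", str(pcp_port),
        "-U", pcp_user,
        "-w",  # never prompt for a password; relies on a .pcppass file
        "-n", str(node_id),
    ]
```

A watcher could call check_once() in a loop with a sleep between passes and hand each returned node id to subprocess.run(build_recovery_command(node_id), check=True).<br>
The list of down backends has to come from somewhere else, for example from the output of pcp_node_info.<br>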

<br>

<br>

On Mon, 2 Dec 2024 19:26:23 +0200<br>

Igor Yurchenko <<a href="mailto:harry.urcen@gmail.com" target="_blank">harry.urcen@gmail.com</a>> wrote:<br>

<br>

> Hi guys<br>

> <br>

> I need your hints on some weird behavior of Pgpool-II 4.2.<br>

> <br>

> 1. I have 2 pgpool instances that watch each other and handle the pgpool VIP.<br>

> I see that when the current pgpool leader goes down, the role switches and<br>

> the VIP moves with a significant delay. In the logs I see this picture:<br>

> <br>

> 2024-12-02 14:40:12: pid 1286: LOG:  watchdog node state changed from<br>

> [INITIALIZING] to [LEADER]<br>

> 2024-12-02 14:40:12: pid 1286: LOG:  Setting failover command timeout to 1<br>

> 2024-12-02 14:40:12: pid 1286: LOG:  I am announcing my self as<br>

> leader/coordinator watchdog node<br>

> 2024-12-02 14:40:16: pid 1286: LOG:  I am the cluster leader node<br>

> 2024-12-02 14:40:16: pid 1286: DETAIL:  our declare coordinator message is<br>

> accepted by all nodes<br>

> 2024-12-02 14:40:16: pid 1286: LOG:  setting the local node "<br>

> <a href="http://10.65.188.56:9999" rel="noreferrer" target="_blank">10.65.188.56:9999</a> Linux pg-mgrdb2" as watchdog cluster leader<br>

> 2024-12-02 14:40:16: pid 1286: LOG:  signal_user1_to_parent_with_reason(1)<br>

> 2024-12-02 14:40:16: pid 1286: LOG:  I am the cluster leader node. Starting<br>

> escalation process<br>

> 2024-12-02 14:40:16: pid 1281: LOG:  Pgpool-II parent process received<br>

> SIGUSR1<br>

> 2024-12-02 14:40:16: pid 1281: LOG:  Pgpool-II parent process received<br>

> watchdog state change signal from watchdog<br>

> 2024-12-02 14:40:16: pid 1286: LOG:  escalation process started with<br>

> PID:4855<br>

> 2024-12-02 14:40:16: pid 4855: LOG:  watchdog: escalation started<br>

> 2024-12-02 14:40:20: pid 4855: LOG:  successfully acquired the delegate<br>

> IP:"10.65.188.59"<br>

> 2024-12-02 14:40:20: pid 4855: DETAIL:  'if_up_cmd' returned with success<br>

> 2024-12-02 14:40:20: pid 1286: LOG:  watchdog escalation process with pid:<br>

> 4855 exit with SUCCESS.<br>

> <br>

> There are significant delays at 14:40:12 and when acquiring the VIP at 14:40:16.<br>

> The quorum settings in pgpool.conf are:<br>

> <br>

> failover_when_quorum_exists=off<br>

> failover_require_consensus=on<br>

> allow_multiple_failover_requests_from_node=off<br>

> <br>

> So I have no idea why it happens.<br>

> <br>

> 2. The second question is about the health check logic. Do I understand correctly that if<br>

> a backend goes into the down state, its health check is stopped?<br>

> If so, how can I detect that a failed backend has come back (after a hardware<br>

> issue, for example) and should be recovered?<br>

> Or is that impossible within pgpool, so that I should use third-party tools for<br>

> tracking backends and triggering the recovery?<br>

> <br>

> BR<br>

> Igor Yurchenko<br>

<br>

<br>

-- <br>

Bo Peng <<a href="mailto:pengbo@sraoss.co.jp" target="_blank">pengbo@sraoss.co.jp</a>><br>

SRA OSS K.K.<br>

TEL: 03-5979-2701 FAX: 03-5979-2702<br>

URL: <a href="https://www.sraoss.co.jp/" rel="noreferrer" target="_blank">https://www.sraoss.co.jp/</a><br>

</blockquote></div>