[pgpool-hackers: 3296] Re: duplicate failover request over allow_multiple_failover_requests_from_node=off
Tatsuo Ishii
ishii at sraoss.co.jp
Thu Apr 11 08:12:53 JST 2019
I think this has been discussed before:
http://www.sraoss.jp/pipermail/pgpool-hackers/2018-March/002756.html
(the original discussion was on the Japanese local list:
[pgpool-general-jp: 1504]
https://www.pgpool.net/pipermail/pgpool-general-jp/2018-March/001503.html)
and I believe Usama has been working on it:
http://www.sraoss.jp/pipermail/pgpool-hackers/2018-March/002757.html
Usama, any progress on this?
BTW,
> When the communication between master/coordinator pgpool and
> primary PostgreSQL node is down during a short period
I wonder why you don't set appropriate health check retry parameters
to avoid such a temporary communication failure in the first place.
Brain surgery to ignore the error reports from Pgpool-II does not seem
to be a sane choice.
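As a rough illustration of what such retry settings could look like in
pgpool.conf (the values below are assumptions chosen only for the example,
not recommendations; tune them to how long a transient outage may last):

```
# Illustrative values only (assumed, not taken from this thread)
health_check_period = 10        # seconds between health checks
health_check_timeout = 20       # seconds before a single check gives up
health_check_max_retries = 3    # retry this many times before reporting failure
health_check_retry_delay = 5    # seconds to wait between retries
```

With settings along these lines, a backend that is unreachable for less
than roughly max_retries * retry_delay seconds should not trigger a
failover request at all.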
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> Hello, Pgpool developers
>
>
> I found that the Pgpool-II watchdog is too strict about duplicate failover
> requests with the allow_multiple_failover_requests_from_node=off setting.
>
> For example, consider a watchdog cluster with 3 pgpool instances
> whose backends are PostgreSQL servers using streaming replication.
>
> When the communication between master/coordinator pgpool and
> primary PostgreSQL node is down during a short period
> (or pgpool makes a false-positive judgement for some other reason),
> the pgpool tries to fail over but cannot get consensus,
> so it puts the primary node into quarantine status. The quarantine
> cannot be reset automatically. As a result, the service becomes unavailable.
>
> This case generates logs like the following:
>
> pid 1234: LOG: new IPC connection received
> pid 1234: LOG: watchdog received the failover command from local pgpool-II on IPC interface
> pid 1234: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
> pid 1234: LOG: Duplicate failover request from "pg1:5432 Linux pg1" node
> pid 1234: DETAIL: request ignored
> pid 1234: LOG: failover requires the majority vote, waiting for consensus
> pid 1234: DETAIL: failover request noted
> pid 4321: LOG: degenerate backend request for 1 node(s) from pid [4321], is changed to quarantine node request by watchdog
> pid 4321: DETAIL: watchdog is taking time to build consensus
>
> Note that this case doesn't involve any communication trouble among
> the Pgpool watchdog nodes.
> You can reproduce it by changing one PostgreSQL server's pg_hba.conf to
> reject the health check access from one pgpool node for a short period.
>
> The documentation doesn't say that duplicate failover requests quarantine
> the node immediately. I think the request should just be ignored.
>
> A patch file for the head of V3_7_STABLE is attached.
> With this patch, repeated failover requests from a single pgpool still
> do not lead to a failover, but Pgpool can recover once the connection
> trouble is gone.
>
> Does this change cause any problems?
>
>
> with best regards,
> TAKATSUKA Haruka <harukat at sraoss.co.jp>