[pgpool-general: 7983] pcp_node_info does not return when host is lost on 4.3.0
Emond Papegaaij
emond.papegaaij at gmail.com
Thu Jan 20 17:33:55 JST 2022
Hi,
We are working on the upgrade from 4.2.6 to 4.3.0 and have a test that
fails consistently. In this test we power down 2 of the 3 hosts with a
hard poweroff. Prior to the poweroff, we configure the cluster with the
following layout:
node 1: standby pgpool and standby database
node 2: pgpool leader and primary database
node 3: standby pgpool and standby database
Nodes 2 and 3 are then shut down. We expect pgpool on node 1 to lose its
quorum and not perform a failover. We then use pcp_watchdog_info
and pcp_node_info to read the status of the cluster on that node. The first
command returns fine, however, when we run 'pcp_node_info -U pcp -h
localhost -p 9898 -w -v 0' (to read the node info for node 1), it never
returns.
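
For anyone reproducing this in an automated test, here is a minimal Python sketch of one way to keep the test itself from hanging while the call is stuck. It only wraps the command quoted above in a client-side timeout and does not address the underlying pgpool behaviour. The helper name run_with_timeout and the 30-second limit in the usage comment are assumptions for illustration, not part of our actual test suite.

```python
import subprocess
import sys

# The command from the report, split into arguments.
PCP_NODE_INFO = ["pcp_node_info", "-U", "pcp", "-h", "localhost",
                 "-p", "9898", "-w", "-v", "0"]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, stdout).

    Returns (None, partial_stdout) if the command does not finish within
    timeout_s seconds; subprocess.run kills the child in that case.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired as exc:
        out = exc.stdout or ""
        # Depending on the Python version, partial output may be bytes.
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return None, out


# Hypothetical usage in a test (the 30 s limit is an arbitrary choice):
#   code, out = run_with_timeout(PCP_NODE_INFO, 30)
#   if code is None:
#       print("pcp_node_info did not return within 30 s")
```

A return code of None then marks the hang explicitly, so the test can fail with a clear message instead of blocking until the test runner gives up.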
In the pgpool logs, I can see the call for pcp_watchdog_info at 03:23:43,
which returns almost instantly. At 03:23:44, the pcp_node_info call is
received and starts pid 203. This pid never completes and is still running
at the end of the log at 03:25:41. It repeats the following two lines
several times (172.29.30.2:5432 is node 2, the lost primary database):
pid 203: LOG: trying connecting to PostgreSQL server on "172.29.30.2:5432" by INET socket
pid 203: DETAIL: timed out. retrying...
It seems pgpool is stuck in a loop. I've attached the log, starting at the
moment the 2 nodes are powered off (which happens at 03:23:40).
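
To quantify the loop, the retries can be counted per pid. The following is a rough Python sketch that assumes the DETAIL line appears in the log exactly as quoted above; any prefix before "pid" depends on the logging configuration, so the pattern searches within each line rather than anchoring at its start. The function name count_retries is made up for this example.

```python
import re
from collections import Counter

# Matches the retry line quoted in the report, wherever it sits in the line.
RETRY_RE = re.compile(r"pid (\d+): DETAIL:\s+timed out\. retrying\.\.\.")


def count_retries(lines):
    """Return a Counter mapping pid -> number of 'timed out. retrying...' lines."""
    counts = Counter()
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Sample input built only from the two lines quoted in this message.
sample = [
    'pid 203: LOG: trying connecting to PostgreSQL server on '
    '"172.29.30.2:5432" by INET socket',
    "pid 203: DETAIL: timed out. retrying...",
    'pid 203: LOG: trying connecting to PostgreSQL server on '
    '"172.29.30.2:5432" by INET socket',
    "pid 203: DETAIL: timed out. retrying...",
]
retries = count_retries(sample)
```

Running it over the attached pgpool-loop.log (for example via open("pgpool-loop.log")) would show how many times pid 203 retried before the log ends.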
Best regards,
Emond
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-loop.log
Type: text/x-log
Size: 21968 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20220120/c1a7dba4/attachment.bin>