[pgpool-general: 7983] pcp_node_info does not return when host is lost on 4.3.0

Thu Jan 20 17:33:55 JST 2022

Hi,

We are working on the upgrade from 4.2.6 to 4.3.0 and one of our tests is
failing consistently. In this test we power down 2 of the 3 hosts with a
hard poweroff. Prior to the poweroff, we configure the cluster with the
following layout:
node 1: standby pgpool and standby database
node 2: pgpool leader and primary database
node 3: standby pgpool and standby database

Nodes 2 and 3 are then shut down. We expect pgpool on node 1 to lose its
quorum and not perform a failover. We then use pcp_watchdog_info
and pcp_node_info to read the status of the cluster on that node. The first
command returns fine. However, when we run 'pcp_node_info -U pcp -h
localhost -p 9898 -w -v 0' (to read the node info for node 1), it never
returns.
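
For anyone reproducing this in an automated test, a minimal sketch of how the call can be wrapped so that the test fails instead of hanging. The command line is the one quoted above; the helper name run_with_timeout and the 10 second limit are assumptions made for illustration and are not part of pgpool.

```python
import subprocess

# Command line copied from the report; it is only defined here, not executed.
PCP_NODE_INFO = ["pcp_node_info", "-U", "pcp", "-h", "localhost",
                 "-p", "9898", "-w", "-v", "0"]


def run_with_timeout(cmd, timeout_s=10.0):
    """Run cmd and return its stdout, or None if it exceeds timeout_s.

    subprocess.run kills the child process when the timeout expires,
    so a hung pcp client does not block the caller indefinitely.
    The 10 second default is an arbitrary assumption.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```

Calling run_with_timeout(PCP_NODE_INFO) on node 1 would then return None in the situation described here. This only bounds the client side: the pcp worker process inside pgpool may keep running after the client has been killed.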

In the pgpool logs, I can see the call for pcp_watchdog_info at 03:23:43,
returning almost instantly. At 03:23:44, the pcp_node_info call is
received and handled by pid 203. This process never completes and is still
running at the end of the log at 03:25:41. It repeats the following two lines
several times (172.29.30.2:5432 is node 2, the lost primary database):
pid 203: LOG:  trying connecting to PostgreSQL server on "172.29.30.2:5432" by INET socket
pid 203: DETAIL:  timed out. retrying...

It seems pgpool is stuck in a loop. I've attached the log, starting at the
moment the 2 nodes are powered off (which happens at 03:23:40).
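
To quantify the loop, the retries per process can be counted from the log. The following is a rough sketch that assumes the log lines have the shape of the two lines quoted above (any prefix, such as a timestamp, is ignored); the function name count_retries is made up for this example.

```python
import re
from collections import Counter

# Matches the "timed out. retrying..." DETAIL line quoted in this report.
# The exact log format is an assumption based on those two quoted lines.
RETRY_RE = re.compile(r"pid (\d+): DETAIL:\s+timed out\. retrying")


def count_retries(log_lines):
    """Return a Counter mapping pid -> number of connection retry lines."""
    counts = Counter()
    for line in log_lines:
        match = RETRY_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Passing an open file handle for the attached pgpool-loop.log to count_retries would show how often pid 203 retried the connection to node 2.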

Best regards,
Emond
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool-loop.log
Type: text/x-log
Size: 21968 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20220120/c1a7dba4/attachment.bin>