<div dir="ltr">Hello everyone,<div>We are running a PoC of a PostgreSQL HA setup with asynchronous streaming replication, using Pgpool-II for load balancing & connection pooling and repmgr for HA & automatic failover. </div><div><div>Our test case isolates the VM1 node from the network completely for more than 2 minutes and then plugs the network back in, since we want to verify how the system behaves during network glitches and whether there is any chance of split-brain. </div></div><div>Our current setup looks like this:</div><div>Two VMs on Azure cloud; each VM runs PostgreSQL along with the Pgpool-II service.</div><div><img src="cid:ii_ly64hw8g1" alt="image.png" width="316" height="282" style="margin-right: 0px;"><br></div><div><br></div><div>We enabled the watchdog and assigned a delegate IP.</div><div><i>NOTE: due to some limitations we are using a floating IP as the delegate IP.</i></div><div><br></div><div>During the test, here are our observations:</div><div>1. Client connections hung from the moment VM1 was lost from the network until VM1 came back. </div><div>2. Once VM1 was lost, Pgpool-II promoted VM2 to LEADER node and the PostgreSQL standby on VM2 was promoted to primary, but client connections still did not reach the new primary. Why is this not happening?</div><div>3. Once VM1 was back on the network, a split-brain situation occurred: pgpool on VM1 took the lead and became the LEADER node (as pgpool.log shows), and from then on clients connected to the VM1 node via the VIP. </div><div><br></div><div><u>pgpool.conf </u></div><div><p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">sr_check_period = 10</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">health_check_period = 30</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">health_check_timeout = 20</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">health_check_max_retries = 3</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">health_check_retry_delay = 1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">wd_lifecheck_method = 'heartbeat'</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">wd_interval = 10</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">wd_heartbeat_keepalive = 2</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">wd_heartbeat_deadtime = 30</font></p><p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"><br></font></p></div><div><u>Logs information: </u></div><div><p style="margin:0in;font-family:Calibri;font-size:12pt" lang="en-US"><span style="font-weight:bold">From VM2:</span></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><span style="font-weight:bold"><font face="arial, sans-serif">pgpool.log </font></span></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">14:30:17 network disconnected</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><span style="font-weight:bold"><font face="arial, sans-serif">After 10 sec, the streaming replication check failed with a timeout. </font></span></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:26.176: sr_check_worker pid 58187: LOG:
failed to connect to PostgreSQL server on
"staging-ha0001:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><span style="font-weight:bold"><font face="arial, sans-serif">pgpool then failed the health check as well, timing out per health_check_timeout = 20 sec.</font></span></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:35.869: health_check0 pid 58188: LOG:
failed to connect to PostgreSQL server on
"staging-ha0001:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><span style="font-weight:bold"><font face="arial, sans-serif">Retried health_check & sr_check, but they timed out again.</font></span></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:46.187: sr_check_worker pid 58187: LOG:
failed to connect to PostgreSQL server on
"staging-ha0001:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:46.880: health_check0 pid 58188: LOG:
failed to connect to PostgreSQL server on
"staging-ha0001:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><span style="font-weight:bold"><font face="arial, sans-serif">The watchdog received a message saying the leader node is lost.</font></span></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:47.192: watchdog pid 58151: WARNING:
we have not received a beacon message from leader node
"staging-ha0001:9999 Linux staging-ha0001"</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:47.192: watchdog pid 58151: DETAIL:
requesting info message from leader node</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:54.312: watchdog pid 58151: LOG:
read from socket failed, remote end closed the connection</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:54.312: watchdog pid 58151: LOG:
client socket of staging-ha0001:9999 Linux staging-ha0001 is closed</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:54.313: watchdog pid 58151: LOG:
remote node "staging-ha0001:9999 Linux staging-ha0001" is
reporting that it has lost us</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:54.313: watchdog pid 58151: LOG:
we are lost on the leader node "staging-ha0001:9999 Linux
staging-ha0001"</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><span style="font-weight:bold"><font face="arial, sans-serif">Retried health_check & sr_check, but they timed out again.</font></span></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.888: health_check0 pid 58188: LOG:
failed to connect to PostgreSQL server on
"staging-ha0001:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.888: health_check0 pid 58188: LOG:
health check retrying on DB node: 0 (round:3)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:06.201: sr_check_worker pid 58187: LOG:
failed to connect to PostgreSQL server on
"staging-ha0001:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><span style="font-weight:bold"><font face="arial, sans-serif">About 10 seconds after losing the leader node, the watchdog changed the current node's state to LEADER.</font></span></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:04.199: watchdog pid 58151: LOG:
watchdog node state changed from [STANDING FOR LEADER] to [LEADER]</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><span style="font-weight:bold"><font face="arial, sans-serif">The health check failed on node 0, a degenerate request was received for node 0, and the pgpool main process started quarantining staging-ha0001(5432) (shutting down).</font></span></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.202: watchdog pid 58151: LOG:
setting the local node "staging-ha0002:9999 Linux
staging-ha0002" as watchdog cluster leader</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.202: watchdog pid 58151: LOG:
signal_user1_to_parent_with_reason(1)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.202: watchdog pid 58151: LOG: I
am the cluster leader node but we do not have enough nodes in cluster</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.202: watchdog pid 58151: DETAIL:
waiting for the quorum to start escalation process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.202: main pid 58147: LOG:
Pgpool-II parent process received SIGUSR1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.202: main pid 58147: LOG:
Pgpool-II parent process received watchdog state change signal from
watchdog</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.899: health_check0 pid 58188: LOG:
failed to connect to PostgreSQL server on
"staging-ha0001:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.899: health_check0 pid 58188: LOG:
health check failed on node 0 (timeout:0)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.899: health_check0 pid 58188: LOG:
received degenerate backend request for node_id: 0 from pid [58188]</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.899: watchdog pid 58151: LOG:
watchdog received the failover command from local pgpool-II on IPC
interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.899: watchdog pid 58151: LOG:
watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST]
received from local pgpool-II on IPC interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.899: watchdog pid 58151: LOG:
failover requires the quorum to hold, which is not present at the moment</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.899: watchdog pid 58151: DETAIL:
Rejecting the failover request</font></p>
<p style="margin:0in;font-size:11pt;color:rgb(237,125,49)" lang="en-US"><font face="arial, sans-serif">2024-07-03 14:31:08.899: watchdog pid 58151: LOG: failover command [DEGENERATE_BACKEND_REQUEST]
request from pgpool-II node "staging-ha0002:9999 Linux
staging-ha0002" is rejected because the watchdog cluster does not hold the
quorum</font></p>
<p style="margin:0in;font-size:11pt;color:rgb(237,125,49)" lang="en-US"><font face="arial, sans-serif">2024-07-03 14:31:08.900: health_check0 pid 58188: LOG: degenerate backend request for 1 node(s) from
pid [58188], is changed to quarantine node request by watchdog</font></p>
<p style="margin:0in;font-size:11pt;color:rgb(237,125,49)" lang="en-US"><font face="arial, sans-serif">2024-07-03 14:31:08.900: health_check0 pid 58188: DETAIL: watchdog does not holds the quorum</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.900: health_check0 pid 58188: LOG:
signal_user1_to_parent_with_reason(0)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.900: main pid 58147: LOG:
Pgpool-II parent process received SIGUSR1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.900: main pid 58147: LOG:
Pgpool-II parent process has received failover request</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.900: watchdog pid 58151: LOG:
received the failover indication from Pgpool-II on IPC interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.900: watchdog pid 58151: LOG:
watchdog is informed of failover start by the main process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.900: main pid 58147: LOG: ===
Starting quarantine. shutdown host staging-ha0001(5432) ===</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.900: main pid 58147: LOG:
Restart all children</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.900: main pid 58147: LOG:
failover: set new primary node: -1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.900: main pid 58147: LOG:
failover: set new main node: 1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.906: sr_check_worker pid 58187: ERROR:
Failed to check replication time lag</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.906: sr_check_worker pid 58187: DETAIL: No persistent db connection for the node 0</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.906: sr_check_worker pid 58187: HINT:
check sr_check_user and sr_check_password</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.906: sr_check_worker pid 58187: CONTEXT: while checking replication time lag</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.906: sr_check_worker pid 58187: LOG:
worker process received restart request</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.906: watchdog pid 58151: LOG:
received the failover indication from Pgpool-II on IPC interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.906: watchdog pid 58151: LOG:
watchdog is informed of failover end by the main process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.906: main pid 58147: LOG: ===
Quarantine done. shutdown host staging-ha0001(5432) ===</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:09.906: pcp_main pid 58186: LOG:
restart request received in pcp child process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:09.907: main pid 58147: LOG: PCP
child 58186 exits with status 0 in failover()</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:09.908: main pid 58147: LOG: fork
a new PCP child pid 58578 in failover()</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:09.908: main pid 58147: LOG:
reaper handler</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:09.908: pcp_main pid 58578: LOG:
PCP process: 58578 started</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:09.909: main pid 58147: LOG:
reaper handler: exiting normally</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:09.909: sr_check_worker pid 58579: LOG:
process started</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.915: watchdog pid 58151: LOG:
not able to send messages to remote node "staging-ha0001:9999 Linux
staging-ha0001"</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.915: watchdog pid 58151: DETAIL:
marking the node as lost</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.915: watchdog pid 58151: LOG:
remote node "staging-ha0001:9999 Linux staging-ha0001" is lost</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
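<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">For context on the timeline above: with our pgpool.conf settings, the maximum time from the network cut until the health check gives up on a node follows from health_check_timeout, health_check_max_retries, and health_check_retry_delay. A rough back-of-envelope sketch (plain Python, not pgpool-II's exact internal scheduler):</font></p>

```python
# Back-of-envelope upper bound on health-check detection latency,
# using the values from the pgpool.conf above. This mirrors the documented
# retry behaviour, not pgpool-II's exact internal scheduling.
health_check_timeout = 20      # seconds a single connection attempt may block
health_check_max_retries = 3   # retries after the initial failed attempt
health_check_retry_delay = 1   # pause between retries, in seconds

# Initial attempt + 3 retries, each up to 20 s, plus 3 one-second pauses.
worst_case_seconds = (
    (health_check_max_retries + 1) * health_check_timeout
    + health_check_max_retries * health_check_retry_delay
)
print(f"worst-case detection latency: {worst_case_seconds}s")  # 83s
```

<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">In the VM2 log the node is quarantined about 51 seconds after the disconnect (14:30:17 to 14:31:08), inside this bound, apparently because each failed attempt returned after roughly 10 seconds rather than the full 20-second timeout.</font></p>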
<p style="margin:0in;font-size:12pt" lang="en-US"><span style="font-weight:bold"><font face="arial, sans-serif">From VM1:</font></span></p>
<p style="margin:0in;font-size:12pt" lang="en-US"><font face="arial, sans-serif"><b>pgpool.log</b></font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:36.444: watchdog pid 8620: LOG:
remote node "staging-ha0002:9999 Linux staging-ha0002" is not
replying to our beacons</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:36.444: watchdog pid 8620: DETAIL:
missed beacon reply count:2</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:37.448: sr_check_worker pid 65605: LOG:
failed to connect to PostgreSQL server on
"staging-ha0002:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:46.067: health_check1 pid 8676: LOG:
failed to connect to PostgreSQL server on
"staging-ha0002:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:46.068: health_check1 pid 8676: LOG:
health check retrying on DB node: 1 (round:1)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:46.455: watchdog pid 8620: LOG:
remote node "staging-ha0002:9999 Linux staging-ha0002" is not
replying to our beacons</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:46.455: watchdog pid 8620: DETAIL:
missed beacon reply count:3</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:47.449: sr_check_worker pid 65605: ERROR:
Failed to check replication time lag</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:47.449: sr_check_worker pid 65605: DETAIL: No persistent db connection for the node 1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:47.449: sr_check_worker pid 65605: HINT:
check sr_check_user and sr_check_password</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:47.449: sr_check_worker pid 65605: CONTEXT: while checking replication time lag</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:55.104: child pid 65509: LOG:
failover or failback event detected</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:55.104: child pid 65509: DETAIL:
restarting myself</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:55.104: main pid 8617: LOG: reaper
handler</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:55.105: main pid 8617: LOG: reaper
handler: exiting normally</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.459: watchdog pid 8620: LOG:
remote node "staging-ha0002:9999 Linux staging-ha0002" is not
replying to our beacons</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.459: watchdog pid 8620: DETAIL:
missed beacon reply count:4</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.459: watchdog pid 8620: LOG:
remote node "staging-ha0002:9999 Linux staging-ha0002" is not
responding to our beacon messages</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.459: watchdog pid 8620: DETAIL:
marking the node as lost</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.459: watchdog pid 8620: LOG:
remote node "staging-ha0002:9999 Linux staging-ha0002" is lost</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.460: watchdog pid 8620: LOG:
removing watchdog node "staging-ha0002:9999 Linux
staging-ha0002" from the standby list</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.460: watchdog pid 8620: LOG: We
have lost the quorum</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.460: watchdog pid 8620: LOG:
signal_user1_to_parent_with_reason(3)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.460: main pid 8617: LOG:
Pgpool-II parent process received SIGUSR1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.460: main pid 8617: LOG:
Pgpool-II parent process received watchdog quorum change signal from
watchdog</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:56.461: watchdog_utility pid 66197: LOG:
watchdog: de-escalation started</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">sudo: a
terminal is required to read the password; either use the -S option to read
from standard input or configure an askpass helper</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.078: health_check1 pid 8676: LOG:
failed to connect to PostgreSQL server on
"staging-ha0002:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.078: health_check1 pid 8676: LOG:
health check retrying on DB node: 1 (round:2)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.418: life_check pid 8639: LOG:
informing the node status change to watchdog</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.418: life_check pid 8639: DETAIL:
node id :1 status = "NODE DEAD" message:"No heartbeat
signal from node"</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.418: watchdog pid 8620: LOG:
received node status change ipc message</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.418: watchdog pid 8620: DETAIL:
No heartbeat signal from node</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.418: watchdog pid 8620: LOG:
remote node "staging-ha0002:9999 Linux staging-ha0002" is lost</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:57.464: sr_check_worker pid 65605: LOG:
failed to connect to PostgreSQL server on
"staging-ha0002:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">sudo: a
password is required</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:59.301: watchdog_utility pid 66197: LOG:
failed to release the delegate IP:"10.127.1.20"</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:59.301: watchdog_utility pid 66197: DETAIL: 'if_down_cmd' failed</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:59.301: watchdog_utility pid 66197: WARNING: watchdog de-escalation failed to bring down
delegate IP</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:30:59.301: watchdog pid 8620: LOG:
watchdog de-escalation process with pid: 66197 exit with SUCCESS.</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif"> </font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:07.465: sr_check_worker pid 65605: ERROR:
Failed to check replication time lag</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:07.465: sr_check_worker pid 65605: DETAIL: No persistent db connection for the node 1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:07.465: sr_check_worker pid 65605: HINT:
check sr_check_user and sr_check_password</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:07.465: sr_check_worker pid 65605: CONTEXT: while checking replication time lag</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.089: health_check1 pid 8676: LOG:
failed to connect to PostgreSQL server on
"staging-ha0002:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:08.089: health_check1 pid 8676: LOG:
health check retrying on DB node: 1 (round:3)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:17.480: sr_check_worker pid 65605: LOG:
failed to connect to PostgreSQL server on
"staging-ha0002:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: health_check1 pid 8676: LOG:
failed to connect to PostgreSQL server on
"staging-ha0002:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: health_check1 pid 8676: LOG:
health check failed on node 1 (timeout:0)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: health_check1 pid 8676: LOG:
received degenerate backend request for node_id: 1 from pid [8676]</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: watchdog pid 8620: LOG:
watchdog received the failover command from local pgpool-II on IPC
interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: watchdog pid 8620: LOG:
watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST]
received from local pgpool-II on IPC interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: watchdog pid 8620: LOG:
failover requires the quorum to hold, which is not present at the moment</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: watchdog pid 8620: DETAIL:
Rejecting the failover request</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: watchdog pid 8620: LOG:
failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II
node "staging-ha0001:9999 Linux staging-ha0001" is rejected because
the watchdog cluster does not hold the quorum</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: health_check1 pid 8676: LOG:
degenerate backend request for 1 node(s) from pid [8676], is changed to
quarantine node request by watchdog</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: health_check1 pid 8676: DETAIL:
watchdog does not holds the quorum</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: health_check1 pid 8676: LOG:
signal_user1_to_parent_with_reason(0)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: main pid 8617: LOG:
Pgpool-II parent process received SIGUSR1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.097: main pid 8617: LOG:
Pgpool-II parent process has received failover request</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: watchdog pid 8620: LOG:
received the failover indication from Pgpool-II on IPC interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: watchdog pid 8620: LOG:
watchdog is informed of failover start by the main process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: main pid 8617: LOG: ===
Starting quarantine. shutdown host staging-ha0002(5432) ===</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: main pid 8617: LOG: Do not
restart children because we are switching over node id 1 host: staging-ha0002
port: 5432 and we are in streaming replication mode</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: main pid 8617: LOG:
failover: set new primary node: 0</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: main pid 8617: LOG:
failover: set new main node: 0</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: sr_check_worker pid 65605: ERROR:
Failed to check replication time lag</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: sr_check_worker pid 65605: DETAIL: No persistent db connection for the node 1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: sr_check_worker pid 65605: HINT:
check sr_check_user and sr_check_password</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: sr_check_worker pid 65605: CONTEXT: while checking replication time lag</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: sr_check_worker pid 65605: LOG:
worker process received restart request</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: watchdog pid 8620: LOG:
received the failover indication from Pgpool-II on IPC interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: watchdog pid 8620: LOG:
watchdog is informed of failover end by the main process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:31:19.098: main pid 8617: LOG: ===
Quarantine done. shutdown host staging-ha0002(5432) ===</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">[... roughly 4.5 minutes of no contact while VM1 is isolated from the network; the log resumes when VM1 rejoins ...]</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.420: watchdog pid 8620: LOG:
new outbound connection to staging-ha0002:9000</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: LOG:
"staging-ha0001:9999 Linux staging-ha0001" is the coordinator
as per our record but "staging-ha0002:9999 Linux staging-ha0002" is
also announcing as a coordinator</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: DETAIL:
cluster is in the split-brain</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: LOG: I
am the coordinator but "staging-ha0002:9999 Linux staging-ha0002" is
also announcing as a coordinator</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: DETAIL:
trying to figure out the best contender for the leader/coordinator node</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: LOG:
remote node:"staging-ha0002:9999 Linux staging-ha0002" should
step down from leader because we are the older leader</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: LOG: We
are in split brain, and I am the best candidate for leader/coordinator</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: DETAIL:
asking the remote node "staging-ha0002:9999 Linux
staging-ha0002" to step down</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: LOG: we
have received the NODE INFO message from the node:"staging-ha0002:9999
Linux staging-ha0002" that was lost</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: DETAIL:
we had lost this node because of "REPORTED BY LIFECHECK"</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: LOG:
node:"staging-ha0002:9999 Linux staging-ha0002" was reported
lost by the life-check process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.423: watchdog pid 8620: DETAIL:
node will be added to cluster once life-check mark it as reachable again</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">[the same split-brain detection, step-down request, and lost-node messages repeat at 14:35:59.424]</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.424: watchdog pid 8620: LOG:
remote node "staging-ha0002:9999 Linux staging-ha0002" is
reporting that it has found us again</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.425: watchdog pid 8620: LOG:
leader/coordinator node "staging-ha0002:9999 Linux
staging-ha0002" decided to resign from leader, probably because of
split-brain</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:35:59.425: watchdog pid 8620: DETAIL:
It was not our coordinator/leader anyway. ignoring the message</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">[the NODE INFO / "lost because of REPORTED BY LIFECHECK" message pairs repeat several more times through 14:35:59.427]</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:00.213: health_check1 pid 8676: LOG:
failed to connect to PostgreSQL server on
"staging-ha0002:5432", timed out</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:00.213: health_check1 pid 8676: LOG:
health check retrying on DB node: 1 (round:3)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.221: health_check1 pid 8676: LOG:
health check retrying on DB node: 1 succeeded</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.221: health_check1 pid 8676: LOG:
received failback request for node_id: 1 from pid [8676]</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.221: health_check1 pid 8676: LOG:
failback request from pid [8676] is changed to update status request
because node_id: 1 was quarantined</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.221: health_check1 pid 8676: LOG:
signal_user1_to_parent_with_reason(0)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.221: main pid 8617: LOG:
Pgpool-II parent process received SIGUSR1</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.221: main pid 8617: LOG:
Pgpool-II parent process has received failover request</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.221: watchdog pid 8620: LOG:
received the failover indication from Pgpool-II on IPC interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.221: watchdog pid 8620: LOG:
watchdog is informed of failover start by the main process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.221: watchdog pid 8620: LOG:
watchdog is informed of failover start by the main process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.222: main pid 8617: LOG: ===
Starting fail back. reconnect host staging-ha0002(5432) ===</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.222: main pid 8617: LOG: Node 0
is not down (status: 2)</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.222: main pid 8617: LOG: Do not
restart children because we are failing back node id 1 host: staging-ha0002
port: 5432 and we are in streaming replication mode and not all backends were
down</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.222: main pid 8617: LOG:
failover: set new primary node: 0</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.222: main pid 8617: LOG:
failover: set new main node: 0</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.222: sr_check_worker pid 66222: LOG:
worker process received restart request</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.222: watchdog pid 8620: LOG:
received the failover indication from Pgpool-II on IPC interface</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.222: watchdog pid 8620: LOG:
watchdog is informed of failover end by the main process</font></p>
<p style="margin:0in;font-size:11pt" lang="en-US"><font face="arial, sans-serif">2024-07-03
14:36:01.222: main pid 8617: LOG: ===
Failback done. reconnect host staging-ha0002(5432) ===</font></p></div><div><br></div><div><br></div><div><font size="4"><b>Questions: </b></font></div><div><font size="4">1. Regarding observation 2: after pgpool on VM2 becomes leader and the standby is promoted, why do client connections still not reach the new primary on VM2? </font></div><div><font size="4">2. In this kind of setup, can transactions be split across the two nodes (split-brain writes) during a network glitch? </font></div><div><br></div><div>If anyone has worked on a similar setup, please share your insights.</div><div>Thank you</div><div><br></div><div>Regards</div><div>Mukesh</div></div>
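P.S. For context, here is a minimal sketch of the watchdog consensus settings we understand to be relevant here (parameter names are from the Pgpool-II 4.x manual; the delegate IP and on/off values below are illustrative placeholders, not our exact config):

```ini
# pgpool.conf (watchdog section) -- illustrative sketch, not our actual file
use_watchdog = on
delegate_ip = '10.0.0.100'             # placeholder for the floating IP we use as delegate IP
# With only two watchdog nodes, failover/quorum behaviour depends on these:
failover_when_quorum_exists = on       # fail over only while the watchdog cluster has quorum
failover_require_consensus = on        # require majority agreement before acting on failover
enable_consensus_with_half_votes = on  # allows a 2-node cluster to count half the votes as quorum
```

If our understanding of how these interact in a 2-node cluster is wrong, corrections are welcome.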