[pgpool-general: 9026] Failure to perform failover when primary lost network connectivity
Ang Sei Heng
seiheng at sendquick.com
Mon Feb 26 11:57:56 JST 2024
Hello,
We recently embark into Pgpool-II as failover and HA solution for our
PostgreSQL database cluster. We use the example here:
https://www.pgpool.net/docs/latest/en/html/example-configs.html
The cluster is running smoothly and we are able to perform
pcp_recovery_node, pcp_attach_node, pcp_detach_node and pcp_promote_node
without issue.
However, when we perform a test scenario where we need to simulate a
lost network connectivity at the primary postgres database node, both
standby systems went into shutdown instead of taking over the role of
primary.
From the logs, we found these:
2024-02-26 09:56:55.009 +08: pid 20700: LOG: database system was
interrupted; last known up at 2024-02-26 09:56:52 +08
Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
scp: /opt/postgres/pgpool/data/archivedir/00000039.history: No such file
or directory
2024-02-26 09:56:55.445 +08: pid 20700: LOG: entering standby mode
Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
2024-02-26 09:56:55.573 +08: pid 20700: LOG: restored log file
"00000038.history" from archive
Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F2: No
such file or directory
2024-02-26 09:56:55.759 +08: pid 20700: LOG: redo starts at 0/F2000028
Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F3: No
such file or directory
2024-02-26 09:56:55.912 +08: pid 20700: LOG: consistent recovery state
reached at 0/F2000138
2024-02-26 09:56:55.912 +08: pid 20696: LOG: database system is ready
to accept read-only connections
Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F3: No
such file or directory
2024-02-26 09:56:56.165 +08: pid 20720: LOG: started streaming WAL
from primary at 0/F3000000 on timeline 56
2024-02-26 09:58:48.097 +08: pid 20720: FATAL: terminating walreceiver
due to timeout
ssh: connect to host pgnode2 port 22: No route to host
scp: Connection closed
2024-02-26 09:58:50.684 +08: pid 20700: FATAL: could not restore file
"00000039.history" from archive: child process exited with exit code 255
2024-02-26 09:58:50.685 +08: pid 20696: LOG: startup process (PID
20700) exited with exit code 1
2024-02-26 09:58:50.685 +08: pid 20696: LOG: terminating any other
active server processes
2024-02-26 09:58:50.685 +08: pid 20696: LOG: shutting down due to
startup process failure
2024-02-26 09:58:50.687 +08: pid 20696: LOG: database system is shut down
Here, the pgnode2 was the primary database node, when we disconnected
its network connectivity, both standby nodes showed the logs above. It
seems, when the system was trying to perform failover, it tried to copy
the some files from primary node, pgnode2, which no longer accessible.
This cause the failover to fail and resulting the system not able to start.
We are not able to run pg_ctl start in both standby nodes.
Our questions here:
1) Does this means, in network failure, Pgpool-II will not be able to
handle it?
2) Is it possible to disable recovery process of copying files from dead
primary node in the event of network failure (for example, a sudden
power failure)?
Any advice is welcome.
Thanks,
Sei Heng
--
SendQuick Pte Ltd
76 Playfair Road, #08-01,
LHK 2 Building, Singapore 367996
Tel: +65 6280 2881
Fax: +65 6280 6882
CONFIDENTIALITY NOTE:
This email and any files transmitted with it is intended only for the use
of the person(s) to whom it is addressed,and may contain information that
is privileged, confidential and exempt from disclosure under applicable
law. If you are not the intended recipient, please immediately notify the
sender and delete the email. If you are not the intended recipient please
do not disclose, copy, distribute or take any action in reliance on the
contents of this e-mail. Thank you.
_______________________________________________
pgpool-general mailing list
pgpool-general at pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-general
More information about the pgpool-general
mailing list