[pgpool-general: 9029] Re: Failure to perform failover when primary lost network connectivity
Bo Peng
pengbo at sraoss.co.jp
Tue Feb 27 09:30:35 JST 2024
Hi,
Pgpool can handle failover even in case of network failure.
I think this issue may be caused by the setting of follow primary script.
Could you share the following information?
- pgpool.conf
- the scripts specified in failover_command and follow_primary_command
- pgpool logs
On Mon, 26 Feb 2024 10:57:56 +0800
Ang Sei Heng <seiheng at sendquick.com> wrote:
> Hello,
>
> We recently embark into Pgpool-II as failover and HA solution for our
> PostgreSQL database cluster. We use the example here:
>
> https://www.pgpool.net/docs/latest/en/html/example-configs.html
>
> The cluster is running smoothly and we are able to perform
> pcp_recovery_node, pcp_attach_node, pcp_detach_node and pcp_promote_node
> without issue.
>
> However, when we perform a test scenario where we need to simulate a
> lost network connectivity at the primary postgres database node, both
> standby systems went into shutdown instead of taking over the role of
> primary.
>
> From the logs, we found these:
>
> 2024-02-26 09:56:55.009 +08: pid 20700: LOG: database system was
> interrupted; last known up at 2024-02-26 09:56:52 +08
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> scp: /opt/postgres/pgpool/data/archivedir/00000039.history: No such file
> or directory
> 2024-02-26 09:56:55.445 +08: pid 20700: LOG: entering standby mode
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> 2024-02-26 09:56:55.573 +08: pid 20700: LOG: restored log file
> "00000038.history" from archive
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F2: No
> such file or directory
> 2024-02-26 09:56:55.759 +08: pid 20700: LOG: redo starts at 0/F2000028
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F3: No
> such file or directory
> 2024-02-26 09:56:55.912 +08: pid 20700: LOG: consistent recovery state
> reached at 0/F2000138
> 2024-02-26 09:56:55.912 +08: pid 20696: LOG: database system is ready
> to accept read-only connections
> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
> scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F3: No
> such file or directory
> 2024-02-26 09:56:56.165 +08: pid 20720: LOG: started streaming WAL
> from primary at 0/F3000000 on timeline 56
> 2024-02-26 09:58:48.097 +08: pid 20720: FATAL: terminating walreceiver
> due to timeout
> ssh: connect to host pgnode2 port 22: No route to host
> scp: Connection closed
> 2024-02-26 09:58:50.684 +08: pid 20700: FATAL: could not restore file
> "00000039.history" from archive: child process exited with exit code 255
> 2024-02-26 09:58:50.685 +08: pid 20696: LOG: startup process (PID
> 20700) exited with exit code 1
> 2024-02-26 09:58:50.685 +08: pid 20696: LOG: terminating any other
> active server processes
> 2024-02-26 09:58:50.685 +08: pid 20696: LOG: shutting down due to
> startup process failure
> 2024-02-26 09:58:50.687 +08: pid 20696: LOG: database system is shut down
>
> Here, the pgnode2 was the primary database node, when we disconnected
> its network connectivity, both standby nodes showed the logs above. It
> seems, when the system was trying to perform failover, it tried to copy
> the some files from primary node, pgnode2, which no longer accessible.
> This cause the failover to fail and resulting the system not able to start.
>
> We are not able to run pg_ctl start in both standby nodes.
>
> Our questions here:
>
> 1) Does this means, in network failure, Pgpool-II will not be able to
> handle it?
>
> 2) Is it possible to disable recovery process of copying files from dead
> primary node in the event of network failure (for example, a sudden
> power failure)?
>
>
> Any advice is welcome.
>
>
>
> Thanks,
> Sei Heng
>
>
> --
>
> SendQuick Pte Ltd
> 76 Playfair Road, #08-01,
> LHK 2 Building, Singapore 367996
> Tel: +65 6280 2881
> Fax: +65 6280 6882
>
> CONFIDENTIALITY NOTE:
> This email and any files transmitted with it is intended only for the use
> of the person(s) to whom it is addressed,and may contain information that
> is privileged, confidential and exempt from disclosure under applicable
> law. If you are not the intended recipient, please immediately notify the
> sender and delete the email. If you are not the intended recipient please
> do not disclose, copy, distribute or take any action in reliance on the
> contents of this e-mail. Thank you.
>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
>
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
--
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS LLC
TEL: 03-5979-2701 FAX: 03-5979-2702
URL: https://www.sraoss.co.jp/
More information about the pgpool-general
mailing list