[pgpool-general: 9045] Re: Failure to perform failover when primary lost network connectivity
Ang Sei Heng
seiheng at sendquick.com
Fri Mar 8 20:06:42 JST 2024
Hello,
Sorry, for late reply. We managed to setup another cluster and it works
fine, so there is some issue with the previous cluster.
We did find some inconsistency configurations in the earlier cluster and
will investigate further on that. However, this lead to some comments
where we find, in order for pgpool + postgresql to work correctly, the
configuration in each nodes have to be exact mirror of each other. Is
this assessment correct?
After our successful failover test, we also notice, the VIP and DB
primary nodes does not follow each other. Is there a configuration we
can set so VIP will stick to DB primary node?
Any advice is welcome.
Thanks,
Sei Heng
On 2/27/24 8:30 AM, Bo Peng wrote:
> Hi,
>
> Pgpool can handle failover even in case of network failure.
>
> I think this issue may be caused by the setting of follow primary script.
>
> Could you share the following information?
>
> - pgpool.conf
> - the scripts specified in failover_command and follow_primary_command
> - pgpool logs
>
> On Mon, 26 Feb 2024 10:57:56 +0800
> Ang Sei Heng <seiheng at sendquick.com> wrote:
>
>> Hello,
>>
>> We recently embark into Pgpool-II as failover and HA solution for our
>> PostgreSQL database cluster. We use the example here:
>>
>> https://www.pgpool.net/docs/latest/en/html/example-configs.html
>>
>> The cluster is running smoothly and we are able to perform
>> pcp_recovery_node, pcp_attach_node, pcp_detach_node and pcp_promote_node
>> without issue.
>>
>> However, when we perform a test scenario where we need to simulate a
>> lost network connectivity at the primary postgres database node, both
>> standby systems went into shutdown instead of taking over the role of
>> primary.
>>
>> From the logs, we found these:
>>
>> 2024-02-26 09:56:55.009 +08: pid 20700: LOG: database system was
>> interrupted; last known up at 2024-02-26 09:56:52 +08
>> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
>> scp: /opt/postgres/pgpool/data/archivedir/00000039.history: No such file
>> or directory
>> 2024-02-26 09:56:55.445 +08: pid 20700: LOG: entering standby mode
>> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
>> 2024-02-26 09:56:55.573 +08: pid 20700: LOG: restored log file
>> "00000038.history" from archive
>> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
>> scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F2: No
>> such file or directory
>> 2024-02-26 09:56:55.759 +08: pid 20700: LOG: redo starts at 0/F2000028
>> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
>> scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F3: No
>> such file or directory
>> 2024-02-26 09:56:55.912 +08: pid 20700: LOG: consistent recovery state
>> reached at 0/F2000138
>> 2024-02-26 09:56:55.912 +08: pid 20696: LOG: database system is ready
>> to accept read-only connections
>> Warning: Permanently added 'pgnode2' (ED25519) to the list of known hosts.
>> scp: /opt/postgres/pgpool/data/archivedir/0000003800000000000000F3: No
>> such file or directory
>> 2024-02-26 09:56:56.165 +08: pid 20720: LOG: started streaming WAL
>> from primary at 0/F3000000 on timeline 56
>> 2024-02-26 09:58:48.097 +08: pid 20720: FATAL: terminating walreceiver
>> due to timeout
>> ssh: connect to host pgnode2 port 22: No route to host
>> scp: Connection closed
>> 2024-02-26 09:58:50.684 +08: pid 20700: FATAL: could not restore file
>> "00000039.history" from archive: child process exited with exit code 255
>> 2024-02-26 09:58:50.685 +08: pid 20696: LOG: startup process (PID
>> 20700) exited with exit code 1
>> 2024-02-26 09:58:50.685 +08: pid 20696: LOG: terminating any other
>> active server processes
>> 2024-02-26 09:58:50.685 +08: pid 20696: LOG: shutting down due to
>> startup process failure
>> 2024-02-26 09:58:50.687 +08: pid 20696: LOG: database system is shut down
>>
>> Here, the pgnode2 was the primary database node, when we disconnected
>> its network connectivity, both standby nodes showed the logs above. It
>> seems, when the system was trying to perform failover, it tried to copy
>> the some files from primary node, pgnode2, which no longer accessible.
>> This cause the failover to fail and resulting the system not able to start.
>>
>> We are not able to run pg_ctl start in both standby nodes.
>>
>> Our questions here:
>>
>> 1) Does this means, in network failure, Pgpool-II will not be able to
>> handle it?
>>
>> 2) Is it possible to disable recovery process of copying files from dead
>> primary node in the event of network failure (for example, a sudden
>> power failure)?
>>
>>
>> Any advice is welcome.
>>
>>
>>
>> Thanks,
>> Sei Heng
>>
>>
>> --
>>
>> SendQuick Pte Ltd
>> 76 Playfair Road, #08-01,
>> LHK 2 Building, Singapore 367996
>> Tel: +65 6280 2881
>> Fax: +65 6280 6882
>>
>> CONFIDENTIALITY NOTE:
>> This email and any files transmitted with it is intended only for the use
>> of the person(s) to whom it is addressed,and may contain information that
>> is privileged, confidential and exempt from disclosure under applicable
>> law. If you are not the intended recipient, please immediately notify the
>> sender and delete the email. If you are not the intended recipient please
>> do not disclose, copy, distribute or take any action in reliance on the
>> contents of this e-mail. Thank you.
>>
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>
>> _______________________________________________
>> pgpool-general mailing list
>> pgpool-general at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>
--
SendQuick Pte Ltd
76 Playfair Road, #08-01,
LHK 2 Building, Singapore 367996
Tel: +65 6280 2881
Fax: +65 6280 6882
CONFIDENTIALITY NOTE:
This email and any files transmitted with it is intended only for the use
of the person(s) to whom it is addressed,and may contain information that
is privileged, confidential and exempt from disclosure under applicable
law. If you are not the intended recipient, please immediately notify the
sender and delete the email. If you are not the intended recipient please
do not disclose, copy, distribute or take any action in reliance on the
contents of this e-mail. Thank you.
More information about the pgpool-general
mailing list