[pgpool-general: 9105] Re: Another segmentation fault

Emond Papegaaij emond.papegaaij at gmail.com
Fri May 24 18:48:54 JST 2024


Hi,

It turned out that I wasn't entirely accurate in my previous mail. Our tests
perform the downgrade of docker on the nodes one-by-one, not in parallel,
and give the cluster some time to recover in between. This means pgpool and
postgresql are stopped simultaneously on a single node, docker is
downgraded and the containers are restarted. The crash occurs on node 1
when the containers are stopped on node 2. Node 2 is the first node on
which the containers are stopped. At that moment, node 1 is the watchdog
leader and runs the primary database.
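
For illustration, the per-node sequence is roughly the sketch below. The
hostnames, container names and the downgrade command are placeholders for
our internal test tooling, not the actual test code; node 2 is handled
first in the failing run, the order of the remaining nodes is illustrative.

#!/usr/bin/env python3
# Rough sketch of the one-by-one docker downgrade the test performs.
# All hostnames, container names and commands are placeholders.
import subprocess
import time

CONTAINERS = "pgpool postgresql"  # stopped together on the same node
OLD_PKG = "docker-ce=<version-under-test>"  # placeholder package/version

def ssh(host: str, command: str) -> None:
    # Run a command on the given node; raise if it fails.
    subprocess.run(["ssh", host, command], check=True)

# Node 2 is handled first; the crash occurs on node 1 (watchdog leader,
# primary database) while the containers on node 2 are being stopped.
for node in ["node2", "node3", "node1"]:
    ssh(node, f"docker stop {CONTAINERS}")  # stop pgpool and postgresql together
    ssh(node, f"apt-get install -y --allow-downgrades {OLD_PKG}")  # downgrade docker
    ssh(node, f"docker start {CONTAINERS}")  # restart the containers
    time.sleep(60)  # give the cluster some time to recover in between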

Most of our cluster tests start by resuming vms from a previously made
snapshot. This can cause major issues in both pgpool and postgresql, as the
machines experience gaps in time and might not recover in the correct
order, introducing unreliability in our tests. Therefore, we stop all
pgpool instances and the standby postgresql databases just prior to
creating the snapshot. After restoring the snapshots, we make sure the
database on node 1 is primary, start pgpool on node 1, 2 and 3 in that
order, and perform a pg_basebackup for the database on node 2 and 3 to make
sure they are in sync and following node 1. This accounts for the messages
about failovers and stops/starts you see in the log prior to the crash.
This process is completed at 01:00:52Z.
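
In pseudo-code, the reset after restoring the snapshots is roughly the
following; the hostnames, data directory and promotion command are again
placeholders, not our actual test code:

#!/usr/bin/env python3
# Sketch of the cluster reset performed after restoring the vm snapshots.
# Hostnames, paths and commands are illustrative only.
import subprocess

PGDATA = "/var/lib/postgresql/data"  # assumed data directory

def ssh(host: str, command: str) -> None:
    subprocess.run(["ssh", host, command], check=True)

# 1. Make sure the database on node 1 is primary (placeholder promote;
#    a no-op if it already is).
ssh("node1", f"pg_ctl promote -D {PGDATA} || true")

# 2. Start pgpool on node 1, 2 and 3, in that order.
for node in ["node1", "node2", "node3"]:
    ssh(node, "docker start pgpool")

# 3. Re-seed the databases on node 2 and 3 from node 1 so they are in
#    sync and following node 1.
for node in ["node2", "node3"]:
    ssh(node, f"rm -rf {PGDATA} && pg_basebackup -h node1 -D {PGDATA} -R -X stream")
    ssh(node, "docker start postgresql")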

Best regards,
Emond

On Fri 24 May 2024 at 09:37, Emond Papegaaij <
emond.papegaaij at gmail.com> wrote:

> Hi,
>
> Last night, another one of our test runs failed with a core dump of
> pgpool. At the moment of the crash, pgpool was part of a 3 node cluster: 3
> vms, each running an instance of pgpool and a postgresql database. The
> crash happened on node 1 while setting up all 3 vms for the test that would
> follow. This test is about upgrading docker, so during the preparation,
> docker has to be downgraded. This requires all docker containers to be
> stopped, including the ones running pgpool and postgresql. Being a
> test, we do not care about availability, only about execution time, so this
> is done in parallel on all 3 vms: pgpool and postgresql are stopped
> simultaneously on every node. I can understand that such a situation is
> difficult to handle correctly in pgpool, but it still should not cause a
> segmentation fault. I've attached the pgpool logs for the node that crashed
> and the core dump. I do have logging from the other nodes as well, if
> required. The crash happens at 01:01:00Z.
>
> #0  connect_backend (sp=0x55803eb0a6b8, frontend=0x55803eb08768) at
> protocol/child.c:1076
> #1  0x000055803ce3d02a in get_backend_connection (frontend=0x55803eb08768)
> at protocol/child.c:2112
> #2  0x000055803ce38fd5 in do_child (fds=0x55803eabea90) at
> protocol/child.c:416
> #3  0x000055803cdfea4c in fork_a_child (fds=0x55803eabea90, id=13) at
> main/pgpool_main.c:863
> #4  0x000055803cdfde30 in PgpoolMain (discard_status=0 '\000',
> clear_memcache_oidmaps=0 '\000') at main/pgpool_main.c:561
> #5  0x000055803cdfb9e6 in main (argc=2, argv=0x7ffc8cdddda8) at
> main/main.c:365
>
> Best regards,
> Emond
>