<div dir="ltr"><div>Hi,</div><div><br></div><div>It turns out I wasn't entirely accurate in my previous mail. Our tests downgrade docker on the nodes one by one, not in parallel, and give the cluster some time to recover in between. This means pgpool and postgresql are stopped simultaneously on a single node, docker is downgraded, and the containers are restarted. The crash occurs on node 1 when the containers are stopped on node 2. Node 2 is the first node on which the containers are stopped. At that moment, node 1 is the watchdog leader and runs the primary database.</div><div><br></div><div>Most of our cluster tests start by resuming vms from a previously made snapshot. This can cause major issues in both pgpool and postgresql, as the machines experience gaps in time and might not recover in the correct order, introducing unreliability in our tests. Therefore, we stop all pgpool instances and the standby postgresql databases just prior to creating the snapshot. After restoring the snapshots, we make sure the database on node 1 is primary, start pgpool on nodes 1, 2 and 3 in that order, and perform a pg_basebackup for the databases on nodes 2 and 3 to make sure they are in sync and following node 1. This accounts for the messages about failovers and stops/starts you see in the log prior to the crash. This process is completed at 01:00:52Z.</div><div><br></div><div>Best regards,</div><div>Emond</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 24 May 2024 at 09:37, Emond Papegaaij <<a href="mailto:emond.papegaaij@gmail.com">emond.papegaaij@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>Last night, another one of our test runs failed with a core dump of pgpool. At the moment of the crash, pgpool was part of a 3 node cluster. 
3 vms all running an instance of pgpool and a postgresql database. The crash happened on node 1 while setting up all 3 vms for the test that would follow. This test is about upgrading docker, so during the preparation, docker has to be downgraded. This requires all docker containers to be stopped, including the containers running pgpool and postgresql. Being a test, we do not care about availability, only about execution time, so this is done in parallel on all 3 vms. So on all 3 vms, pgpool and postgresql are stopped simultaneously. I can understand that this situation is difficult to handle correctly in pgpool, but still it should not cause a segmentation fault. I've attached the pgpool logs for the node that crashed and the core dump. I do have logging from the other nodes as well, if required. The crash happens at 01:01:00Z.</div><div><br></div><div>#0 connect_backend (sp=0x55803eb0a6b8, frontend=0x55803eb08768) at protocol/child.c:1076<br>#1 0x000055803ce3d02a in get_backend_connection (frontend=0x55803eb08768) at protocol/child.c:2112<br>#2 0x000055803ce38fd5 in do_child (fds=0x55803eabea90) at protocol/child.c:416<br>#3 0x000055803cdfea4c in fork_a_child (fds=0x55803eabea90, id=13) at main/pgpool_main.c:863<br>#4 0x000055803cdfde30 in PgpoolMain (discard_status=0 '\000', clear_memcache_oidmaps=0 '\000') at main/pgpool_main.c:561<br>#5 0x000055803cdfb9e6 in main (argc=2, argv=0x7ffc8cdddda8) at main/main.c:365<br></div><div><br></div><div>Best regards,</div><div>Emond</div></div>
</blockquote></div></div>