[pgpool-general: 9128] Re: Another segmentation fault

Emond Papegaaij emond.papegaaij at gmail.com
Tue Jun 11 22:09:53 JST 2024


>
> >>> > #0  connect_backend (sp=0x55803eb0a6b8, frontend=0x55803eb08768) at
> >>> > protocol/child.c:1076
> >>> > #1  0x000055803ce3d02a in get_backend_connection
> >>> (frontend=0x55803eb08768)
> >>> > at protocol/child.c:2112
> >>> > #2  0x000055803ce38fd5 in do_child (fds=0x55803eabea90) at
> >>> > protocol/child.c:416
> >>> > #3  0x000055803cdfea4c in fork_a_child (fds=0x55803eabea90, id=13) at
> >>> > main/pgpool_main.c:863
> >>> > #4  0x000055803cdfde30 in PgpoolMain (discard_status=0 '\000',
> >>> > clear_memcache_oidmaps=0 '\000') at main/pgpool_main.c:561
> >>> > #5  0x000055803cdfb9e6 in main (argc=2, argv=0x7ffc8cdddda8) at
> >>> > main/main.c:365
>
> I think I found a possible code path in the first case (crash at
> child.c:1076). Unfortunately I couldn't find a test case to reproduce
> it reliably, so my reasoning is purely theoretical. It also seems that
> the case depends heavily on timing and is probably relatively rare.
> If you only see the crash very seldom, that may be why.
>

I think I've only seen this crash once or twice, so it is indeed very rare.
That matches your explanation. The fact that these segmentation faults are
not very predictable also points to a timing-related race condition. The
test case we added a few weeks ago introduces a lot of noise in the backend
status by rebooting the systems without proper sequencing or shutdown, and
it is this test case that fails most often.
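
For anyone following along, here is a minimal, hypothetical sketch of the
kind of check-then-use window being discussed. It is not pgpool code
(pgpool uses forked child processes and backend status kept in shared
memory rather than threads), and all names in it are made up; it only
illustrates how a status that changes between the validity check and the
use of the connection slot can end in a NULL dereference like the one seen
in connect_backend().

/*
 * Hypothetical illustration of the suspected race, NOT pgpool code.
 * One thread flips a shared backend status (as a failover triggered by an
 * unclean reboot would), while another thread checks the status and then
 * uses the connection slot. Running this will typically segfault, which
 * is the point.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

typedef struct { int valid; const char *conn; } backend_slot;

static backend_slot slot = { 1, "open connection" };

static void *failover_thread(void *arg)
{
    (void) arg;
    usleep(50);           /* arrives "at the wrong moment" */
    slot.conn = NULL;     /* backend marked down, connection discarded */
    slot.valid = 0;
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, failover_thread, NULL);

    if (slot.valid) {                     /* check ... */
        usleep(100);                      /* ... status changes here ... */
        printf("%c\n", slot.conn[0]);     /* ... use: dereferences NULL */
    }

    pthread_join(t, NULL);
    return 0;
}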


> Attached is the patch to implement above.
>

I'll apply the patch. Thanks for that! Your explanation does make sense.
I'll report back if we see any new crashes.

Best regards,
Emond