[pgpool-general: 9129] Re: Another segmentation fault

Tatsuo Ishii ishii at sraoss.co.jp
Fri Jun 14 09:28:34 JST 2024


> For the second crash (in pool_connection_pool.c:326), all slots are null:
> (gdb) info locals
> i = 0
> freed = 0
> closetime = 0
> oldestp = 0x7f474f36e168
> ret = 0x0
> info = 0x55b511283469 <opt_sort>
> p = 0x7f474f36e168
> 
> (gdb) p *p
> $7 = {
>   info = 0x7f47470d5c08,
>   slots = {0x0 <repeats 128 times>}
> }
> 
> (gdb) p p->slots[0]
> $10 = (POOL_CONNECTION_POOL_SLOT *) 0x0
> (gdb) p p->slots[1]
> $11 = (POOL_CONNECTION_POOL_SLOT *) 0x0
> (gdb) p p->slots[2]
> $12 = (POOL_CONNECTION_POOL_SLOT *) 0x0

Possible explanation is, Req_info->main_node_id, which is the smallest
backend node id number being alive, was -1. From the log file just
before pid 30 segfaults:

2024-05-31T07:16:45.908900+02:00 2024-05-31 07:16:45: pid 1: LOG:  backend:0 is set to down status
2024-05-31T07:16:45.908939+02:00 2024-05-31 07:16:45: pid 1: DETAIL:  backend:0 is DOWN on cluster leader "172.29.30.2:5432 Linux 216dfd5e07f2"
2024-05-31T07:16:45.908987+02:00 2024-05-31 07:16:45: pid 1: LOG:  backend:1 is set to down status
2024-05-31T07:16:45.909017+02:00 2024-05-31 07:16:45: pid 1: DETAIL:  backend:1 is DOWN on cluster leader "172.29.30.2:5432 Linux 216dfd5e07f2"
2024-05-31T07:16:45.909044+02:00 2024-05-31 07:16:45: pid 1: LOG:  backend:2 is set to down status
2024-05-31T07:16:45.909071+02:00 2024-05-31 07:16:45: pid 1: DETAIL:  backend:2 is DOWN on cluster leader "172.29.30.2:5432 Linux 216dfd5

That means all backends were down, and Req_info->main_node_id should
have been -1 at the time. Usually if all backend are down, pgpool will
not accept connections from frontend. I think when the check was
peformed, not all backend are down. Right after this all backend went
down I guess. If so, at line 270 of pool_connection_pool.c:

270                 if (MAIN_CONNECTION(p) == NULL)

MAIN_CONNECTION (that is p->slots[-1] because main_node_id == -1)
looked into garbage and above condition was not met. Then the code
proceeded to line 326:

 326                         pool_free_startup_packet(CONNECTION_SLOT(p, i)->sp);

Here CONNECTION_SLOT(p, i) is NULL and segfaulted.

I think using MAIN_CONNECTION macro to take care of connection
poolings is incorrect. Which backend is alive in the each connection
pool slot could be different from current alive backend. So I invented
a function in_use_backend_id() for seaching live backend ids in a
connection slot. Attached is the patch in this direction. I hope the
patch to fix the second segfault case.

> I hope this helps.
> 
> Best regards,
> Emond
> 
> Op di 4 jun 2024 om 04:56 schreef Tatsuo Ishii <ishii at sraoss.co.jp>:
> 
>> > No worries. I hope you had a good trip. Last night we triggered the last
>> > crash again. Is there anything we can do to make it easier for you to
>> find
>> > the cause?
>>
>> It would be helpful if you could share some variable values in the
>> core file. Since I don't have the pgpool load module when you got the
>> core, I cannot inspect the variables using the core you provided.
>>
>> > #0  connect_backend (sp=0x55803eb0a6b8, frontend=0x55803eb08768) at
>> > protocol/child.c:1076
>> > #1  0x000055803ce3d02a in get_backend_connection
>> (frontend=0x55803eb08768)
>> > at protocol/child.c:2112
>> > #2  0x000055803ce38fd5 in do_child (fds=0x55803eabea90) at
>> > protocol/child.c:416
>> > #3  0x000055803cdfea4c in fork_a_child (fds=0x55803eabea90, id=13) at
>> > main/pgpool_main.c:863
>> > #4  0x000055803cdfde30 in PgpoolMain (discard_status=0 '\000',
>> > clear_memcache_oidmaps=0 '\000') at main/pgpool_main.c:561
>> > #5  0x000055803cdfb9e6 in main (argc=2, argv=0x7ffc8cdddda8) at
>> > main/main.c:365
>>
>> Around protocol/child.c:1076:
>>                                 /* set DB node id */
>>                                 pool_set_db_node_id(CONNECTION(backend,
>> i), i);
>>
>> I want to see the values in the "backend" struct. Since CONNECTION
>> macro is used here, you have to do something like in a gdb session.
>>
>> p *backend->slots[0]
>> p *backend->slots[1]
>> p *backend->slots[2]
>>
>> Best reagards,
>> --
>> Tatsuo Ishii
>> SRA OSS LLC
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_segfault2.patch
Type: text/x-patch
Size: 6192 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20240614/9797eaa4/attachment.bin>


More information about the pgpool-general mailing list