[pgpool-hackers: 3252] Re: Segfault in a race condition
Yugo Nagata
nagata at sraoss.co.jp
Mon Feb 25 19:01:41 JST 2019
Hi,
I found another race condition in 3.6.15 that causes a segfault. It was
reported by one of our customers.
On Tue, 08 Jan 2019 17:04:00 +0900 (JST)
Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> I found a segfault could happen in a race condition:
>
> 1) frontend tries to connect to Pgpool-II
>
> 2) there's no existing connection cache
>
> 3) try to create new backend connections by calling connect_backend()
>
> 4) inside connect_backend(), pool_create_cp() gets called
>
> 5) pool_create_cp() calls new_connection()
>
> 6) failover occurs and the global backend status is set to down, but
> the pgpool main does not send kill signal to the child process yet
>
> 7) inside new_connection() after checking VALID_BACKEND, it checks the
> global backend status and finds it is set to down status, so that
> it returns without creating new connection slot
>
> 8) connect_backend() continues and accesses the downed connection slot
> because local status says it's alive, which results in a segfault.
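To make steps 6) to 8) concrete, here is a minimal C sketch of the race under
simplified assumptions. All names (BackendStatus, ConnectionSlot,
ConnectionPool, global_status, create_slots, count_usable_slots,
run_scenario) are hypothetical and do not come from the pgpool-II source. The
child takes a local snapshot of backend status, the shared status flips to
down before slot creation, and the guarded loop treats a NULL slot as down
instead of trusting the local status.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BACKENDS 2

/* Hypothetical, heavily simplified stand-ins; not pgpool-II definitions. */
typedef enum { BACKEND_UP, BACKEND_DOWN } BackendStatus;

typedef struct {
    int major;                  /* protocol major version */
} StartupPacket;

typedef struct {
    StartupPacket *sp;
} ConnectionSlot;

typedef struct {
    ConnectionSlot *slots[NUM_BACKENDS];   /* NULL means "no connection" */
} ConnectionPool;

/* Shared status, written by the main process when failover starts. */
static BackendStatus global_status[NUM_BACKENDS];

/* Steps 5-7: a slot is created only if the backend is up in the child's
 * local snapshot AND still up in the shared status at creation time. */
static void create_slots(ConnectionPool *cp, const BackendStatus *local)
{
    static StartupPacket sp = { 3 };
    for (int i = 0; i < NUM_BACKENDS; i++) {
        cp->slots[i] = NULL;
        if (local[i] != BACKEND_UP || global_status[i] != BACKEND_UP)
            continue;
        ConnectionSlot *s = malloc(sizeof *s);
        if (s == NULL)
            continue;           /* treat allocation failure as "no slot" */
        s->sp = &sp;
        cp->slots[i] = s;
    }
}

/* Step 8, guarded: a backend that is up locally but has no slot is
 * skipped; unguarded code would dereference the NULL slot here. */
static int count_usable_slots(const ConnectionPool *cp, const BackendStatus *local)
{
    int usable = 0;
    for (int i = 0; i < NUM_BACKENDS; i++) {
        if (local[i] != BACKEND_UP)
            continue;
        if (cp->slots[i] == NULL) {
            fprintf(stderr, "backend %d: up locally but no slot, skipping\n", i);
            continue;
        }
        assert(cp->slots[i]->sp != NULL);
        usable++;
    }
    return usable;
}

/* One connection attempt. If failed_node >= 0, that node is marked down
 * in the shared status after the child has taken its local snapshot. */
static int run_scenario(int failed_node)
{
    BackendStatus local[NUM_BACKENDS];
    ConnectionPool cp;

    for (int i = 0; i < NUM_BACKENDS; i++)
        global_status[i] = BACKEND_UP;
    memcpy(local, global_status, sizeof local);    /* child's snapshot */

    if (failed_node >= 0)
        global_status[failed_node] = BACKEND_DOWN;  /* step 6: failover */

    create_slots(&cp, local);
    int usable = count_usable_slots(&cp, local);

    for (int i = 0; i < NUM_BACKENDS; i++)
        free(cp.slots[i]);                          /* free(NULL) is a no-op */
    return usable;
}
```

With failed_node set to 0, run_scenario() returns 1 and logs the skipped
backend instead of crashing; with -1 (no failover) it returns 2.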
The situation is almost the same as the one above, except that the segfault
occurs in pool_do_auth() (see the backtrace and log below).
I guess pool_do_auth() was called before Req_info->master_node_id was updated
in failover(), so MASTER_CONNECTION(cp) was referring to the downed connection
slot, and dereferencing MASTER_CONNECTION(cp)->sp caused the segfault.
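The failure mode can be illustrated with a small C sketch of a guarded
"master connection" lookup. This is not the pgpool-II fix; master_slot(),
auth_protocol_major() and the types below are hypothetical, and the
master_node_id parameter merely plays the role of Req_info->master_node_id.
If the id still points at a node whose slot was never created, the sketch
falls back to the lowest-numbered live slot, on our assumption that this is
the node failover() would make the new master; if no slot exists, it reports
an error instead of dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_BACKENDS 3

/* Hypothetical, simplified types; not pgpool-II's real definitions. */
typedef struct {
    int major;                  /* protocol major version */
} StartupPacket;

typedef struct {
    StartupPacket *sp;
} ConnectionSlot;

typedef struct {
    ConnectionSlot *slots[NUM_BACKENDS];   /* NULL: no connection to that node */
} ConnectionPool;

/* Use master_node_id if it points at an existing slot; otherwise fall back
 * to the lowest-numbered existing slot (assumed policy). NULL if none. */
static ConnectionSlot *master_slot(const ConnectionPool *cp, int master_node_id)
{
    assert(cp != NULL);
    if (master_node_id >= 0 && master_node_id < NUM_BACKENDS &&
        cp->slots[master_node_id] != NULL)
        return cp->slots[master_node_id];
    for (int i = 0; i < NUM_BACKENDS; i++)
        if (cp->slots[i] != NULL)
            return cp->slots[i];
    return NULL;
}

/* Counterpart of "protoMajor = MASTER_CONNECTION(cp)->sp->major":
 * returns the major version, or -1 where an error would be raised. */
static int auth_protocol_major(const ConnectionPool *cp, int master_node_id)
{
    ConnectionSlot *s = master_slot(cp, master_node_id);
    if (s == NULL || s->sp == NULL) {
        fprintf(stderr, "no usable backend connection\n");
        return -1;
    }
    return s->sp->major;
}
```

For example, with slots { NULL, &live, NULL } and a stale master_node_id of
0, auth_protocol_major() returns the major version from node 1 rather than
crashing on node 0's missing slot. Whether to fall back or simply abort the
session is a design choice; aborting is the more conservative option.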
Here is the backtrace from the core file:
=================================
Core was generated by `pgpool: accept connection '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041b993 in pool_do_auth (frontend=0x1678f28, cp=0x1668f18)
at auth/pool_auth.c:77
77 protoMajor = MASTER_CONNECTION(cp)->sp->major;
Missing separate debuginfos, use: debuginfo-install libmemcached-0.31-1.1.el6.x86_64
(gdb) bt
#0 0x000000000041b993 in pool_do_auth (frontend=0x1678f28, cp=0x1668f18)
at auth/pool_auth.c:77
#1 0x000000000042377f in connect_backend (sp=0x167ae78, frontend=0x1678f28)
at protocol/child.c:954
#2 0x0000000000423fdd in get_backend_connection (frontend=0x1678f28)
at protocol/child.c:2396
#3 0x0000000000424b94 in do_child (fds=0x16584f0) at protocol/child.c:337
#4 0x000000000040682d in fork_a_child (fds=0x16584f0, id=372)
at main/pgpool_main.c:758
#5 0x0000000000409941 in failover () at main/pgpool_main.c:2102
#6 0x000000000040cb40 in PgpoolMain (discard_status=<value optimized out>,
clear_memcache_oidmaps=<value optimized out>) at main/pgpool_main.c:476
#7 0x0000000000405c44 in main (argc=<value optimized out>,
argv=<value optimized out>) at main/main.c:317
(gdb) l
72 int authkind;
73 int i;
74 StartupPacket *sp;
75
76
77 protoMajor = MASTER_CONNECTION(cp)->sp->major;
78
79 kind = pool_read_kind(cp);
80 if (kind < 0)
81 ereport(ERROR,
=================================
Here is a snippet of the pgpool log. PID 5067 is the child process that hit the segfault.
==================
(snip)
2019-02-23 18:41:35:MAIN(2743):[No Connection]:[No Connection]: LOG: starting degeneration. shutdown host xxxxxxxx(xxxx)
2019-02-23 18:41:35:MAIN(2743):[No Connection]:[No Connection]: LOG: Restart all children
2019-02-23 18:41:35:CHILD(5067):[No Connection]:[No Connection]: LOG: new connection received
2019-02-23 18:41:35:CHILD(5067):[No Connection]:[No Connection]: DETAIL: connecting host=xxxxxx port=xxxx
(snip)
2019-02-23 18:41:37:MAIN(2743):[No Connection]:[No Connection]: LOG: child process with pid: 5066 exits with status 0
2019-02-23 18:41:37:MAIN(2743):[No Connection]:[No Connection]: LOG: child process with pid: 5066 exited with success and will not be restarted
2019-02-23 18:41:37:MAIN(2743):[No Connection]:[No Connection]: WARNING: child process with pid: 5067 was terminated by segmentation fault
2019-02-23 18:41:37:MAIN(2743):[No Connection]:[No Connection]: LOG: child process with pid: 5067 exited with success and will not be restarted
2019-02-23 18:41:37:MAIN(2743):[No Connection]:[No Connection]: LOG: child process with pid: 5068 exits with status 0
2019-02-23 18:41:37:MAIN(2743):[No Connection]:[No Connection]: LOG: child process with pid: 5068 exited with success and will not be restarted
(snip)
===================
Regards,
--
Yugo Nagata <nagata at sraoss.co.jp>