[pgpool-general: 2783] Child process hangs in active state
Junegunn Choi
junegunn.c at gmail.com
Mon Apr 21 15:28:52 JST 2014
Hi, we've run into a very strange situation where a child process hangs in
active state.
*** Versions
- os: RHEL 5.7
- pgpool: 3.3.2
- backend: EnterpriseDB 9.3.1.3 on x86_64
- EDB JDBC driver
*** show pool_pools;
 pool_pid |     start_time      | pool_id | backend_id | database |   username   |     create_time     | majorversion | minorversion | pool_counter | pool_backendpid | pool_connected
----------+---------------------+---------+------------+----------+--------------+---------------------+--------------+--------------+--------------+-----------------+----------------
    26462 | 2014-04-18 17:10:58 |       0 |          0 | cloud    | enterprisedb | 2014-04-18 17:10:58 |            3 |            0 |        11518 |           20546 |              1
    26462 | 2014-04-18 17:10:58 |       0 |          1 | cloud    | enterprisedb | 2014-04-18 17:10:58 |            3 |            0 |        11518 |           29160 |              1
    26462 | 2014-04-18 17:10:58 |       1 |          0 |          |              |                     |            0 |            0 |            0 |               0 |              0
    26462 | 2014-04-18 17:10:58 |       1 |          1 |          |              |                     |            0 |            0 |            0 |               0 |              0
    26462 | 2014-04-18 17:10:58 |       2 |          0 |          |              |                     |            0 |            0 |            0 |               0 |              0
    26462 | 2014-04-18 17:10:58 |       2 |          1 |          |              |                     |            0 |            0 |            0 |               0 |              0
    26462 | 2014-04-18 17:10:58 |       3 |          0 |          |              |                     |            0 |            0 |            0 |               0 |              0
    26462 | 2014-04-18 17:10:58 |       3 |          1 |          |              |                     |            0 |            0 |            0 |               0 |              0
*** pg_stat_activity
edb=# select * from pg_stat_activity where pid = 20546;
 datid | datname |  pid  | usesysid |   usename    | application_name | client_addr  | client_hostname | client_port |          backend_start           |           xact_start            |           query_start           |           state_change           | waiting | state  |           query
-------+---------+-------+----------+--------------+------------------+--------------+-----------------+-------------+----------------------------------+---------------------------------+---------------------------------+----------------------------------+---------+--------+----------------------------
 16391 | cloud   | 20546 |       10 | enterprisedb |                  | 10.12.240.94 |                 |       30104 | 18-APR-14 17:10:58.329551 +09:00 | 18-APR-14 17:14:36.83873 +09:00 | 18-APR-14 17:14:36.83873 +09:00 | 18-APR-14 17:14:36.838732 +09:00 | f       | active | SET extra_float_digits = 3
*** ps output (pgpool server)
pgpool   32395  0.0  0.0  24160 4848 ?  S  Apr15  0:11  \_ /db/pgpool/bin/pgpool -n -D -f /db/pgpool/conf/pgm/pgpool.conf -F /db/pgpool/conf/pgm/pcp.conf -a /db/pgpool/conf/pgm/pool_hba.conf
pgpool   18534  0.0  0.0  24160  704 ?  S  Apr18  0:00   | \_ pgpool: PCP: wait for connection request
pgpool   18535  0.0  0.0  24160 1076 ?  S  Apr18  0:03   | \_ pgpool: worker process
pgpool   26462  0.0  0.0 830632 9032 ?  S  Apr18  0:10   | \_ pgpool: enterprisedb cloud ---------(redacted)---------(15878) idle
pgpool   11259  0.0  0.0  24160  876 ?  S  13:31  0:00   | \_ pgpool: wait for connection request
pgpool   11261  0.0  0.0  24160  864 ?  S  13:31  0:00   | \_ pgpool: wait for connection request
pgpool   11262  0.0  0.0  24160  864 ?  S  13:31  0:00   | \_ pgpool: wait for connection request
pgpool   11263  0.0  0.0  24160  864 ?  S  13:31  0:00   | \_ pgpool: wait for connection request
...
*** ps output (backend)
$ ps auxf | grep 20546
hanadmin 12046  0.0  0.0   61188  744 pts/8 S+ 15:07  0:00          \_ grep 20546
541      20546  0.0  0.0 8761172 7448 ?     Ss Apr18  0:07  \_ postgres: enterprisedb cloud 10.12.240.94[30104] PARSE
*** gdb trace
(gdb) bt
#0  0x00000033308cd6d3 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000418c71 in pool_check_fd (cp=<value optimized out>) at pool_process_query.c:951
#2  0x000000000041d0e4 in pool_read2 (cp=0xbadfc70, len=822083579) at pool_stream.c:264
#3  0x0000000000418803 in SimpleForwardToFrontend (kind=49 '1', frontend=0xbaf1180, backend=0xbad8330) at pool_process_query.c:1136
#4  0x00000000004499bb in ProcessBackendResponse (frontend=0xbaf1180, backend=0xbad8330, state=0x7fff94626060, num_fields=0x7fff94626066) at pool_proto_modules.c:2670
#5  0x000000000041b69d in pool_process_query (frontend=0xbaf1180, backend=0xbad8330, reset_request=0) at pool_process_query.c:289
#6  0x000000000040ad8a in do_child (unix_fd=4, inet_fd=5) at child.c:355
#7  0x000000000040456f in fork_a_child (unix_fd=4, inet_fd=5, id=0) at main.c:1258
#8  0x0000000000404897 in reaper () at main.c:2482
#9  0x0000000000404c25 in pool_sleep (second=<value optimized out>) at main.c:2679
#10 0x0000000000407a0a in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:856
(gdb) up
#1  0x0000000000418c71 in pool_check_fd (cp=<value optimized out>) at pool_process_query.c:951
951             fds = select(fd+1, &readmask, NULL, &exceptmask, timeoutp);
(gdb)
#2  0x000000000041d0e4 in pool_read2 (cp=0xbadfc70, len=822083579) at pool_stream.c:264
264             if (pool_check_fd(cp))
(gdb)
#3  0x0000000000418803 in SimpleForwardToFrontend (kind=49 '1', frontend=0xbaf1180, backend=0xbad8330) at pool_process_query.c:1136
1136                    p = pool_read2(CONNECTION(backend, i), len);
(gdb) p len
$1 = 822083580
(gdb) p *backend->slots[i].con
$3 = {fd = 10, wbuf = 0xbae01d0 "P", wbufsz = 8192, wbufpo = 0, ssl_active = -1,
  hp = 0xbae21e0 "1", po = 0, bufsz = 1024, len = 0, sbuf = 0x0, sbufsz = 0,
  buf2 = 0x2aaf32c92010 "\004OMMIT", bufsz2 = 822083584, buf3 = 0x0, bufsz3 = 0,
  isbackend = 1, db_node_id = 1, tstate = 73 'I',
  is_internal_transaction_started = 0 '\000', auth_kind = 0, pwd_size = 0,
  password = '\000' <repeats 1023 times>, salt = "\000\000\000",
  params = {num = 0, names = 0xbae25f0, values = 0xbae2a00}, no_forward = 0,
  kind = 0 '\000', protoVersion = 0,
  raddr = {addr = {ss_family = 0, __ss_align = 0,
    __ss_padding = '\000' <repeats 111 times>}, salen = 0},
  auth_method = uaReject, auth_arg = 0x0, database = 0x0, username = 0x0}
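
To make the hang easier to follow, here is a rough sketch of what that read path amounts to. This is only my reading of the backtrace, not the actual pgpool source; read_n() and read_backend_message() are invented names for illustration:

/* Rough sketch of the forwarding read path seen in the backtrace above.
 * NOT the real pgpool code -- read_n() and read_backend_message() are
 * made-up names; the point is only how the 4-byte length drives the
 * blocking read. */
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>

/* Block until exactly n bytes have been read (like pool_read/pool_read2). */
static int read_n(int fd, void *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, (char *)buf + got, n - got);
        if (r <= 0)
            return -1;
        got += (size_t)r;
    }
    return 0;
}

/* Read one v3-protocol message from the backend: a 1-byte kind,
 * a 4-byte big-endian length (which includes itself), then the payload. */
static char *read_backend_message(int backend_fd, char *kind, int *len)
{
    uint32_t netlen;
    char    *payload;

    if (read_n(backend_fd, kind, 1) < 0)       /* e.g. '1' = ParseComplete */
        return NULL;
    if (read_n(backend_fd, &netlen, 4) < 0)    /* length, network byte order */
        return NULL;
    *len = (int)ntohl(netlen) - 4;             /* payload size */

    /* If the stream is ever misaligned, these four "length" bytes are
     * really something else, *len becomes garbage (~800MB here), and the
     * blocking read below waits for data the backend will never send --
     * which looks exactly like pool_read2()/pool_check_fd() sitting in
     * select() in the backtrace. */
    payload = malloc(*len > 0 ? (size_t)*len : 1);
    if (payload == NULL)
        return NULL;
    if (*len > 0 && read_n(backend_fd, payload, (size_t)*len) < 0) {
        free(payload);
        return NULL;
    }
    return payload;
}

With a sane length the read returns almost immediately; with a garbage length the child blocks indefinitely, which is what we see.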
***
`len` and `bufsz2` being over 800MB looks extremely odd. Could it be a bug in pgpool?
The ps output also shows that this child process alone has allocated over 800MB of address space.
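
For what it's worth, here is how I read that number (just a guess on my part, not anything confirmed in the pgpool source): 822083584 is 0x31000000 in hex, and the top byte 0x31 is ASCII '1', the same ParseComplete kind byte that appears as kind=49 '1' in the backtrace. On the wire a ParseComplete message is the byte '1' followed by the length 0x00000004, so if pgpool's view of the stream were off by one byte, it would read the bytes 31 00 00 00 as a 4-byte length field, i.e. exactly 822083584. That would mean the pgpool/backend stream got misaligned at some point and the child is now waiting for ~800MB that will never arrive. A trivial check of the arithmetic:

/* Quick decode of the suspicious length from gdb (illustration only). */
#include <stdio.h>

int main(void)
{
    unsigned int raw = 822083584u;   /* bufsz2 from the gdb dump; len was raw - 4 */

    printf("raw = 0x%08x\n", raw);                  /* prints 0x31000000 */
    printf("high byte = 0x%02x = '%c'\n",
           raw >> 24, (int)(raw >> 24));            /* 0x31 = '1' (ParseComplete) */
    return 0;
}

buf2 starting with "\004OMMIT" (a stray length byte plus the tail of what looks like a COMMIT command tag) seems consistent with the same misalignment, but again this is only a guess.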
This has happened to us several times, but we have no idea what triggers it, and so far we have simply had to kill the process.
Any ideas?
--
cheers,
junegunn choi.