[pgpool-general: 1950] Re: pgpool 3.2.5 watchdog ifconfig down always hangs
Jeff Frost
jeff at pgexperts.com
Sat Jul 27 04:50:04 JST 2013
A quick rebuild and here we go:
jeff at squeeze:/usr/local/pgpool2$ ps -ef|grep pgpool
postgres 13073 1 0 12:35 pts/1 00:00:00 logger -t pgpool -p local0.info
postgres 13098 1 0 12:35 pts/1 00:00:00 pgpool: watchdog
postgres 13099 1 0 12:35 pts/1 00:00:00 pgpool: lifecheck
sudo gdb -p 13098
(gdb) bt
#0 0x00007f492036b3e3 in select () from /lib/libc.so.6
#1 0x0000000000478d98 in wd_accept (sock=<value optimized out>) at wd_packet.c:302
#2 0x0000000000477137 in wd_child (fork_wait_time=1) at wd_child.c:91
#3 0x0000000000476db5 in wd_main (fork_wait_time=1) at watchdog.c:127
#4 0x00000000004086ec in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:632
sudo gdb -p 13098
(gdb) bt
#0 0x00007f4920ef9b33 in wait () from /lib/libpthread.so.0
#1 0x00000000004776bc in exec_ifconfig (path=0x7fffde882fe0 "/usr/bin/sudo", command=<value optimized out>) at wd_if.c:191
#2 0x0000000000477843 in wd_IP_down () at wd_if.c:79
#3 0x0000000000479699 in wd_notice_server_down () at wd_packet.c:119
#4 0x0000000000476f30 in wd_exit (exit_signo=2) at watchdog.c:75
#5 <signal handler called>
#6 0x00007f4920342c5d in nanosleep () from /lib/libc.so.6
#7 0x00007f4920342ad0 in sleep () from /lib/libc.so.6
#8 0x0000000000476e87 in wd_main (fork_wait_time=1) at watchdog.c:160
#9 0x00000000004086ec in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:632
jeff at squeeze:/usr/local/pgpool2$ ps -ef|grep ifconfig
root 2220 2121 0 12:45 pts/1 00:00:00 sudo ifconfig eth0:1 10.10.10.28 netmask 255.255.255.0 down
root 2221 2220 0 12:45 pts/1 00:00:00 [ifconfig] <defunct>
jeff at squeeze:/usr/local/pgpool2$ sudo gdb -p 2220
(gdb) bt
#0 0x00007f1b412563c3 in select () from /lib/libc.so.6
#1 0x0000000000409a23 in sudo_execve ()
#2 0x000000000040e463 in run_command ()
#3 0x000000000040fce0 in main ()
jeff at squeeze:/usr/local/pgpool2$ sudo gdb -p 2221
Attaching to process 2221
ptrace: Operation not permitted.
(gdb) bt
No stack.
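The sudo backtrace shows it sitting in select() while its ifconfig child is already defunct, meaning the child has exited but has not been reaped. One thing that could be checked at this point (an assumption, not something established in this thread) is whether SIGCHLD is blocked or ignored in the stuck sudo process. On Linux, /proc/<pid>/status has SigBlk and SigIgn lines holding hex masks in which bit N-1 stands for signal N. A minimal sketch of decoding such a mask; the helper name mask_has_signal is hypothetical:

```c
#include <stdlib.h>

/* Hypothetical helper: return 1 if signal number signo is set in a hex
 * mask string as printed on the SigBlk/SigIgn/SigCgt lines of
 * /proc/<pid>/status, else 0. Bit (signo - 1) represents signal signo. */
int mask_has_signal(const char *hex_mask, int signo)
{
    unsigned long long mask = strtoull(hex_mask, NULL, 16);

    /* The kernel prints a 64-bit mask; reject out-of-range numbers. */
    if (signo < 1 || signo > 64)
        return 0;
    return (int)((mask >> (signo - 1)) & 1ULL);
}
```

The mask string would come from something like `grep Sig /proc/2220/status`. On x86 Linux SIGCHLD is signal 17, so a SigBlk value of 0000000000010000 would mean SIGCHLD is blocked in that process.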
An important note: the ifconfig down actually succeeds even though it never
returns. That is, the eth0:1 interface goes away.
So it looks like this:
Jul 26 12:45:05 squeeze pgpool: 2013-07-26 12:45:05 ERROR: pid 2094: find_primary_node: make_persistent_connection failed
Jul 26 12:45:05 squeeze pgpool: 2013-07-26 12:45:05 LOG: pid 2094: received fast shutdown request
Jul 26 12:45:05 squeeze pgpool: 2013-07-26 12:45:05 LOG: pid 2094: watchdog_pid: 2121
Jul 26 12:45:25 squeeze pgpool: 2013-07-26 12:45:25 ERROR: pid 2094: wait() failed. reason:Interrupted system call
Until I kill -9 the sudo process:
jeff at squeeze:/usr/local/pgpool2$ sudo kill -9 2220
Then these two log lines are emitted:
Jul 26 12:48:59 squeeze pgpool: 2013-07-26 12:48:59 ERROR: pid 2121: wd_IP_down: ifconfig down failed
Jul 26 12:49:02 squeeze pgpool: 2013-07-26 12:49:02 LOG: pid 2121: wd_create_send_socket: connect() reports failure (No route to host). You can safely ignore this while starting up.
On 07/26/13 12:30, Jeff Frost wrote:
> More info:
>
> Here is a syslog snippet:
>
> Jul 26 12:06:33 pgpool01 pgpool: 2013-07-26 12:06:33 LOG: pid 12847: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> Jul 26 12:06:49 pgpool01 pgpool: 2013-07-26 12:06:49 LOG: pid 13243: wd_chk_sticy: all commands have sticky bit
> Jul 26 12:06:49 pgpool01 pgpool: 2013-07-26 12:06:49 LOG: pid 13243: watchdog might call network commands which using sticky bit.
> Jul 26 12:06:49 pgpool01 pgpool: 2013-07-26 12:06:49 LOG: pid 13243: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> Jul 26 12:06:52 pgpool01 pgpool: 2013-07-26 12:06:52 LOG: pid 13243: wd_escalation: escalated to master pgpool
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG: pid 13243: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG: pid 13243: wd_escalation: escaleted to delegate_IP holder
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG: pid 13243: wd_init: start watchdog
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG: pid 13243: pgpool-II successfully started. version 3.2.4 (namameboshi)
> Jul 26 12:06:54 pgpool01 pgpool: 2013-07-26 12:06:54 LOG: pid 13243: find_primary_node: primary node id is 0
> Jul 26 12:10:13 pgpool01 pgpool: 2013-07-26 12:10:13 LOG: pid 13243: received fast shutdown request
> Jul 26 12:10:13 pgpool01 pgpool: 2013-07-26 12:10:13 LOG: pid 13243: watchdog_pid: 13257
> *Jul 26 12:10:53 pgpool01 pgpool: 2013-07-26 12:10:53 ERROR: pid 13257: wd_IP_down: ifconfig down failed*
> Jul 26 12:10:53 pgpool01 pgpool: 2013-07-26 12:10:53 LOG: pid 13257: wd_create_send_socket: connect() reports failure (Connection refused). You can safely ignore this while starting up.
> Jul 26 12:11:19 pgpool01 pgpool: 2013-07-26 12:11:19 LOG: pid 13613: wd_chk_sticy: all commands have sticky bit
>
> I also attached 2 straces - one was the pgpool process as I issued a stop,
> and the other was the watchdog process.
>
> Unfortunately, it doesn't look like apt.postgresql.org provides dbg files,
> so I'll have to rebuild pgpool2 so I can get the symbols for the backtraces.
>
>
> On 07/26/13 07:48, Jeff Frost wrote:
>> Yes, you can see the pgpool processes stuck in my ps output below.
>>
>> They happily exit once I kill -9 the sudo process.
>>
>> I'll see if I can get some stack traces but if you can't reproduce on Ubuntu or CentOS, I suspect it's something with Debian Squeeze's sudo or ifconfig commands.
>>
>> On Jul 26, 2013, at 3:24 AM, Yugo Nagata <nagata at sraoss.co.jp> wrote:
>>
>>> Hi,
>>>
>>> Does pgpool hang as well as ifconfig when it is stopped?
>>> I cannot reproduce this on CentOS or Ubuntu. Both pgpool and
>>> ifconfig stop normally.
>>>
>>> Could you please provide me the stack trace of the hanging pgpool and
>>> the log messages?
>>>
>>>
>>> On Thu, 25 Jul 2013 09:56:36 -0700
>>> Jeff Frost <jeff at pgexperts.com> wrote:
>>>
>>>> This seems to be the same on 3.2.3, 3.2.4 and 3.2.5.
>>>>
>>>> The watchdog section of pgpool.conf looks like this:
>>>>
>>>> use_watchdog = on
>>>> delegate_IP = '10.100.2.72'
>>>> wd_hostname = '10.100.2.70'
>>>> wd_port = 9000
>>>> ifconfig_path = '/usr/bin'
>>>> arping_path = '/usr/bin'
>>>> if_up_cmd = 'sudo ifconfig eth0:1 $_IP_$ netmask 255.255.255.0 up'
>>>> if_down_cmd = 'sudo ifconfig eth0:1 $_IP_$ netmask 255.255.255.0 down'
>>>> arping_cmd = 'sudo arping -U $_IP_$ -w 1'
>>>> wd_interval = 3
>>>> other_pgpool_hostname0 = '10.100.2.71'
>>>> other_pgpool_port0 = 9999
>>>> other_wd_port0 = 9000
>>>>
>>>> The virtual IP starts up great and properly moves to the secondary pgpool
>>>> server if you stop pgpool. However, the ifconfig becomes defunct and never
>>>> exits, requiring a kill -9:
>>>>
>>>> jeff at pgpool01:/tmp/pgpool$ ps -ef|grep pgpool
>>>> postgres 19974 1 0 09:51 pts/0 00:00:00 /tmp/pgpool/bin/pgpool -n
>>>> postgres 19975 1 0 09:51 pts/0 00:00:00 logger -t pgpool -p local0.info
>>>> postgres 19978 19974 0 09:51 pts/0 00:00:00 pgpool: watchdog
>>>> postgres 19979 19974 0 09:51 pts/0 00:00:00 pgpool: lifecheck
>>>> jeff 20735 1615 0 09:54 pts/0 00:00:00 grep pgpool
>>>>
>>>> jeff at pgpool01:/tmp/pgpool$ ps -ef|grep ifconfig
>>>> root 20439 19979 0 09:52 pts/0 00:00:00 sudo ifconfig eth0:1 10.100.2.72 netmask 255.255.255.0 down
>>>> root 20440 20439 0 09:52 pts/0 00:00:00 [ifconfig] <defunct>
>>>> jeff 20737 1615 0 09:54 pts/0 00:00:00 grep ifconfig
>>>>
>>>> System is Debian Squeeze. Any idea how to fix this? kill -9 of the sudo
>>>> allows pgpool to exit.
>>>>
>>>> --
>>>> Jeff Frost <jeff at pgexperts.com>
>>>> CTO, PostgreSQL Experts, Inc.
>>>> Phone: 1-888-PG-EXPRT x506
>>>> FAX: 415-762-5122
>>>> http://www.pgexperts.com/
>>>>
>>>> _______________________________________________
>>>> pgpool-general mailing list
>>>> pgpool-general at pgpool.net
>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>> --
>>> Yugo Nagata <nagata at sraoss.co.jp>