[pgpool-general: 1896] Re: Pgpool is unable to connect backend PostgreSQL
Tatsuo Ishii
ishii at postgresql.org
Tue Jul 16 11:57:01 JST 2013
Ok. I think I finally understand what's going on here.
The pgpool main process (14317) started health checking at Jul 12 09:17:04.
Jul 12 09:17:04 purple1-node1-ps pgpool[14317]: starting health checking
The pgpool main process set a timer for 09:17:14 because you set
health_check_timeout = 10. This time the health check completed
successfully, so the timer was blocked by calling signal(SIGALRM,
SIG_IGN). Note that SIG_IGN only ignores the signal; the alarm itself
stays armed for 09:17:14.
Unfortunately, the child life time expired at exactly 09:17:14, and the
pgpool main process was busy handling that at the same moment.
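To make the mechanism concrete, here is a minimal sketch of the timer
pattern described above (my own illustration of the idea, not the
actual pgpool source; the function names are made up for this thread):

    #include <signal.h>
    #include <unistd.h>

    /* Set to 1 by the SIGALRM handler when a health check takes too long. */
    static volatile sig_atomic_t health_check_timer_expired = 0;

    static void health_check_timer_handler(int sig)
    {
        health_check_timer_expired = 1;
    }

    /* Arm the timer before contacting the backend. */
    static void start_health_check_timer(int health_check_timeout)
    {
        health_check_timer_expired = 0;
        signal(SIGALRM, health_check_timer_handler);
        alarm(health_check_timeout);    /* with 10, fires 10 seconds later */
    }

    /* Called when the check succeeds in time. SIGALRM is ignored from
     * here on, but the alarm is NOT cancelled (that would take alarm(0)):
     * if a handler is installed again before the old deadline passes,
     * the stale SIGALRM is delivered into the new round. */
    static void stop_health_check_timer(void)
    {
        signal(SIGALRM, SIG_IGN);
    }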
Jul 12 09:17:14 purple1-node1-ps pgpool[16789]: child life 300 seconds expired
Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: reap_handler called
Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: starting health checking
The pgpool main process re-enabled the timer and reset the timer
variable (health_check_timer_expired = 0). But right after the timer
was re-enabled, the stale SIGALRM from the previous round was
delivered, and the signal handler set health_check_timer_expired to 1.
As a result, pgpool thought that the health check timer had expired.
Jul 12 09:17:14 purple1-node1-ps pgpool[14317]: health_check: health check timer has been already expired before attempting to connect to 0 th backend
Thus failover happened even though the backend was running fine.
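Condensed into the calls from the sketch above, the race looks like
this (times are from the log; the exact interleaving is my
reconstruction):

    /* 09:17:04  start_health_check_timer(10);  alarm armed for 09:17:14
     * 09:17:04+ health check succeeds; stop_health_check_timer();
     *           SIGALRM is now ignored, but the alarm stays armed
     * 09:17:14  reap_handler runs (child life expired), then the next
     *           health check round starts: */
    void next_round_at_09_17_14(void)
    {
        health_check_timer_expired = 0;
        signal(SIGALRM, health_check_timer_handler);
            /* <-- the stale alarm from 09:17:04 fires right here:
             *     the handler sets health_check_timer_expired = 1 */
        alarm(10);  /* fresh timer armed, but the flag is already poisoned */
    }

One way to close this window would be to cancel the pending alarm with
alarm(0) when a check completes, instead of merely ignoring SIGALRM.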
Besides this,
>> This seems very strange. The error comes here:
I can now think of an explanation. When the child life time expires,
pgpool forks off a new process. If the global variable
health_check_timer_expired was set to 1 for the reason above, the new
child process inherits that value across fork(), and the problem you
saw follows.
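Here is a minimal sketch of that second failure mode and an obvious
defensive fix, assuming a fork_a_child()-style helper (illustrative
names; this is not the attached patch itself):

    #include <sys/types.h>
    #include <signal.h>
    #include <unistd.h>

    extern volatile sig_atomic_t health_check_timer_expired;

    pid_t fork_a_child(void)
    {
        pid_t pid = fork();

        if (pid == 0)
        {
            /* fork() copies the parent's memory, so the child starts
             * with whatever value the flag had in the main process.
             * Clearing it here prevents a stale expiry in the parent
             * from making the child refuse backend connections. */
            health_check_timer_expired = 0;
        }
        return pid;
    }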
The attached patch addresses the problems above. Could you try it out?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> Hi, thanks for your reply.
>
>>> What kind of pgpool installation are you using?
> I compiled pgpool from source code, pgpool-II-3.2.3.tar.gz
>>> What kind of platform are you using?
> We use Ubuntu 12.04 on an HP Cloud server.
>
>>> What does your pgpool.conf look like?
> Attaching pgpool.conf
>
> Also attached is the syslog file from the time of the problem. You can
> look for the line
> "Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: I am 11465 accept fd 7"
>
> By the way, postgres was up and running at that time; there is nothing
> in its logs, and its processes' uptime shows they have been up for a
> week...
>
> Thanks in advance for your help
> Larisa.
>
>
>
> On Mon, Jul 15, 2013 at 5:00 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>
>> > Hi,
>> > I am hitting the same issue as described in the mail [pgpool-general: 1815]
>> > "Pgpool is unable to connect backend PostgreSQL".
>>
>> I guess [pgpool-general: 1815] is a different case from yours (my
>> guess is that one was somewhat related to an Amazon EC2 environment
>> problem). Moreover, your case seems very unique and strange.
>>
>> > While connected to a single postgres node, after a while pgpool loses
>> > its connection to the running postgres db, restarts all child
>> > processes, and stays in the running state, unable to connect to the db.
>> >
>> > Pgpool version 3.2.3
>> > Postgres version 9.2.4
>> >
>> > Part of the log:
>> > --------------------
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection received: host=10.4.225.120 port=41090
>>
>> Process 11465 is a pgpool child process and is responsible for actual
>> pgpool functions.
>>
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: Protocol Major: 3 Minor: 0 database: hpadb user: hpauser
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection: connecting 0 backend
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connect_inet_domain_socket_by_port: health check timer expired
>>
>> This seems very strange. The error comes here:
>>
>>     if (health_check_timer_expired)    /* has health check timer expired */
>>     {
>>         pool_log("connect_inet_domain_socket_by_port: health check timer expired");
>>         close(fd);
>>         return -1;
>>     }
>>
>> "health_check_timer_expired" is a global variable used in pgpool main
>> process, which is responsible for management of pgpool, including:
>> health check, failover etc. The variable is only meaningful in the
>> main process and should not be set to non 0 in pgpool child. Moreover,
>> the only place set the variable to non 0 is the signal handler which
>> is set by main process.
>>
>> Once the error occurs, pgpool starts failover, as you saw.
>>
>> I've never seen this kind of report before. What kind of pgpool
>> installation are you using (compiled from source code or from
>> packages)? What kind of platform are you using? What does your
>> pgpool.conf look like?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: connection to purple1_node1_ps(5432) failed
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: new_connection: create_cp() failed
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11465]: degenerate_backend_set: 0 fail over request from pid 11465
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler called
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: starting to select new master node
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: starting degeneration. shutdown host purple1_node1_ps(5432)
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: no valid DB node found
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: Restart all children
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 4388
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 9597
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[18648]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4388]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 85 messages from pid 9597 due to rate-limiting
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[9597]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 18648
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29409]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 29409
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 11454
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14323]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[11454]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 14323
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 22349
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[22349]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 23617
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 29410
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[31511]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[29410]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[14317]: failover_handler: kill 31511
>> > Jul 12 09:32:14 purple1-node1-ps pgpool[4385]: child received shutdown request signal 3
>> > Jul 12 09:32:14 purple1-node1-ps rsyslogd-2177: imuxsock lost 757 messages from pid 23617 due to rate-limiting
>> >
>> > Could you please explain?
>> > Thanks
>> > Larisa.
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: health_check.patch
Type: text/x-patch
Size: 2460 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20130716/56936024/attachment.bin>