[pgpool-hackers: 3466] Re: health check timeout does work in certain case
Tatsuo Ishii
ishii at sraoss.co.jp
Mon Oct 21 13:55:44 JST 2019
Fix committed in 3.7 and above.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
From: Tatsuo Ishii <ishii at sraoss.co.jp>
Subject: [pgpool-hackers: 3458] health check timeout does work in certain case
Date: Wed, 16 Oct 2019 15:09:10 +0900 (JST)
Message-ID: <20191016.150910.544077412377095882.t-ishii at sraoss.co.jp>
> I have been playing with health check and found that it does not work in a certain case.
>
> I sent SIGSTOP to the postmaster process of one of the backend nodes
> to freeze it. I expected the health check process to detect this when
> the health check timer expired. However, the health check process
> waits forever here:
>
> (gdb) bt
> #0  0x00007f094a7a234e in __libc_read (fd=6,
>     buf=buf@entry=0x564a3aa3a2c0 <readbuf>, nbytes=nbytes@entry=1024)
>     at ../sysdeps/unix/sysv/linux/read.c:27
> #1  0x0000564a3a68dd70 in read (__nbytes=1024, __buf=0x564a3aa3a2c0 <readbuf>,
>     __fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
> #2  pool_read (cp=cp@entry=0x7f094adf2268, buf=buf@entry=0x7fff6c221786,
>     len=len@entry=1) at utils/pool_stream.c:194
> #3  0x0000564a3a68e101 in pool_read_with_error (cp=0x7f094adf2268,
>     buf=buf@entry=0x7fff6c221786, len=len@entry=1,
>     err_context=err_context@entry=0x564a3a700c90 "authentication message response type") at utils/pool_stream.c:141
> #4  0x0000564a3a649761 in connection_do_auth (cp=cp@entry=0x564a3c22a640,
>     password=password@entry=0x564a3c22a5f0 "md5a16f9d87e344969ec59de417447348b3") at auth/pool_auth.c:104
> #5  0x0000564a3a6565e8 in make_persistent_db_connection (
>     db_node_id=db_node_id@entry=1,
>     hostname=hostname@entry=0x7f094ae0b280 "/tmp", port=port@entry=11003,
>     dbname=dbname@entry=0x564a3c21a4a8 "postgres",
>     user=user@entry=0x564a3c21b7a8 "t-ishii",
>     password=password@entry=0x564a3c22a5f0 "md5a16f9d87e344969ec59de417447348b3", retry=0 '\000') at protocol/child.c:1440
> #6  0x0000564a3a65670d in make_persistent_db_connection_noerror (
>     db_node_id=db_node_id@entry=1,
> ---Type <return> to continue, or q <return> to quit
>
> Frame #2 is at this line in pool_stream.c:
>
> readlen = read(cp->fd, readbuf, READBUFSZ);
>
> read(2) was in fact interrupted once by SIGALRM as expected, but
> pool_read then called read(2) again and got stuck there, because of
> this code:
>
> 	if (errno == EINTR || errno == EAGAIN)
> 	{
> 		ereport(DEBUG5,
> 				(errmsg("read on socket failed with error :\"%s\"", strerror(errno)),
> 				 errdetail("retrying...")));
> 		continue;
> 	}
>
> As far as I remember, read(2) should retry in all cases except
> health check, so I would like to propose the attached patch to fix
> the issue. Comments are welcome.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp