[pgpool-hackers: 4396] Re: [pgpool-committers: 9566] pgpool: Test: attempt to fix 001.load_balance test timeout.
Tatsuo Ishii
ishii at sraoss.co.jp
Sat Sep 16 17:29:03 JST 2023
This looks working except this:
> Subject: [pgpool-buildfarm: 2933] Pgpool-II buildfarm results CentOS7
> From: buildfarm at pgpool.net
> To: pgpool-buildfarm at pgpool.net
> Date: Fri, 15 Sep 2023 16:35:15 +0900
> Sender: "pgpool-buildfarm" <pgpool-buildfarm-bounces at pgpool.net>
> User-Agent: Heirloom mailx 12.5 7/5/10
>
> =========================================================================
> * master PostgreSQL 12 CentOS7
> testing 001.load_balance...timeout.
This time the test timed out. Pgpool log reveals the cause:
2023-09-14 05:43:55.006: pcp_main pid 9183: LOG: restart request received in pcp child process
2023-09-14 05:43:55.006: pcp_main pid 9183: DEBUG: shmem_exit(-1): 2 callbacks to make
2023-09-14 05:43:55.006: pcp_main pid 9183: DEBUG: proc_exit(-1): 0 callbacks to make
2023-09-14 05:43:55.008: main pid 9047: LOG: PCP child 9183 exits with status 0 in failover()
2023-09-14 05:43:55.009: main pid 9047: LOG: fork a new PCP child pid 9326 in failover()
2023-09-14 05:43:55.009: main pid 9047: LOG: SIGINT is member
2023-09-14 05:43:55.009: main pid 9047: LOG: exit handler called (signal: 2)
2023-09-14 05:43:55.009: main pid 9047: LOG: shutting down by signal 2
2023-09-14 05:43:55.009: main pid 9047: LOG: terminating all child processes
2023-09-14 05:43:55.009: pcp_main pid 9326: DEBUG: I am PCP child with pid:9326
2023-09-14 05:43:55.009: child pid 9141: DEBUG: shmem_exit(-1): 0 callbacks to make
Also this is src/test/regression/log/001.load_balance:
tcp 0 0 0.0.0.0:11000 0.0.0.0:* LISTEN 9326/pgpool:
tcp6 0 0 :::11000 :::* LISTEN 9326/pgpool:
tcp 0 0 0.0.0.0:11000 0.0.0.0:* LISTEN 9326/pgpool:
tcp6 0 0 :::11000 :::* LISTEN 9326/pgpool:
tcp 0 0 0.0.0.0:11000 0.0.0.0:* LISTEN 9326/pgpool:
tcp6 0 0 :::11000 :::* LISTEN 9326/pgpool:
:
:
(test timeout)
So apprently pid 9326 which was created here:
2023-09-14 05:43:55.009: main pid 9047: LOG: fork a new PCP child pid 9326 in failover()
remained and this is the cause of timeout error. Question is, why pcp child did not exit?
>From src/pcp_con/pcp_child.c:
ereport(DEBUG1,
(errmsg("I am PCP child with pid:%d", getpid())));
ereport(LOG,
(errmsg("PCP process: %d started", getpid())));
for (;;)
{
My guess is process 9326's exit handler was called after "I am PCP child with pid: 9326" log message.
At this point exit handler was already set up:
pool_signal(SIGTERM, pcp_exit_handler);
pool_signal(SIGINT, pcp_exit_handler);
pool_signal(SIGQUIT, pcp_exit_handler);
In the exit handler:
if (list_length(pcp_worker_children) > 0)
{
do
{
wpid = wait(NULL);
} while (wpid > 0 || (wpid == -1 && errno == EINTR));
list_free(pcp_worker_children);
}
My guess is the exit handler stuck somewhere in the the loop. I don't
know why it happend. Probably at this point no child was exiting?
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
From: Tatsuo Ishii <ishii at sraoss.co.jp>
Subject: [pgpool-committers: 9566] pgpool: Test: attempt to fix 001.load_balance test timeout.
Date: Wed, 13 Sep 2023 09:17:42 +0000
Message-ID: <E1qgM0E-002UR1-IY at gothos.postgresql.org>
> Test: attempt to fix 001.load_balance test timeout.
>
> Examining the log reveals that the error was caused by bind error:
> address already used. This is strange because SO_REUSEADDR flags is
> set to the listening socket. I guess the cause is, the kernel is not
> fast enough to clean up the socket used by previous pgpool process.
> To fix this, existing CHECK_TIME_WAIT environment variable parameter
> is modified so that it checks the socket by using netstat -tulpn,
> rather than netstat -a, which is too slow.
>
> Branch
> ------
> master
>
> Details
> -------
> https://git.postgresql.org/gitweb?p=pgpool2.git;a=commitdiff;h=fb0800e3f0211ba833ca05cf506387e4d2a279b8
>
> Modified Files
> --------------
> src/test/pgpool_setup.in | 2 +-
> src/test/regression/tests/001.load_balance/test.sh | 1 +
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
More information about the pgpool-hackers
mailing list