[pgpool-hackers: 3967] Re: Health check doesn’t fail on storage failure and detach_false_primary doesn’t work properly with one remaining node

Wed Jul 14 10:51:10 JST 2021

> For Test 1, I would like to understand if it is as per the design of pgpool to not detect storage issues with a node in health check.

Pgpool-II's health check performs the following steps:

1) connect to postmaster
2) postmaster forks off a postgres process
3) pgpool sends a startup packet over the connection to postgres
4) postgres does authentication
5) pgpool replies with the password
6) postgres reports that authentication succeeded

If postmaster/postgres does not need to access storage during steps 1
to 6 (because all the files they need are already in memory), the
health check succeeds.
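
To make the steps above more concrete, here is a rough sketch in Python
of the messages exchanged in steps 3 to 6, following the PostgreSQL
frontend/backend protocol 3.0. This is not Pgpool-II's actual code; the
function names are made up for illustration, and only cleartext
password authentication is shown.

```python
import struct

# Protocol version 3.0 is encoded as (major << 16) | minor.
PROTOCOL_VERSION_3_0 = 196608

# Authentication request codes carried in an 'R' message.
AUTH_CODES = {0: "ok", 3: "cleartext_password", 5: "md5_password", 10: "sasl"}


def build_startup_packet(user: str, database: str) -> bytes:
    """Step 3: build the StartupMessage sent to the backend (hypothetical helper)."""
    params = b""
    for key, value in (("user", user), ("database", database)):
        params += key.encode() + b"\x00" + value.encode() + b"\x00"
    params += b"\x00"  # terminator after the last key/value pair
    body = struct.pack("!I", PROTOCOL_VERSION_3_0) + params
    # The length field counts itself (4 bytes) plus the body.
    return struct.pack("!I", len(body) + 4) + body


def parse_auth_message(msg: bytes) -> str:
    """Steps 4 and 6: decode an Authentication ('R') message from the backend."""
    if len(msg) < 9 or msg[0:1] != b"R":
        raise ValueError("not an authentication message")
    _length, code = struct.unpack("!II", msg[1:9])
    return AUTH_CODES.get(code, f"unknown({code})")


def build_password_message(password: str) -> bytes:
    """Step 5: build a cleartext PasswordMessage ('p')."""
    payload = password.encode() + b"\x00"
    return b"p" + struct.pack("!I", len(payload) + 4) + payload
```

Note that the exchange ends once the backend answers with
AuthenticationOk (code 0). No SQL statement is sent, so nothing in this
sequence forces a read or write of a table's data files.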

Thus I think pgpool's health check works as expected.

> If pgpool checks the health of postgresql nodes via a ping, maybe we could have that ping write something to a test db to make sure the instance is actually reachable and working?

No, pgpool does not use the ping command.

> Or at the least, we could add a timeout for queries running via pgpool so that if something like this occurs, the query can terminate automatically after a few seconds.

You can always use PostgreSQL's statement_timeout.

> For Test 2, if this failover happened because of the failing online recovery, is there something I can do to prevent it in the future, some configuration change or anything?

For Test 2, I am not sure pgpool is working as expected (this smells
like a bug). Can you enable debug logging (e.g. DEBUG1) and collect
the logs so that I can get more detailed information?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp