[pgpool-hackers: 3967] Re: Health check doesn’t fail on storage failure and detach_false_primary doesn’t work properly with one remaining node
Tatsuo Ishii
ishii at sraoss.co.jp
Wed Jul 14 10:51:10 JST 2021
> For Test 1, I would like to understand if it is as per the design of pgpool to not detect storage issues with a node in health check.
Pgpool-II's health check does the following:
1) connect to postmaster
2) postmaster forks off a postgres process
3) pgpool sends a startup packet over the connection to postgres
4) postgres does authentication
5) pgpool replies with password
6) postgres returns that authentication succeeds
If postmaster/postgres does not access storage during steps 1 to 6
(because all the files they need are already in memory), the health
check succeeds.
Thus I think pgpool's health check works as expected.
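To make step 3 concrete, here is a minimal sketch in Python of the startup
packet as laid out by the PostgreSQL frontend/backend protocol version 3.0.
It is not pgpool's actual code. The function name and the user/database
values are made up for illustration. Building and sending this packet
involves nothing that forces the backend to write to storage.

```python
import struct

# Protocol version 3.0: major version in the high 16 bits, minor in the low.
PROTOCOL_VERSION_3_0 = 3 << 16


def build_startup_packet(user, database):
    """Build a protocol 3.0 StartupMessage (hypothetical helper, not pgpool code).

    Layout: int32 total length (including itself), int32 protocol version,
    then NUL-terminated name/value pairs, then one final NUL byte.
    """
    params = b""
    for name, value in (("user", user), ("database", database)):
        params += name.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
    params += b"\x00"  # terminator for the parameter list

    body = struct.pack("!I", PROTOCOL_VERSION_3_0) + params
    # The length field counts its own 4 bytes as well as the body.
    return struct.pack("!I", len(body) + 4) + body


if __name__ == "__main__":
    # "pgpool" and "postgres" are example values only.
    packet = build_startup_packet("pgpool", "postgres")
    print(len(packet), packet)
```

The remaining steps (authentication request, password reply, authentication
OK) are likewise a short message exchange over the same connection.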
> If pgpool checks the health of postgresql nodes via a ping, maybe we could have that ping write something to a test db to make sure the instance is actually reachable and working?
No, pgpool does not use the ping command.
> Or at the least, we could add a timeout for queries running via pgpool so that if something like this occurs, the query can terminate automatically after a few seconds.
You can always use PostgreSQL's statement_timeout.
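As a rough sketch of how a client could apply it per session, the Python
helper below only builds the SET statement. The helper name and the 5000 ms
value are assumptions for illustration. The statement would be sent as
ordinary SQL through whatever driver the application already uses.

```python
def statement_timeout_sql(timeout_ms):
    """Return a SET command for statement_timeout (hypothetical helper).

    A bare integer is interpreted by PostgreSQL as milliseconds;
    0 disables the timeout.
    """
    timeout_ms = int(timeout_ms)
    if timeout_ms < 0:
        raise ValueError("timeout_ms must be zero or positive")
    return f"SET statement_timeout = {timeout_ms}"


if __name__ == "__main__":
    # Example value only: abort any statement running longer than 5 seconds.
    print(statement_timeout_sql(5000))
```

The same parameter can also be set in postgresql.conf or per role or
database if a session-level setting is not convenient.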
> For Test 2, if this failover happened because of the failing online recovery, is there something I can do to prevent it in the future, some configuration change or anything?
For Test 2, I am not sure pgpool is working as expected (it smells
like a bug). Can you enable debug logging (e.g. DEBUG1) and capture the
logs so that I can get more detailed information?
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp