[pgpool-hackers: 3365] Re: Overhaul node failure debugging aid
Tatsuo Ishii
ishii at sraoss.co.jp
Tue Aug 6 11:32:41 JST 2019
Done.
> check_backend_down_request() in health_check.c is intended to simulate
> a communication failure between the health check process and a
> PostgreSQL backend node. It is triggered by creating a file containing
> lines such as:
>
> 1 down
>
> where each line consists of the node id (starting from 0), a tab, and
> "down". When the health check process finds the file, it makes the
> health check fail on node 1.
>
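To make the file format concrete, here is a minimal sketch of how such a
request file could be read. This is not pgpool's actual code: the helper
name node_down_requested() and its path argument are hypothetical, and the
real check_backend_down_request() has a different signature.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not pgpool's actual implementation).
 * Returns 1 if the file at `path` contains a line "<node_id><TAB>down",
 * and 0 otherwise, including when the file does not exist.
 */
int node_down_requested(const char *path, int node_id)
{
    FILE *fp = fopen(path, "r");
    char  line[256];
    int   found = 0;

    if (fp == NULL)
        return 0;               /* no request file: nothing to simulate */

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        int  id;
        char status[64];

        /* The space in the format matches the tab between the fields. */
        if (sscanf(line, "%d %63s", &id, status) == 2 &&
            id == node_id && strcmp(status, "down") == 0)
        {
            found = 1;
            break;
        }
    }

    fclose(fp);
    return found;
}
```

With the example file above, this sketch reports a down request for node 1
and none for node 0.
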
> After the health check brings the node into down status,
> check_backend_down_request() changes "down" to "already_down" to
> prevent the node failure from being raised repeatedly.
>
> However, the question is whether this is necessary at all. I think
> check_backend_down_request() should keep on reporting the down status,
> and it should be called inside establish_persistent_connection(),
> because the failing situation can be simulated better this way. For
> example, the health check retry is currently not simulated, but the
> new way can simulate it.
>
> Moreover, in the current watchdog implementation, bringing a node into
> quarantine state requires a node communication error to be detected
> *two* times. Since check_backend_down_request() raises node down only
> *once* (after that, the down state is changed to the already_down
> state), it's impossible to test the watchdog quarantine using
> check_backend_down_request(). I changed check_backend_down_request()
> so that it continues to raise the "down" event as long as the down
> request file exists.
>
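The difference between the two behaviours can be illustrated with a toy
model. This is only a sketch under simplifying assumptions: the request
file entry is modelled as an in-memory string, and DownRequest,
check_one_shot(), check_persistent() and count_failures() are hypothetical
names, not pgpool functions.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of the request file entry for a single node. */
typedef struct
{
    char status[16];
} DownRequest;

/* Old behaviour: report a failure once, then rewrite "down" to
 * "already_down" so that later checks succeed. */
static bool check_one_shot(DownRequest *req)
{
    if (strcmp(req->status, "down") == 0)
    {
        strcpy(req->status, "already_down");
        return true;
    }
    return false;
}

/* New behaviour: report a failure for as long as the entry says "down". */
static bool check_persistent(const DownRequest *req)
{
    return strcmp(req->status, "down") == 0;
}

/* Count how many of `checks` consecutive checks see a failure. */
int count_failures(bool persistent, int checks)
{
    DownRequest req;
    int         failures = 0;

    strcpy(req.status, "down");
    for (int i = 0; i < checks; i++)
    {
        if (persistent ? check_persistent(&req) : check_one_shot(&req))
            failures++;
    }
    return failures;
}
```

In this model the one-shot variant never yields more than one failure, so
a condition that needs two detections cannot be reached, while the
persistent variant fails on every check, including retries.
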
> The attached patch tries to enhance check_backend_down_request() as
> described above.
>
> 1) The caller of check_backend_down_request() is now
> establish_persistent_connection(), rather than
> do_health_check_child().
>
> 2) check_backend_down_request() does not change "down" to
> "already_down" anymore. This means that the second argument of
> check_backend_down_request() is not useful anymore. Probably I
> should remove the argument later on.
>
> If there's no objection, I will commit/push this by the end of this
> weekend.