watchdog feature enhancement
What's driving this enhancement
- The watchdog is a very important feature of pgpool-II, as it is used to eliminate the single point of failure and provide HA. However, there are a few feature requests and bugs in the existing watchdog that need more than a simple code fix and require a complete revisit of its core architecture. This enhancement is therefore aimed at bringing stability and robustness to the existing pgpool-II watchdog, along with some cool new features.
- -- [pgpool-general: 3724] delegate ip lost
- -- [pgpool-II 0000135]: Delegate IP does not get up on Standby upon Active gets disconnected (same in pgpool-general: 3736)
- -- Split-brain scenario due to network partitioning
- -- [pgpool-general: 3595] Watchdog issue.
- -- [pgpool-general: 3443] watchdog on cloud
- -- [pgpool-general: 3126] watchdog voting
- -- [pgpool-general: 2985] Re: Connections stuck in CLOSE_WAIT, again
- -- [pgpool-general: 2949] Re: pgpool 3.3.3 watchdog problem
- -- [pgpool-general: 2797] pcp_watchdog_info parameters
- -- [pgpool-general: 2768] timeout Watchdog
- -- [pgpool-general: 2427] watchdog quorum
- -- [pgpool-general: 2418] Re: watchdog: different statuses on different pgpool nodes.
- -- [pgpool-general: 3772] Race condition for VIP assignment
- -- Many questions about the suid or root privileges that are required ([pgpool-general: 3323] Re: Watchdog - ifconfig up failed)
- -- Users want an ACTIVE-ACTIVE pgpool-II configuration, and there are miscellaneous comments on how difficult the watchdog is to configure
- Summary
- An analysis of the pgpool-II community threads above shows four main areas where the current pgpool-II watchdog requires enhancement.
- 1-- Virtual IP (VIP) assignment and handling the loss of the VIP
- 2-- The split-brain scenario, recovery from it, and watchdog quorum
- 3-- Users very often run into misconfigured watchdog situations
- 4-- Users want the watchdog on cloud and active-active watchdog configurations
- Issues still open
- -- [pgpool-general: 3724] delegate ip lost
- -- [pgpool-general: 3772] Race condition for VIP assignment
- -- [pgpool-general: 3228] Split brain or using 3 nodes ?
- -- [pgpool-general: 3728] Re: pgpool-general Digest, Vol 43, Issue 17
What is required by the watchdog?
The main purpose of the watchdog in pgpool-II is to provide high availability. To that end, the watchdog is required to ensure the following.
-- Only healthy nodes are part of the cluster
-- Only authorized nodes can become members of the cluster
-- Only one pgpool-II node is the designated master node at any time
-- An automatic recovery mechanism is provided, where possible, when a problem occurs
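The requirement that only one pgpool-II node is the designated master at any time is closely tied to the split-brain and quorum issues listed earlier. As a rough illustration only, the sketch below assumes a strict-majority quorum rule. This page does not specify such a rule, and the function name `has_quorum` is hypothetical, not part of pgpool-II.

```python
def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Return True when the reachable nodes form a strict majority.

    Assumption (not from the design text): a partition may hold the
    master role only if it can see more than half of all configured
    watchdog nodes, counting itself.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("reachable_nodes must be between 0 and total_nodes")
    return reachable_nodes > total_nodes // 2


if __name__ == "__main__":
    # A 3-node cluster split by a network partition into groups of 2 and 1:
    # only the larger group keeps quorum, so at most one master can exist.
    for group_size in (2, 1):
        print(group_size, has_quorum(group_size, 3))

    # A 2-node cluster split into 1 and 1: neither side has quorum.
    print(has_quorum(1, 2))
```

Under this assumed rule, two disjoint partitions can never both hold a majority, which is why it rules out two simultaneous masters. The trade-off is that a cluster with an even number of nodes may be left with no master after an even split.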