watchdog feature enhancement

From pgpool Wiki

What's driving this enhancement

Watchdog is a very important feature of pgpool-II, as it is used to eliminate the single point of failure and provide HA. But there are a few feature requests and bugs in the existing watchdog that require more than a simple code fix and call for a complete revisit of its core architecture. This enhancement of the watchdog is therefore aimed at providing stability and robustness to the existing pgpool-II watchdog, along with some new features.

Some pgpool-II general mailing list threads related to watchdog

  • [pgpool-general: 3724] delegate ip lost
  • [pgpool-II 0000135]: Delegate IP does not get up on Standby upon Active gets disconnected (same in pgpool-general: 3736)
  • Split-brain scenario due to network partitioning
  • [pgpool-general: 3595] Watchdog issue.
  • [pgpool-general: 3443] watchdog on cloud
  • [pgpool-general: 3126] watchdog voting
  • [pgpool-general: 2985] Re: Connections stuck in CLOSE_WAIT, again
  • [pgpool-general: 2949] Re: pgpool 3.3.3 watchdog problem
  • [pgpool-general: 2797] pcp_watchdog_info parameters
  • [pgpool-general: 2768] timeout Watchdog
  • [pgpool-general: 2427] watchdog quorum
  • [pgpool-general: 2418] Re: watchdog: different statuses on different pgpool nodes.
  • [pgpool-general: 3772] Race condition for VIP assignment
  • Lots of questions about whether suid or root privileges are required ([pgpool-general: 3323] Re: Watchdog - ifconfig up failed)
  • User wants ACTIVE-ACTIVE pgpool-II configuration and miscellaneous comments on the difficulty in configuration of watchdog

Summary

Analyzing the above pgpool-II community threads related to the watchdog, it comes down to four main areas where the current pgpool-II watchdog requires enhancements.

  • 1. Virtual IP assignment, and handling the case of a lost VIP
  • 2. Split-brain scenarios, recovery from them, and the watchdog quorum
  • 3. Users very often running into misconfigured watchdog setups
  • 4. Requests for watchdog on cloud platforms and for active-active watchdog configurations

Still open Issues

  • [pgpool-general: 3724] delegate ip lost
  • [pgpool-general: 3772] Race condition for VIP assignment
  • [pgpool-general: 3228] Split brain or using 3 nodes ?
  • [pgpool-general: 3728] Re: pgpool-general Digest, Vol 43, Issue 17

Design proposal for enhanced watchdog

Terminologies used below

Cluster: the logical entity that contains all the pgpool-II server nodes connected by the pgpool-II watchdog.

What is required by the watchdog?

To provide HA, the watchdog is required to ensure the following.

  • Ensure only healthy nodes are part of the cluster
  • Ensure only authorized nodes can become members of the cluster
  • Ensure only one pgpool-II node is the designated master node at any time
  • Provide an automatic recovery mechanism, when possible, when some problem occurs

The watchdog needs to guard against the following types of failures:

  • pgpool-II service failure
  • complete or partial network failures.

High level responsibilities of the watchdog

  • Health checking of all participating pgpool-II nodes in the cluster, including health checking of the local pgpool-II server.
  • Ensure the delegate IP is available on exactly one node at all times.
  • Provide a mechanism to add and remove pgpool-II nodes from the cluster.
  • Perform a leader election to select the master node when the cluster is initialized, or in case of master node failure.
  • Perform automatic recovery if, due to some issue, the cluster state is broken or a split-brain scenario happens.
  • Generate alarms for failures where administrator intervention is required to rectify the problem.
  • Manage the pgpool-II configurations to make sure all nodes in the cluster have consistent configurations.
  • Provide an effective way of health checking other nodes (heartbeat) and of messaging between participating nodes.
  • Ensure security so that only intended nodes can become cluster members.
  • Provide a mechanism so that the administrator can check the status of the cluster and the alarms generated by it.
  • Be able to remove a node's membership from the cluster (node fencing) if a problematic node is detected or when requested by an administrator command.

Watchdog on Amazon Cloud and other cloud flavours

A much-asked-for feature is that the pgpool-II watchdog should work seamlessly on AWS. The enhanced watchdog will therefore work on Amazon's cloud, where a simple virtual IP cannot be used by the pgpool-II watchdog. For this, the enhanced watchdog will implement two new features.

  • Active-Active watchdog configuration: this will be a big improvement to the pgpool-II watchdog, and would effectively mean that multiple pgpool-II servers can be installed and an external load balancer and HA system can be used with pgpool-II
  • The new watchdog will be flexible enough to allow utilities other than ifconfig (e.g. ec2-assign-private-ip-addresses for an AWS virtual IP) to be used to bring up the virtual IP
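As an illustration of this flexibility, the Python sketch below shows how a user-configured command template could be substituted and executed to acquire the virtual IP. The `if_up_cmd` name and the `$_IP_$` placeholder are assumptions made for this sketch, not the final design:

```python
import shlex
import subprocess

def bring_up_vip(if_up_cmd: str, vip: str) -> bool:
    """Run a user-configured command to acquire the virtual IP.

    The command template carries a $_IP_$ placeholder, so the same
    mechanism works for ifconfig, ip(8), or a cloud CLI such as
    ec2-assign-private-ip-addresses.
    """
    cmd = if_up_cmd.replace("$_IP_$", vip)
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return result.returncode == 0

# Commands an administrator might configure (illustrative):
#   "ip addr add $_IP_$/24 dev eth0"
#   "ec2-assign-private-ip-addresses --network-interface eni-xyz \
#        --secondary-private-ip-address $_IP_$"
```

With this approach the watchdog core needs no knowledge of the underlying network tooling; only the configured command changes between bare metal and a cloud deployment.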

Logical Components of watchdog

The pgpool-II watchdog system will consist of the following discrete logical components:

  • Heartbeat to monitor health and availability of cluster member nodes.
  • Messaging system, to share status and configurations between cluster member nodes.
    • All messaging will use an XML- or text-based extensible protocol, to ensure easy debugging and future extensions
    • Will provide a communication mechanism for unicast as well as broadcast messaging
  • Local resource manager, which will be responsible for monitoring the health of local resources. It will consist of two sub-components:
    • delegate-IP monitoring and control
    • Local pgpool-II server monitoring
  • Information database, which will store and manage all cluster-wide runtime information and pgpool-II configurations
  • IPC listener to enable administrator control by PCP commands.
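To make the heartbeat component concrete, here is a minimal Python sketch. The port number, packet fields, and 30-second dead interval are illustrative assumptions, and JSON stands in for the XML/text protocol mentioned above:

```python
import json
import socket
import time

HEARTBEAT_PORT = 9694  # hypothetical port for this sketch

def send_heartbeat(sock: socket.socket, dest: str, node_id: int, state: str) -> None:
    """Send a small, text-based heartbeat packet to a peer.

    A self-describing text payload keeps the protocol easy to debug
    and easy to extend with new fields later.
    """
    msg = json.dumps({"node_id": node_id, "state": state, "ts": time.time()})
    sock.sendto(msg.encode(), (dest, HEARTBEAT_PORT))

def is_node_alive(last_seen: float, dead_interval: float = 30.0) -> bool:
    """A peer is considered failed if no heartbeat arrived within
    the dead interval."""
    return (time.time() - last_seen) < dead_interval
```

Each node would periodically call `send_heartbeat` toward its peers (unicast or broadcast) and evaluate `is_node_alive` against the last packet received from each of them.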

Working overview of watchdog system

The new watchdog system will be a finite state machine that transitions between different states. Some prominent system states will be:

  • IDLE -- nothing is happening
  • STARTING -- starting up
  • STOPPING -- stopping
  • ELECTION -- Take part in the election
  • JOINING CLUSTER -- we are initialised and joining the cluster
  • ELECTED MASTER -- If the node has been just elected as the master node
  • NORMAL NODE -- If we are not master and have joined the cluster as a slave node
  • RECOVERY -- some event occurred and we are recovering from it
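The states above can be sketched as a small transition table in Python; the allowed transitions shown here are illustrative, not the final design:

```python
from enum import Enum, auto

class WdState(Enum):
    """Watchdog finite-state-machine states described above."""
    IDLE = auto()
    STARTING = auto()
    STOPPING = auto()
    ELECTION = auto()
    JOINING_CLUSTER = auto()
    ELECTED_MASTER = auto()
    NORMAL_NODE = auto()
    RECOVERY = auto()

# Allowed transitions (illustrative, not exhaustive):
TRANSITIONS = {
    WdState.IDLE: {WdState.STARTING},
    WdState.STARTING: {WdState.JOINING_CLUSTER, WdState.ELECTION},
    WdState.ELECTION: {WdState.ELECTED_MASTER, WdState.NORMAL_NODE},
    WdState.JOINING_CLUSTER: {WdState.NORMAL_NODE, WdState.ELECTION},
    WdState.ELECTED_MASTER: {WdState.RECOVERY, WdState.STOPPING},
    WdState.NORMAL_NODE: {WdState.ELECTION, WdState.RECOVERY, WdState.STOPPING},
    WdState.RECOVERY: {WdState.ELECTION, WdState.NORMAL_NODE, WdState.STOPPING},
}

def can_transition(cur: WdState, nxt: WdState) -> bool:
    """Reject any state change not declared in the transition table."""
    return nxt in TRANSITIONS.get(cur, set())
```

An explicit table like this makes illegal state changes detectable at the moment they are attempted, which simplifies debugging of the state machine.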

The basic working of the watchdog will be as follows

  • At startup, do basic sanity checks and go into the normal member node state; then wait for instructions from the master node or start the election algorithm.
  • If the election algorithm is started, participate in the election and become either the master node or a normal node, depending on the election results.
  • Once the election is complete, if we are the master node, move to the master state and construct the complete view of the member nodes and the cluster state.
  • Construct the information database and propagate it to all member nodes.
  • Start health checking of local resources and remote nodes, and stay in this state until some failure occurs. Depending on the type of failure or event, take appropriate action. The action could be one of the following:
    • Kill itself.
    • Start leader election
    • Restart a local resource (pgpool-II server or delegate-IP)
    • Inform about some event or failure to master node (if it is not master node)
    • Replicate the configuration or information to the member nodes (master node only)
    • Perform fencing of member node (master node only)
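The failure-handling step above could be sketched as a simple event-to-action dispatch. The event names and the mapping below are hypothetical illustrations of the listed actions, not a specified policy:

```python
def handle_event(event: str, is_master: bool) -> str:
    """Map a detected failure/event to one of the watchdog actions
    listed above (illustrative policy only)."""
    if event == "local_pgpool_down":
        # Try to restart the local resource before anything else.
        return "restart_local_resource"
    if event == "vip_lost":
        # Only the master holds the delegate IP; others report upward.
        return "restart_local_resource" if is_master else "inform_master"
    if event == "master_unreachable":
        return "start_leader_election"
    if event == "node_misbehaving":
        return "fence_node" if is_master else "inform_master"
    if event == "config_changed":
        return "replicate_config" if is_master else "inform_master"
    return "no_action"
```

Keeping the policy in one dispatch point makes it easy to audit which role (master or normal node) is allowed to take which action.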

Responsibilities of master watchdog node

  • Maintaining up-to-date pgpool-II configurations and replicating them to all participating nodes in the cluster
  • Health checking of the backend (PostgreSQL) nodes. If the configuration requires all members to perform backend health checking, or if a backend error is detected by another member of the cluster, ensuring that failover of the backend node is executed by only a single node
  • Managing the fencing, joining, and leaving of members of the cluster
  • Keeping hold of the delegate IP, and making sure it is recovered if it is dropped for some reason
  • Handing over the responsibility to another cluster member if, due to some issue, it is unable to continue as the master node, or when instructed by an administrator command

Leader election algorithm

Selecting the best algorithm for electing the master pgpool-II node at start-up, or in case of master node failure, is still a TODO. One suggestion is to use the Leader Election in Asynchronous Distributed Systems algorithm (http://www.cs.indiana.edu/pub/techreports/TR521.pdf), which is also used by Pacemaker.
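The linked paper gives a full asynchronous election protocol. As a much simpler illustration of the goal (every node deterministically agreeing on a single master), the sketch below elects the lowest-numbered reachable node. This is not the paper's algorithm, and it assumes each node's view of reachable peers is consistent:

```python
def elect_leader(local_id: int, reachable_ids: set[int]) -> tuple[int, bool]:
    """Elect the reachable node with the smallest ID as master.

    Every node that shares the same view of reachable peers arrives
    at the same answer, so no extra coordination round is needed.
    Returns (master_id, i_am_master).
    """
    candidates = reachable_ids | {local_id}
    master = min(candidates)
    return master, master == local_id
```

A real protocol additionally has to cope with nodes holding inconsistent views during network partitions, which is exactly what the paper's algorithm (and a quorum check) addresses.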

Other leader election algorithm suggestions are most welcome.