pgpool-II Tutorial [watchdog in master-slave mode]

About this document

In this tutorial, I explain the way to share two PostgreSQLs between two pgpool-IIs using "watchdog" functionality.

You need two Linux servers on which both PostgreSQL (9.1 or later) and pgpool-II (3.2 or later) are installed. Let these servers be "osspc19" and "osspc20". A server for executing psql etc. is notated "someserver" for convenience, and this can be either osspc19, osspc20, or another one.

configuration

PostgreSQL configuration

Install PostgreSQL to both servers, and set up Hot Standby/Streaming Replication. I assume that the initial primary server is osspc19 and that the standby server is osspc20.

See PostgreSQL documents for setting up Hot Standby/Streaming Replication.

pgpool-II configuration

master-slave mode

Set up master-slave mode using streaming replication as replication way.

master_slave_mode = on
                                   # Activate master/slave mode
                                   # (change requires restart)

master_slave_sub_mode = 'stream'
                                   # Master/slave sub mode
                                   # Valid values are combinations slony or
                                   # stream. Default is slony.
                                   # (change requires restart)

To complete remaining setup of master-slave mode, configure pgpool.conf (backends information, failover_command, and recovery_1st_stage etc.), and install script-files (failover.sh, recovery_1st_stage, pgpool_remote start etc.) properly.

Samples of a configuration file and scripts used in this tutorial are here.

Because we use pg_basebackup in this example first stage script, we assume PostgreSQL 9.1 or later is used. However, using pg_basebackup is not an essential requirement for watchdog, and you can use PostgreSQL 9.0 or before if you use rsync etc. instead of pg_basebackup.

pgpool-II basic configuration

I explain other basic configurations for pgpool-II in this section. These are in common in osspc19 and osspc20.

pgpool-II port

Specify the port number for pgpool-II. Leave the default value here.

port = 9999
                                   # Port number
                                   # (change requires restart)

Load blance

Enable load blance mode.

load_balance_mode = on
                                   # Activate load balancing mode
                                   # (change requires restar

Health check

Enable health check for enabling automatic failover.

health_check_period = 5
                                   # Health check period
                                   # Disabled (0) by default
health_check_timeout = 0
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'postgres'
                                   # Health check user

watchdog configuration

Enabling watchdog

Enable watchdog functionality.

use_watchdog = on
                                    # Activates watchdog

Virtual IP

Specify virtual IP address that accepts connections from clients. Set a IP address that isn't used yet.

delegate_IP = '133.137.177.222'
                                    # delegate IP address

watchdog hostname

Specify the hostname of the each server.

(osspc19)
wd_hostname = 'osspc19'
                                    # Host name or IP address of this watchdog
(osspc20)
wd_hostname = 'osspc20'
                                    # Host name or IP address of this watchdog

watchdog port

Specify the port number for watchdog. Leave the default value here.

wd_port = 9000
                                    # port number for watchdog service

Paths for commands to control virtual IP

Specify the paths for commands to control virtual IP. In this tutorial, set /home/apache/sbin. The reason will be described later.


ifconfig_path = '/home/apache/sbin'
                                    # ifconfig command path

arping_path = '/home/apache/sbin' 
                                    # arping command path

Lifechek intereval

Specify the interval of lifechek. Set 3 seconds that is shorter than default.

wd_interval = 3
                                    # lifecheck interval (sec) > 0

pgpool-II to be monitored

For monitoring the other pgpool-II, specify the hostname, the pgpool-II port number, and the watchdog port number in the each server.

(osspc19)
other_pgpool_hostname0 = 'osspc20'
                                    # Host name or IP address to connect to for other pgpool 0
other_pgpool_port0 = 9999
                                    # Port number for othet pgpool 0
other_wd_port0 = 9000
                                    # Port number for othet watchdog 0
(osspc20)
other_pgpool_hostname0 = 'osspc19'
                                    # Host name or IP address to connect to for other pgpool 0
other_pgpool_port0 = 9999
                                    # Port number for othet pgpool 0
other_wd_port0 = 9000
                                    # Port number for othet watchdog 0

Install of pgpoolAdmin

In this tutorial, we use pgpoolAdmin for easier management. Install pgpoolAdmin to osspc19 and osspc20 respectively. Installation details will not be shown in this document. See appropriate manual.

setuid configuration

In watchdog process, root privilege is required to contol virtual IP. You could start pgpool-II as root user. However in this tutorial, Apache needs to start pgpool as apache user and control virtual IP because we are using pgpoolAdmin. For this purpose, we setuid if_config and arping. Also we don't want any user other than apache accesses the commands because of security reason. Execute following commands on each of osspc19 and osspc20 (It requires root privilege).

At first, make a directory for containing ipconfig and arping which is set setuid. The path is specified at ifconif_path and arping_path; in this tutorial, this is /home/apache/sbin. Then give execute privilege to only apache user.

$ su -
# mkdir -p /home/apache/sbin
# chown apache:apache /home/apache/sbin
# chmod 700 /home/apache/sbin

Next, copy the original ifconfig and arping to the directory and then set setuid to these.

# cp /sbin/ifconfig /home/apache/sbin
# cp /user/sbin/arping /home/apache/sbin
# chmod 4755 /home/apache/sbin/ifconfig
# chmod 4755 /home/apache/sbin/arping 

Note that explained above should be used for tutorial purpose only. In the real world you'd better create setuid wrapper programs to execute ifconfig and arping. This is left for your exercise.

Starting pgpool-II

We assume PostgreSQL is already running in both servers.

Starting each pgpool-II

Starting pgpool-II from pgpoolAdmin

Before starting pgpool-II, set up pgpoolAdmin for log browsing. Check on "Don't run in daemon mode(-n)" in the "pgpoolAdmin Setting" view and then click "Update" button.

Then, start pgpool-II from "pgpool Status" view of each pgpoolAdmin in order of osspc19, osspc20. Confirm that "Don'T run in daemon mode (-n)" is on and then click "Satrt pgpool".

Browsing log

Click "Log" button which appears after pgpool-II starts, and you can browse log messages.

Starting life check

After a while, when life check has started and log outputs following message, watchdog is ready now. In our setting, it takes 30 seconds or more. (This depends on setting of wd_intreval.)

LOG:    pid 19790:      wd_init: start watchdog
...
LOG:    pid 19793:      watchdog: lifecheck started

Confirming virtual IP

Confirming connection via virtual IP

Confirm that you can connect to PostgreSQL via virtual IP.

(someserver) $ psql -h 133.137.177.222 -p 9999 -l

Confirming active pgpool-II

Confirm that virtual IP is brought up on osspc19 which has stated furst, which means the pgpool-II is running as active.

(osspc19) $ /sbin/ifconfig | grep eth0:0 -1

eth0:0    Link encap:Ethernet  HWaddr 00:1E:C9:xx:xx:xx
	  inet addr:133.137.177.222  Bcast:133.137.177.255  Mask:255.255.254.0

confirming standby pgpool-II

On the other hand, the virtual IP isn't brought up on osspc20 and the pgpool-II is running as standby.

(osspc20) $ /sbin/ifconfig | grep eth0:0 -1
  -- NO OUTPUT --

Switching active/standby

Stopping active pgpool-II

After active pgpool-II has stopped, previous standby pgpool-II take over the virtual IP as new active.

Stop pgpool-II from pgpoolAdmin by clicking "Stop pgpool" button on osspc19.

Confirm that virtual IP is brought up on osspc20.

(osspc20) $ /sbin/ifconfig | grep eth0:0 -1

eth0:0    Link encap:Ethernet  HWaddr 00:1E:C9:xx:xx:xx
	  inet addr:133.137.177.222  Bcast:133.137.177.255  Mask:255.255.254.0

You can connect to PostgreSQL via the same virtual IP.

(someserver) $ psql -h 133.137.177.222 -p 9999 -l

Restarting pgpool-II

Start pgpool-II from pgpoolAdmin on osspc19.

Confirm that the pgpool is running as standby and that virtual IP is not brought up on osspc19.

(osspc19) $ sudo bin/pgpool -n > pgpool.log 2>&1
  -- NO OUTPUT --

As you see, the behaviors of watchdog explained so far is exactly the same as single backend case.

Nodes information synchronization

In the case that multiple PostgreSQL are sharing, watchdog synchronizes the DB node statuses information between pgpool-II. When failover command (attaching, detaching, or promoting node) is executed in any pgpool-II, the same command is executed in all other pgpool-II.

Confirming node statuses

Check present node statuses from both pgpoolAdmin and confirm that both nodes are up.

Stopping primary DB

Stop PostgreSQL on osspc19 which is present primary DB.

(osspc19) $ pg_ctl stop -m i

Then failover is executed and PostgreSQL on osspc20 becomes new primary DB.

Log of each pgpool-II outputs following messages.

LOG:   pid 14787: failover: set new primary node: 1
LOG:   pid 14787: failover: set new master node: 1
...
LOG:   pid 14787: failover done. shutdown host osspc19(5432)

Confirm that osspc19 node is down from both pgpoolAdmin.

Online recovery of DB

Click "Recovery" button, and make online recovery of osspc19.

When online recovery finished, log of each pgpool-II outputs following message, that means osspc19 is recovered and reconnected now.

2012-10-05 16:28:38 LOG:   pid 16774: failover: set new primary node: 1
2012-10-05 16:28:38 LOG:   pid 16774: failover: set new master node: 0
2012-10-05 16:28:38 LOG:   pid 16774: failback done. reconnect host osspc19(5432)

You can confirm that osspc19 is running as standby DB now from both pgpoolAdmin.

Compared from the initial state, primary DB and secondary DB has been switched now. This means destination of INSERT query sent to pgpool-II has changed. However the information of node statuses is shared between all pgpool-II. Therefore, even if active/standby of pgpool-II is switched, pgpool-II should access backend correctly.

Restart pgpool-II on osspc20 which is present active from pgpoolAdmin. Then pgpool-II on osspc20 becomes new active.

Confirm that you can read and write data successfully via virtual IP.

(someserver) $ psql -h 133.137.177.222 -p 9999 -c "INSERT INTO test VALUES (123)"
INSERT 0 1

(someserver) $ psql -h 133.137.177.222 -p 9999 -c "SELECT * FROM test"
 i 
----
 123
(1 row)

Shutting down server

Finally, let's confirm what occurs when a server machine goes down. Restart pgpool-II on osspc19 again. Then pgpool-II on osspc20 should be active and PostgreSQL on osspc20 should be primary DB now.

(osspc20) $ /sbin/ifconfig | grep eth0:0 -1

eth0:0    Link encap:Ethernet  HWaddr 00:1E:C9:xx:xx:xx
	  inet addr:133.137.177.222  Bcast:133.137.177.255  Mask:255.255.254.0

Shut down osspc20 server.

(osspc20) $ su -
(osspc20) # shutdown now 

After a while, pgpoo-II on osspc19 detects down of osspc20, and then PostgreSQL on osspc19 promotes to primary DB and pgpol-II on osspc19 promotes to active.

(osspc19) $ /sbin/ifconfig | grep eth0:0 -1

eth0:0    Link encap:Ethernet  HWaddr 00:1E:C9:xx:xx:xx
	  inet addr:133.137.177.222  Bcast:133.137.177.255  Mask:255.255.254.0