In this tutorial, I explain the way to share two PostgreSQLs between two pgpool-IIs using "watchdog" functionality.
You need two Linux servers on which both PostgreSQL (9.1 or later) and pgpool-II (3.2 or later) are installed. Let these servers be "osspc19" and "osspc20". A server for executing psql etc. is notated "someserver" for convenience, and this can be either osspc19, osspc20, or another one.
Install PostgreSQL to both servers, and set up Hot Standby/Streaming Replication. I assume that the initial primary server is osspc19 and that the standby server is osspc20.
See PostgreSQL documents for setting up Hot Standby/Streaming Replication.
Set up master-slave mode using streaming replication as replication way.
master_slave_mode = on # Activate master/slave mode # (change requires restart) master_slave_sub_mode = 'stream' # Master/slave sub mode # Valid values are combinations slony or # stream. Default is slony. # (change requires restart)
To complete remaining setup of master-slave mode, configure pgpool.conf (backends information, failover_command, and recovery_1st_stage etc.), and install script-files (failover.sh, recovery_1st_stage, pgpool_remote start etc.) properly.
Samples of a configuration file and scripts used in this tutorial are here.
Because we use pg_basebackup in this example first stage script, we assume PostgreSQL 9.1 or later is used. However, using pg_basebackup is not an essential requirement for watchdog, and you can use PostgreSQL 9.0 or before if you use rsync etc. instead of pg_basebackup.
I explain other basic configurations for pgpool-II in this section. These are in common in osspc19 and osspc20.
Specify the port number for pgpool-II. Leave the default value here.
port = 9999 # Port number # (change requires restart)
Enable load blance mode.
load_balance_mode = on # Activate load balancing mode # (change requires restar
Enable health check for enabling automatic failover.
health_check_period = 5 # Health check period # Disabled (0) by default health_check_timeout = 0 # Health check timeout # 0 means no timeout health_check_user = 'postgres' # Health check user
Enable watchdog functionality.
use_watchdog = on # Activates watchdog
Specify virtual IP address that accepts connections from clients. Set a IP address that isn't used yet.
delegate_IP = '133.137.177.222' # delegate IP address
Specify the hostname of the each server.
(osspc19) wd_hostname = 'osspc19' # Host name or IP address of this watchdog
(osspc20) wd_hostname = 'osspc20' # Host name or IP address of this watchdog
Specify the port number for watchdog. Leave the default value here.
wd_port = 9000 # port number for watchdog service
Specify the paths for commands to control virtual IP. In this tutorial, set /home/apache/sbin. The reason will be described later.
ifconfig_path = '/home/apache/sbin' # ifconfig command path arping_path = '/home/apache/sbin' # arping command path
Specify the interval of lifechek. Set 3 seconds that is shorter than default.
wd_interval = 3 # lifecheck interval (sec) > 0
For monitoring the other pgpool-II, specify the hostname, the pgpool-II port number, and the watchdog port number in the each server.
(osspc19) other_pgpool_hostname0 = 'osspc20' # Host name or IP address to connect to for other pgpool 0 other_pgpool_port0 = 9999 # Port number for othet pgpool 0 other_wd_port0 = 9000 # Port number for othet watchdog 0
(osspc20) other_pgpool_hostname0 = 'osspc19' # Host name or IP address to connect to for other pgpool 0 other_pgpool_port0 = 9999 # Port number for othet pgpool 0 other_wd_port0 = 9000 # Port number for othet watchdog 0
In this tutorial, we use pgpoolAdmin for easier management. Install pgpoolAdmin to osspc19 and osspc20 respectively. Installation details will not be shown in this document. See appropriate manual.
In watchdog process, root privilege is required to contol virtual IP. You could start pgpool-II as root user. However in this tutorial, Apache needs to start pgpool as apache user and control virtual IP because we are using pgpoolAdmin. For this purpose, we setuid if_config and arping. Also we don't want any user other than apache accesses the commands because of security reason. Execute following commands on each of osspc19 and osspc20 (It requires root privilege).
At first, make a directory for containing ipconfig and arping which is set setuid. The path is specified at ifconif_path and arping_path; in this tutorial, this is /home/apache/sbin. Then give execute privilege to only apache user.
$ su - # mkdir -p /home/apache/sbin # chown apache:apache /home/apache/sbin # chmod 700 /home/apache/sbin
Next, copy the original ifconfig and arping to the directory and then set setuid to these.
# cp /sbin/ifconfig /home/apache/sbin # cp /user/sbin/arping /home/apache/sbin # chmod 4755 /home/apache/sbin/ifconfig # chmod 4755 /home/apache/sbin/arping
Note that explained above should be used for tutorial purpose only. In the real world you'd better create setuid wrapper programs to execute ifconfig and arping. This is left for your exercise.
We assume PostgreSQL is already running in both servers.
Before starting pgpool-II, set up pgpoolAdmin for log browsing. Check on "Don't run in daemon mode(-n)" in the "pgpoolAdmin Setting" view and then click "Update" button.
Then, start pgpool-II from "pgpool Status" view of each pgpoolAdmin in order of osspc19, osspc20. Confirm that "Don'T run in daemon mode (-n)" is on and then click "Satrt pgpool".
Click "Log" button which appears after pgpool-II starts, and you can browse log messages.
After a while, when life check has started and log outputs following message, watchdog is ready now. In our setting, it takes 30 seconds or more. (This depends on setting of wd_intreval.)
LOG: pid 19790: wd_init: start watchdog ... LOG: pid 19793: watchdog: lifecheck started
Confirm that you can connect to PostgreSQL via virtual IP.
(someserver) $ psql -h 133.137.177.222 -p 9999 -l
Confirm that virtual IP is brought up on osspc19 which has stated furst, which means the pgpool-II is running as active.
(osspc19) $ /sbin/ifconfig | grep eth0:0 -1 eth0:0 Link encap:Ethernet HWaddr 00:1E:C9:xx:xx:xx inet addr:133.137.177.222 Bcast:133.137.177.255 Mask:255.255.254.0
On the other hand, the virtual IP isn't brought up on osspc20 and the pgpool-II is running as standby.
(osspc20) $ /sbin/ifconfig | grep eth0:0 -1 -- NO OUTPUT --
After active pgpool-II has stopped, previous standby pgpool-II take over the virtual IP as new active.
Stop pgpool-II from pgpoolAdmin by clicking "Stop pgpool" button on osspc19.
Confirm that virtual IP is brought up on osspc20.
(osspc20) $ /sbin/ifconfig | grep eth0:0 -1 eth0:0 Link encap:Ethernet HWaddr 00:1E:C9:xx:xx:xx inet addr:133.137.177.222 Bcast:133.137.177.255 Mask:255.255.254.0
You can connect to PostgreSQL via the same virtual IP.
(someserver) $ psql -h 133.137.177.222 -p 9999 -l
Start pgpool-II from pgpoolAdmin on osspc19.
Confirm that the pgpool is running as standby and that virtual IP is not brought up on osspc19.
(osspc19) $ sudo bin/pgpool -n > pgpool.log 2>&1 -- NO OUTPUT --
As you see, the behaviors of watchdog explained so far is exactly the same as single backend case.
In the case that multiple PostgreSQL are sharing, watchdog synchronizes the DB node statuses information between pgpool-II. When failover command (attaching, detaching, or promoting node) is executed in any pgpool-II, the same command is executed in all other pgpool-II.
Check present node statuses from both pgpoolAdmin and confirm that both nodes are up.
Stop PostgreSQL on osspc19 which is present primary DB.
(osspc19) $ pg_ctl stop -m i
Then failover is executed and PostgreSQL on osspc20 becomes new primary DB.
Log of each pgpool-II outputs following messages.
LOG: pid 14787: failover: set new primary node: 1 LOG: pid 14787: failover: set new master node: 1 ... LOG: pid 14787: failover done. shutdown host osspc19(5432)
Confirm that osspc19 node is down from both pgpoolAdmin.
Click "Recovery" button, and make online recovery of osspc19.
When online recovery finished, log of each pgpool-II outputs following message, that means osspc19 is recovered and reconnected now.
2012-10-05 16:28:38 LOG: pid 16774: failover: set new primary node: 1 2012-10-05 16:28:38 LOG: pid 16774: failover: set new master node: 0 2012-10-05 16:28:38 LOG: pid 16774: failback done. reconnect host osspc19(5432)
You can confirm that osspc19 is running as standby DB now from both pgpoolAdmin.
Compared from the initial state, primary DB and secondary DB has been switched now. This means destination of INSERT query sent to pgpool-II has changed. However the information of node statuses is shared between all pgpool-II. Therefore, even if active/standby of pgpool-II is switched, pgpool-II should access backend correctly.
Restart pgpool-II on osspc20 which is present active from pgpoolAdmin. Then pgpool-II on osspc20 becomes new active.
Confirm that you can read and write data successfully via virtual IP.
(someserver) $ psql -h 133.137.177.222 -p 9999 -c "INSERT INTO test VALUES (123)" INSERT 0 1 (someserver) $ psql -h 133.137.177.222 -p 9999 -c "SELECT * FROM test" i ---- 123 (1 row)
Finally, let's confirm what occurs when a server machine goes down. Restart pgpool-II on osspc19 again. Then pgpool-II on osspc20 should be active and PostgreSQL on osspc20 should be primary DB now.
(osspc20) $ /sbin/ifconfig | grep eth0:0 -1 eth0:0 Link encap:Ethernet HWaddr 00:1E:C9:xx:xx:xx inet addr:133.137.177.222 Bcast:133.137.177.255 Mask:255.255.254.0
Shut down osspc20 server.
(osspc20) $ su - (osspc20) # shutdown now
After a while, pgpoo-II on osspc19 detects down of osspc20, and then PostgreSQL on osspc19 promotes to primary DB and pgpol-II on osspc19 promotes to active.
(osspc19) $ /sbin/ifconfig | grep eth0:0 -1 eth0:0 Link encap:Ethernet HWaddr 00:1E:C9:xx:xx:xx inet addr:133.137.177.222 Bcast:133.137.177.255 Mask:255.255.254.0