1.5. Testing Failover

Pgpool-II performs an automatic failover when a PostgreSQL server goes down. In this case Pgpool-II sets the status of the server to "down" and continues database operation using the remaining servers.
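
Failover is triggered when Pgpool-II notices that a backend is no longer reachable, either through a periodic health check or through an error on an existing backend connection. The behavior is controlled by a handful of pgpool.conf parameters; the following is only a sketch with illustrative values, not the exact settings used in this tutorial:

    # Health check and failover settings (values are illustrative assumptions)
    health_check_period = 5           # probe each backend every 5 seconds (0 disables)
    health_check_user = 'postgres'    # PostgreSQL user used for the health checks
    failover_on_backend_error = on    # detach a backend when a connection error is detected
    failover_command = '...'          # script executed on failover (shown later in this section)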

    $ pg_ctl -D data1 stop
    waiting for server to shut down.... done
    server stopped
    $ psql -p 11000 -c "show pool_nodes" test
    node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
    ---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
    0       | /tmp     | 11002 | up     | 0.500000  | primary | 4276       | true              | 0                 | 2019-01-31 12:00:09
    1       | /tmp     | 11003 | down   | 0.500000  | standby | 1          | false             | 0                 | 2019-01-31 12:03:07
    (2 rows)
   

The standby node was shut down by the pg_ctl command. Pgpool-II detects this and detaches the standby node. The "show pool_nodes" command shows that the standby node is now in "down" status. You can continue to use the cluster without the standby node:

    $ psql -p 11000 -c "SELECT sum(abalance) FROM pgbench_accounts" test
    sum   
    --------
    216117
    (1 row)
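
Write queries are also still accepted, because Pgpool-II always routes them to the primary node. For example, a simple update against the pgbench_accounts table used above still succeeds (the statement below is just an illustration):

    $ psql -p 11000 -c "UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 1" test
    UPDATE 1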
   

What happens if the primary server goes down? In this case, one of the remaining standby servers is promoted to the new primary. For this test, we start from the state in which both nodes are up.

    $ psql -p 11000 -c "show pool_nodes" test
    node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
    ---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
    0       | /tmp     | 11002 | up     | 0.500000  | primary | 0          | false             | 0                 | 2019-01-31 12:04:58
    1       | /tmp     | 11003 | up     | 0.500000  | standby | 0          | true              | 0                 | 2019-01-31 12:04:58
    (2 rows)

    $ pg_ctl -D data0 stop
    waiting for server to shut down.... done
    server stopped
    $ psql -p 11000 -c "show pool_nodes" test
    node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
    ---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
    0       | /tmp     | 11002 | down   | 0.500000  | standby | 0          | false             | 0                 | 2019-01-31 12:05:20
    1       | /tmp     | 11003 | up     | 0.500000  | primary | 0          | true              | 0                 | 2019-01-31 12:05:20
    (2 rows)
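
If you want to double-check the promotion, you can connect straight to the node 1 backend on port 11003, bypassing Pgpool-II, and ask whether it is still in recovery; pg_is_in_recovery() returns f on a primary and t on a standby:

    $ psql -p 11003 -c "SELECT pg_is_in_recovery()" test
     pg_is_in_recovery 
    -------------------
     f
    (1 row)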
   

Now the primary node has changed from 0 to 1. What happens inside? When node 0 goes down, Pgpool-II detects it and executes the failover_command defined in pgpool.conf. Here is the content of the script:

    #! /bin/sh
    # Execute command by failover.
    # special values:  %d = node id
    #                  %h = host name
    #                  %p = port number
    #                  %D = database cluster path
    #                  %m = new main node id
    #                  %M = old main node id
    #                  %H = new main node host name
    #                  %P = old primary node id
    #                  %R = new main database cluster path
    #                  %r = new main port number
    #                  %% = '%' character
    failed_node_id=$1
    failed_host_name=$2
    failed_port=$3
    failed_db_cluster=$4
    new_main_id=$5
    old_main_id=$6
    new_main_host_name=$7
    old_primary_node_id=$8
    new_main_port_number=$9
    new_main_db_cluster=${10}
    mydir=/home/t-ishii/tmp/Tutorial
    log=$mydir/log/failover.log
    pg_ctl=/usr/local/pgsql/bin/pg_ctl
    cluster0=$mydir/data0
    cluster1=$mydir/data1

    date >> $log
    echo "failed_node_id $failed_node_id failed_host_name $failed_host_name failed_port $failed_port failed_db_cluster $failed_db_cluster new_main_id $new_main_id old_main_id $old_main_id new_main_host_name $new_main_host_name old_primary_node_id $old_primary_node_id new_main_port_number $new_main_port_number new_main_db_cluster $new_main_db_cluster" >> $log

    if [ a"$failed_node_id" = a"$old_primary_node_id" ];then    # primary failed
        new_primary_db_cluster=${mydir}/data"$new_main_id"
        echo $pg_ctl -D $new_primary_db_cluster promote >>$log  # let standby take over
        $pg_ctl -D $new_primary_db_cluster promote >>$log       # let standby take over
        sleep 2
    fi
   

The script receives the necessary information as parameters from Pgpool-II. If the primary server goes down, it executes "pg_ctl -D data1 promote", which should promote the standby server to the new primary server.
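
For reference, the failover_command entry in pgpool.conf must pass the placeholders in the same order in which the script reads its positional parameters. A minimal sketch follows; the script path is only an assumption and depends on where you saved the script:

    failover_command = '/path/to/failover.sh %d %h %p %D %m %M %H %P %r %R'

Here %d, %h, %p and %D describe the failed node, %m, %H, %r and %R describe the new main node, and %M and %P are the old main node id and old primary node id, matching $1 through ${10} in the script.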