[pgpool-general: 4388] please help me! problem with online recovery

Wed Feb 3 20:16:12 JST 2016

Hi!

I have configured two servers for HA as described in this tutorial:

http://www.pgpool.net/pgpool-web/contrib_docs/watchdog_master_slave_3.3/en.html

I have installed CentOS 7, pgpool-II version 3.4.3 and PostgreSQL 9.4.

I adapted the scripts to my configuration and put them in the PostgreSQL
data directory, owned by postgres and with execute permission, but
nothing works.

When my PostgreSQL primary server goes down (power off), the failover command
runs and the other server becomes primary: in its data directory recovery.conf
becomes recovery.done, and I can see the trigger file that my failover script
creates.
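
The promotion check described above can be scripted. This is only a minimal sketch: standby_was_promoted is a hypothetical helper name, and it relies solely on the fact that PostgreSQL 9.4 renames recovery.conf to recovery.done when recovery ends.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a data directory looks like a promoted
# standby, i.e. recovery.done exists and recovery.conf is gone.
standby_was_promoted() {
    pgdata=$1
    [ -f "$pgdata/recovery.done" ] && [ ! -f "$pgdata/recovery.conf" ]
}

# Example (the path is the data directory used in this post):
#   standby_was_promoted /data/postgreStorage && echo "promoted"
```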

My failover script:

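#!/bin/bash -x
# Arguments passed by pgpool-II through failover_command
FALLING_NODE=$1         # %d: ID of the node that failed
OLDPRIMARY_NODE=$2      # %P: ID of the old primary node
NEW_PRIMARY=$3          # %H: hostname of the new master node
PGDATA=$4               # %R: data directory of the new master node

# Promote the standby only when the failed node was the primary.
if [ "$FALLING_NODE" = "$OLDPRIMARY_NODE" ]; then
    if [ "$UID" -eq 0 ]; then
        su postgres -c "ssh -T postgres@$NEW_PRIMARY touch $PGDATA/trigger"
    else
        ssh -T postgres@"$NEW_PRIMARY" touch "$PGDATA/trigger"
    fi
fi

exit 0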

When I start the old primary server that failed, pgpool does not start the
online recovery procedure.
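
For context, pgpool-II does not normally begin online recovery on its own when a detached node comes back. It runs recovery_1st_stage_command when recovery is requested, usually with pcp_recovery_node. Below is a minimal sketch of such a request. It assumes the positional argument order used before pgpool-II 3.5 (timeout, host, port, user, password, node ID), which is the same style as the pcp_node_info calls in this post. request_recovery is a hypothetical wrapper name, and the host, user and password values are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper around pcp_recovery_node (positional, pre-3.5 syntax).
# All default values are placeholders, not taken from a working setup.
PCP_HOST=${PCP_HOST:-server2}
PCP_PORT=${PCP_PORT:-9898}
PCP_USER=${PCP_USER:-user}
PCP_PASS=${PCP_PASS:-password}

request_recovery() {
    node_id=$1
    # Arguments: timeout host port user password node_id
    pcp_recovery_node 100 "$PCP_HOST" "$PCP_PORT" "$PCP_USER" "$PCP_PASS" "$node_id"
}

# Example: request_recovery 0    # node 0 is server1 in the configuration shown
```

As far as I know, online recovery also depends on the pgpool_recovery functions being installed in template1 on the backend, so that is worth checking if the request itself fails.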

My recovery_1st_stage script:

#!/bin/bash -x
# Arguments passed by pgpool-II to recovery_1st_stage_command
PGDATA=$1           # data directory of the current primary
REMOTE_HOST=$2      # host to be recovered
REMOTE_PGDATA=$3    # data directory on the host to be recovered

# Marker so I can tell whether pgpool called this script at all.
echo "I have called online recovery " >> /data/postgreStorage/logifonlinereccaptured.log

PORT=5432
PGHOME=/usr/pgsql-9.4
ARCH=/data/postgreStorage/arch

rm -rf $ARCH/*

# Rebuild the data directory on the remote host from a fresh base backup.
ssh -T postgres@$REMOTE_HOST "
export LD_LIBRARY_PATH=$PGHOME/lib:\$LD_LIBRARY_PATH;
rm -rf $REMOTE_PGDATA
echo \"mypassword\" | $PGHOME/bin/pg_basebackup -h $HOSTNAME -U postgres --password -D $REMOTE_PGDATA -x -c fast
rm -f $REMOTE_PGDATA/trigger"

ssh -T postgres@$REMOTE_HOST "rm -rf $ARCH/*"
ssh -T postgres@$REMOTE_HOST "mkdir -p $REMOTE_PGDATA/pg_xlog/archive_status"

# Enable hot standby and write recovery.conf on the remote host.
ssh -T postgres@$REMOTE_HOST "
cd $REMOTE_PGDATA;
cp postgresql.conf postgresql.conf.bak;
sed -e 's/#*hot_standby = off/hot_standby = on/' postgresql.conf.bak > postgresql.conf;
rm -f postgresql.conf.bak;
cat > recovery.conf << EOT
standby_mode = 'on'
primary_conninfo = 'host=$HOSTNAME port=$PORT user=replica password=mypassword'
restore_command = 'scp $HOSTNAME:$ARCH/%f %p'
trigger_file = '$PGDATA/trigger'
EOT
"
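
The recovery.conf step in the script above nests a here-document inside a double-quoted ssh command, which makes the quoting hard to follow. One alternative, sketched here under assumptions, is to generate the file text locally and send it over ssh. write_recovery_conf is a hypothetical helper name. The settings mirror the ones in the script, and the password is the same placeholder.

```shell
#!/bin/sh
# Sketch: print recovery.conf text for a PostgreSQL 9.4 standby.
# Arguments: primary host, primary port, archive directory on the primary,
# data directory of the standby.
write_recovery_conf() {
    primary_host=$1
    port=$2
    arch=$3
    standby_pgdata=$4
    cat <<EOT
standby_mode = 'on'
primary_conninfo = 'host=$primary_host port=$port user=replica password=mypassword'
restore_command = 'scp $primary_host:$arch/%f %p'
trigger_file = '$standby_pgdata/trigger'
EOT
}

# Example:
#   write_recovery_conf "$HOSTNAME" 5432 "$ARCH" "$REMOTE_PGDATA" |
#       ssh -T postgres@"$REMOTE_HOST" "cat > $REMOTE_PGDATA/recovery.conf"
```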

I put pgpool_remote_start in the PostgreSQL data directory, owned by postgres
and with execute permission:

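#!/bin/sh
# Arguments passed by pgpool-II once recovery has finished
REMOTE_HOST=$1      # host that was recovered
REMOTE_PGDATA=$2    # data directory on that host

PGHOME=/usr/pgsql-9.4

# Start PostgreSQL on the recovered host in the background.
ssh -T postgres@$REMOTE_HOST "
export LD_LIBRARY_PATH=$PGHOME/lib:\$LD_LIBRARY_PATH;
$PGHOME/bin/pg_ctl -w -D $REMOTE_PGDATA start 2>/dev/null 1>/dev/null < /dev/null &"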

I have configured pgpool and PostgreSQL to use md5 authentication, but that
does not seem to be the problem, because neither the PostgreSQL log nor the
pgpool log shows errors.

After the old primary has started, pcp_node_info returns the following.

Asked through pgpool on server1:

pcp_node_info 5 server1 9898 user password 0    returns    server1 5432 2 0.500000
pcp_node_info 5 server1 9898 user password 1    returns    server2 5432 2 0.500000

Asked through pgpool on server2:

pcp_node_info 5 server2 9898 user password 0    returns    server1 5432 3 0.500000
pcp_node_info 5 server2 9898 user password 1    returns    server1 5432 2 0.500000
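
The third field of each pcp_node_info result is the backend status as pgpool sees it. Here is a small sketch for decoding it, assuming the four-field output format shown in this post (hostname, port, status, weight) and the status meanings given in the pgpool-II documentation. describe_node is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: decode one pcp_node_info result line "hostname port status weight".
describe_node() {
    # Split the line into positional parameters on whitespace.
    set -- $1
    case $3 in
        0) state="initializing" ;;
        1) state="up, no connections yet" ;;
        2) state="up, connections pooled" ;;
        3) state="down" ;;
        *) state="unknown status $3" ;;
    esac
    echo "$1:$2 is $state"
}

# Example: describe_node "server1 5432 3 0.500000"
```

Read this way, the results above suggest that the pgpool on server2 still has node 0 (server1) marked as down, while the pgpool on server1 reports both nodes as up.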

When I check the PostgreSQL log on server1, I see that it started as a master.

My pgpool configuration files follow. server1 and server2 stand in for my
servers' real hostnames.

server1, the default primary (the one I start first). Only the parameters
changed from the defaults are shown:

listen_addresses = '*'
listen_backlog_multiplier = 5

backend_hostname0 = 'server1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data/postgreStorage/'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'server2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/data/postgreStorage/'
backend_flag1 = 'ALLOW_TO_FAILOVER'

enable_pool_hba = on
pool_passwd = 'pool_passwd'

log_connections = on
log_min_messages = debug1

load_balance_mode = on
master_slave_mode = on
sr_check_user = 'postgres'
sr_check_password = 'mypostgrespassword'
follow_master_command = ''

health_check_period = 5
health_check_timeout = 20
health_check_user = 'postgres'
health_check_password = 'my postgres password'

failover_command = '/data/postgreStorage/failover.sh %d %P %H %R'
failback_command = ''

recovery_user = 'postgres'
recovery_password = 'mypostgrespassword'
recovery_1st_stage_command = 'recovery_1st_stage'
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0

use_watchdog = on
ping_path = '/usr/bin/'
wd_hostname = 'server1'
wd_port = 9000
wd_authkey = 'password'    # same value in both servers' pgpool config files
delegate_IP = ''
ifconfig_path = '/usr/sbin/'
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'server2'
heartbeat_destination_port0 = 9694
other_pgpool_hostname0 = 'server2'
other_pgpool_port0 = 9999
other_wd_port0 = 9000

server2, the default secondary (the one I start second, with the PostgreSQL
slave). Only the parameters changed from the defaults are shown:

listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898
listen_backlog_multiplier = 5

backend_hostname0 = 'server1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data/postgreStorage/'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'server2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/data/postgreStorage/'
backend_flag1 = 'ALLOW_TO_FAILOVER'

enable_pool_hba = on
pool_passwd = 'pool_passwd'
authentication_timeout = 60

log_connections = on
log_min_messages = debug1

load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 10
sr_check_user = 'postgres'
sr_check_password = 'mypass'
follow_master_command = ''

health_check_period = 5
health_check_timeout = 20
health_check_user = 'postgres'
health_check_password = 'mypass'
health_check_max_retries = 5
health_check_retry_delay = 2
connect_timeout = 10000

failover_command = '/data/postgreStorage/failover.sh %d %P %H %R'
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 10

recovery_user = 'postgres'
recovery_password = 'mypass'
recovery_1st_stage_command = 'recovery_1st_stage'
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0

use_watchdog = on
ping_path = '/usr/bin/'
wd_hostname = 'server2'
wd_port = 9000
wd_authkey = 'password'    # same value in both servers' pgpool config files
delegate_IP = ''
ifconfig_path = '/usr/sbin/'
arping_path = '/usr/sbin'
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'server1'
heartbeat_destination_port0 = 9694
other_pgpool_hostname0 = 'server1'
other_pgpool_port0 = 9999
other_wd_port0 = 9000
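
Because the two listings are long, a quick way to compare them is to list the parameter names that appear in only one file. This is a sketch under assumptions: conf_keys and conf_key_diff are hypothetical helper names, and the file paths in the example are placeholders for wherever each server's pgpool.conf has been copied.

```shell
#!/bin/sh
# Sketch: list parameter names that appear in only one of two pgpool.conf files.
conf_keys() {
    # Drop comments, keep only "name = value" lines, keep the name, trim spaces.
    sed -e 's/#.*//' -e '/=/!d' -e 's/=.*//' -e 's/[[:space:]]//g' "$1" | sort -u
}

conf_key_diff() {
    a=$(mktemp)
    b=$(mktemp)
    conf_keys "$1" > "$a"
    conf_keys "$2" > "$b"
    # comm -3 prints names unique to the first file in column 1 and names
    # unique to the second file, tab-indented, in column 2.
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Example: conf_key_diff server1-pgpool.conf server2-pgpool.conf
```

This compares names only, not values, so it would flag a parameter that is set or spelled differently on one side but not a differing password or hostname.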

Can you help me?

I'm desperate!!!

Thanksssss

A.Baccanelli
