[Pgpool-general] online recovery of primary is possible or not?

Tue Nov 23 09:01:47 UTC 2010

It seems pg_start_bacup and pg_stop_backup are commented out in your
script. Why?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> I have changed my basebackup.sh file a lot to get it work on 3
> different machines...
> Here it is:
> 
> #!/bin/sh -x
> # It prints on postgresql log file
> # Execution example: ./basebackup.sh /home/database/9.0.1/data
> # 192.168.0.127 /home/database/9.0.1/data
> 
> # At the beginning: 192.168.0.128 --> id 0 --> primary; 192.168.0.127
> # --> id 1 --> standby
> 
> # Postgres data files on both machines (source and destination)
> CLUSTER="/home/database/9.0.1/data"
> PSQL="/usr/lib/postgresql/9.0.1/bin/psql"
> PORT="5433"
> 
> MASTER_DB_CLUSTER=${1}
> RECOVERY_NODE_HOSTNAME=${2}
> RECOVERY_DB_CLUSTER=${3}
> FIND_RECOVERY_CONF=`find -type f -name recovery.conf`
> FIND_RECOVERY_DONE=`find -type f -name recovery.done`
> 
> # Only last part of ip address (192.168.0.127 --> 127)
> RECOVERY_NODE_HOST=`echo "${RECOVERY_NODE_HOSTNAME}" | cut -d '.' -f
> 4`
> ACTUAL_NODE=`/sbin/ifconfig eth0 | grep 'Bcast' | cut -d ':' -f 2 |
> awk '{ print $1 }' | cut -d '.' -f 4`
> 
> if [ ${ACTUAL_NODE} -eq "128" ] ; then
>     ACTUAL_NODE_ID="0"
>     RECOVERY_HOST="192.168.0.128"
> else
>     ACTUAL_NODE_ID="1"
>     RECOVERY_HOST="192.168.0.127"
> fi
> 
> # If recovery.conf or recovery.done file is found, it means that the
> # primary has been crashed
> if [ ! -z ${FIND_RECOVERY_CONF} ] || [ ! -z ${FIND_RECOVERY_DONE} ] ;
> then
>     #${PSQL} -p ${PORT} -c "SELECT pg_start_backup('Streaming Replication',
>     #true)" postgres
> 
>     RECOVERY_CONF="standby_mode = 'on'\nprimary_conninfo =
>     'host=${RECOVERY_HOST} port=${PORT} user=postgis
>     password=gispost'\ntrigger_file =
>     '/tmp/trigger_file${ACTUAL_NODE_ID}'\n"
>     echo "${RECOVERY_CONF}" > ${CLUSTER}/remote.recovery.conf
> 
>     rsync -C -a -c --delete --exclude postgresql.conf --exclude
>     postmaster.pid \
>     --exclude postmaster.opts --exclude pg_log --exclude pg_hba.conf\
>     --exclude recovery.conf --exclude recovery.done --exclude
>     --remote.recovery.conf \
>     --exclude pg_xlog \
>     ${CLUSTER}/ ${RECOVERY_NODE_HOSTNAME}:${CLUSTER}/ && \
> 
>     # Everything performed on the remote machine
>     /usr/bin/ssh -T postgres@${RECOVERY_NODE_HOSTNAME} /bin/mkdir -p
>     ${CLUSTER}/pg_xlog && \
>     /bin/chmod 700 ${CLUSTER}/pg_xlog \
>     /bin/rm -f ${CLUSTER}/recovery.done
> 
>     scp ${CLUSTER}/remote.recovery.conf
>     postgres@${RECOVERY_NODE_HOSTNAME}:${CLUSTER}/recovery.conf
> 
>     # Delete recovery.done and tmp.recovery.conf locally
>     /bin/rm -f ${CLUSTER}/recovery.done ${CLUSTER}/remote.recovery.conf
> 
>     #${PSQL} -p ${PORT} -c "SELECT pg_stop_backup()" postgres
> 
>     exit 0
> # If recovery.done or recovery.conf file is not found, it means that the
> # standby has crashed
> else
>     exit 0
> fi
> 
> And this is my pgpool_remote_start:
> 
> if [ ${#} -ne 2 ] ; then
>     echo "pgpool_remote_start remote_host remote_datadir"
>     exit 1
> fi
> 
> DEST=${1}
> DESTDIR=${2}
> PGCTL="/usr/lib/postgresql/9.0.1/bin/pg_ctl"
> LOGFILE="/home/database/9.0.1/postgresql-9.0.1.log"
> 
> echo "pgpool_remote_start ${DEST} ${DESTDIR}"
> ssh -T postgres@${DEST} ${PGCTL} -D ${DESTDIR} -l ${LOGFILE} start
> 
> Where am I wrong?
> 
> Thanks in advance,
> 
> Alessandro
> 
>>> I've read here:
>>> http://pgpool.projects.postgresql.org/contrib_docs/simple_sr_setting/index.html
>>> that online recovery with streaming replication is possible.
>>> But in the official doc it is said that only with standby it's
>>> possible to
>>> perform online recovery
>>> (http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html#master_slave_mode)...who
>>> is right?
>> Both are correct. In the former, here is the supposed scenario:
>>
>> 1) primary goes down
>> 2) standby takes over and becomes primary
>> 3) By using online recovery, the downed former primary could be online
>>     as a new standby.
>>
>>> I've tried to configure pgpool to perform an online recovery anyway,
>>> as
>>> described in the first link, but without success. Unlike that tutorial
>>> I have
>>> three separated machines which are primary, standby and pgpool.
>>>
>>> Everything works fine but at the very last step, postgres on the
>>> ex-primary (now
>>> standby) says:
>>> LOG:  entering standby mode
>>> LOG:  streaming replication successfully connected to primary
>>> LOG:  unexpected timeline ID 24 in log file 3, segment 60, offset 0
>>> FATAL:  terminating walreceiver process due to administrator command
>>> LOG:  unexpected timeline ID 24 in log file 3, segment 60, offset 0
>> As far as I can tell, it seems the base backup from primary to standby
>> was not done properly. What is your baseback.sh script exactly?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
> 
> 
> -- 
> Alessandro Candini
> 
> MEEO S.r.l.
> Via Saragat 9
> I-44122 Ferrara, Italy
> Tel: +39 0532 1861501
> Fax: +39 0532 1861637
> http://www.meeo.it
> 
> ========================================
> "ATTENZIONE:le informazioni contenute in questo messaggio sono
> da considerarsi confidenziali ed il loro utilizzo è riservato
> unicamente
> al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
> per errore è tenuto ad informare il mittente ed a rimuoverlo
> definitivamente da ogni supporto elettronico o cartaceo."
> 
> "WARNING:This message contains confidential and/or proprietary
> information which may be subject to privilege or immunity and which
> is intended for use of its addressee only. Should you receive this
> message in error, you are kindly requested to inform the sender and
> to definitively remove it from any paper or electronic format."
>