[Pgpool-general] online recovery of primary is possible or not?
Tatsuo Ishii
ishii at sraoss.co.jp
Tue Nov 23 09:01:47 UTC 2010
It seems pg_start_bacup and pg_stop_backup are commented out in your
script. Why?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> I have changed my basebackup.sh file a lot to get it work on 3
> different machines...
> Here it is:
>
> #!/bin/sh -x
> # It prints on postgresql log file
> # Execution example: ./basebackup.sh /home/database/9.0.1/data
> # 192.168.0.127 /home/database/9.0.1/data
>
> # At the beginning: 192.168.0.128 --> id 0 --> primary; 192.168.0.127
> # --> id 1 --> standby
>
> # Postgres data files on both machines (source and destination)
> CLUSTER="/home/database/9.0.1/data"
> PSQL="/usr/lib/postgresql/9.0.1/bin/psql"
> PORT="5433"
>
> MASTER_DB_CLUSTER=${1}
> RECOVERY_NODE_HOSTNAME=${2}
> RECOVERY_DB_CLUSTER=${3}
> FIND_RECOVERY_CONF=`find -type f -name recovery.conf`
> FIND_RECOVERY_DONE=`find -type f -name recovery.done`
>
> # Only last part of ip address (192.168.0.127 --> 127)
> RECOVERY_NODE_HOST=`echo "${RECOVERY_NODE_HOSTNAME}" | cut -d '.' -f
> 4`
> ACTUAL_NODE=`/sbin/ifconfig eth0 | grep 'Bcast' | cut -d ':' -f 2 |
> awk '{ print $1 }' | cut -d '.' -f 4`
>
> if [ ${ACTUAL_NODE} -eq "128" ] ; then
> ACTUAL_NODE_ID="0"
> RECOVERY_HOST="192.168.0.128"
> else
> ACTUAL_NODE_ID="1"
> RECOVERY_HOST="192.168.0.127"
> fi
>
> # If recovery.conf or recovery.done file is found, it means that the
> # primary has been crashed
> if [ ! -z ${FIND_RECOVERY_CONF} ] || [ ! -z ${FIND_RECOVERY_DONE} ] ;
> then
> #${PSQL} -p ${PORT} -c "SELECT pg_start_backup('Streaming Replication',
> #true)" postgres
>
> RECOVERY_CONF="standby_mode = 'on'\nprimary_conninfo =
> 'host=${RECOVERY_HOST} port=${PORT} user=postgis
> password=gispost'\ntrigger_file =
> '/tmp/trigger_file${ACTUAL_NODE_ID}'\n"
> echo "${RECOVERY_CONF}" > ${CLUSTER}/remote.recovery.conf
>
> rsync -C -a -c --delete --exclude postgresql.conf --exclude
> postmaster.pid \
> --exclude postmaster.opts --exclude pg_log --exclude pg_hba.conf\
> --exclude recovery.conf --exclude recovery.done --exclude
> --remote.recovery.conf \
> --exclude pg_xlog \
> ${CLUSTER}/ ${RECOVERY_NODE_HOSTNAME}:${CLUSTER}/ && \
>
> # Everything performed on the remote machine
> /usr/bin/ssh -T postgres@${RECOVERY_NODE_HOSTNAME} /bin/mkdir -p
> ${CLUSTER}/pg_xlog && \
> /bin/chmod 700 ${CLUSTER}/pg_xlog \
> /bin/rm -f ${CLUSTER}/recovery.done
>
> scp ${CLUSTER}/remote.recovery.conf
> postgres@${RECOVERY_NODE_HOSTNAME}:${CLUSTER}/recovery.conf
>
> # Delete recovery.done and tmp.recovery.conf locally
> /bin/rm -f ${CLUSTER}/recovery.done ${CLUSTER}/remote.recovery.conf
>
> #${PSQL} -p ${PORT} -c "SELECT pg_stop_backup()" postgres
>
> exit 0
> # If recovery.done or recovery.conf file is not found, it means that the
> # standby has crashed
> else
> exit 0
> fi
>
> And this is my pgpool_remote_start:
>
> if [ ${#} -ne 2 ] ; then
> echo "pgpool_remote_start remote_host remote_datadir"
> exit 1
> fi
>
> DEST=${1}
> DESTDIR=${2}
> PGCTL="/usr/lib/postgresql/9.0.1/bin/pg_ctl"
> LOGFILE="/home/database/9.0.1/postgresql-9.0.1.log"
>
> echo "pgpool_remote_start ${DEST} ${DESTDIR}"
> ssh -T postgres@${DEST} ${PGCTL} -D ${DESTDIR} -l ${LOGFILE} start
>
> Where am I wrong?
>
> Thanks in advance,
>
> Alessandro
>
>>> I've read here:
>>> http://pgpool.projects.postgresql.org/contrib_docs/simple_sr_setting/index.html
>>> that online recovery with streaming replication is possible.
>>> But in the official doc it is said that only with standby it's
>>> possible to
>>> perform online recovery
>>> (http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html#master_slave_mode)...who
>>> is right?
>> Both are correct. In the former, here is the supposed scenario:
>>
>> 1) primary goes down
>> 2) standby takes over and becomes primary
>> 3) By using online recovery, the downed former primary could be online
>> as a new standby.
>>
>>> I've tried to configure pgpool to perform an online recovery anyway,
>>> as
>>> described in the first link, but without success. Unlike that tutorial
>>> I have
>>> three separated machines which are primary, standby and pgpool.
>>>
>>> Everything works fine but at the very last step, postgres on the
>>> ex-primary (now
>>> standby) says:
>>> LOG: entering standby mode
>>> LOG: streaming replication successfully connected to primary
>>> LOG: unexpected timeline ID 24 in log file 3, segment 60, offset 0
>>> FATAL: terminating walreceiver process due to administrator command
>>> LOG: unexpected timeline ID 24 in log file 3, segment 60, offset 0
>> As far as I can tell, it seems the base backup from primary to standby
>> was not done properly. What is your baseback.sh script exactly?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>
>
> --
> Alessandro Candini
>
> MEEO S.r.l.
> Via Saragat 9
> I-44122 Ferrara, Italy
> Tel: +39 0532 1861501
> Fax: +39 0532 1861637
> http://www.meeo.it
>
> ========================================
> "ATTENZIONE:le informazioni contenute in questo messaggio sono
> da considerarsi confidenziali ed il loro utilizzo è riservato
> unicamente
> al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
> per errore è tenuto ad informare il mittente ed a rimuoverlo
> definitivamente da ogni supporto elettronico o cartaceo."
>
> "WARNING:This message contains confidential and/or proprietary
> information which may be subject to privilege or immunity and which
> is intended for use of its addressee only. Should you receive this
> message in error, you are kindly requested to inform the sender and
> to definitively remove it from any paper or electronic format."
>
More information about the Pgpool-general
mailing list