[pgpool-general: 8696] Re: Fwd: Online recovery - Command failed at 1st stage
Bo Peng
pengbo at sraoss.co.jp
Thu Mar 30 13:03:44 JST 2023
Hello,
If you are using the scripts bundled with the Pgpool-II 4.1.4 packages,
recovery_1st_stage sets the primary host name from the output of the
hostname command:
PRIMARY_NODE_HOST=$(hostname)
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=blob_plain;f=src/sample/scripts/recovery_1st_stage.sample;hb=refs/heads/V4_1_STABLE
Could you run the "hostname" command on the primary server and check what
it returns? If it prints nothing, PRIMARY_NODE_HOST will be empty.
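
For reference, here is a minimal sketch of how you could check this by hand
on the primary. The function names check_primary_node_host and count_args
are hypothetical helpers, not part of the bundled script; the sketch only
assumes that the script assigns PRIMARY_NODE_HOST=$(hostname) as shown
above. The last part illustrates what happens when an unquoted empty
variable follows -h: the shell drops it, so -h consumes the next word.

```shell
#!/bin/bash
# Hypothetical guard (not in the bundled script): fail early when the value
# that would be used as PRIMARY_NODE_HOST is empty.
check_primary_node_host() {
    if [ -z "$1" ]; then
        echo "ERROR: PRIMARY_NODE_HOST is empty; check the output of hostname" >&2
        return 1
    fi
    echo "PRIMARY_NODE_HOST=$1"
}

# Hypothetical helper: print how many arguments a command would receive.
count_args() {
    echo "$#"
}

# Same assignment as the sample script; "|| true" keeps this sketch running
# even if the hostname command itself fails.
PRIMARY_NODE_HOST=$(hostname 2>/dev/null || true)
check_primary_node_host "$PRIMARY_NODE_HOST" || echo "hostname returned nothing"

# An unquoted empty variable disappears from the argument list, so a command
# invoked as "-h $EMPTY -U repl" receives three arguments. With getopt-style
# parsing, -h then takes "-U" as its value and "repl" is left over as a
# stray positional argument.
EMPTY=""
count_args -h $EMPTY -U repl
```

That would be consistent with the pg_basebackup error in your log
(too many command-line arguments, first is "repl").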
On Thu, 30 Mar 2023 00:18:04 +0530
Prasanna Venkadesh <prasmailme at gmail.com> wrote:
> Postgres version: 12.2
> pgpool-II version: 4.1.4 (karasukihoshi)
> OS: RHEL 7
>
> I have two instances of Postgres and one instance of pgpool-II running. I
> am following the tutorial here to set up HA for postgres.
> https://www.pgpool.net/docs/41/en/html/example-cluster.html
>
> I have created the required users (pgpool, repl), enabled passwordless SSH,
> configured pgpool to use streaming replication and brought up the primary
> postgres server. I am able to connect to pgpool via port 9999 and show
> pool_nodes shows both postgres instances.
>
> The primary's status is shown as up and the standby is shown as down as
> expected. Following the next steps to setup a standby server using online
> recovery mechanism fails with "command failed at 1st stage".
>
> Please find the Postgres log from the primary instance below.
> One thing I observed is that the PRIMARY_NODE_HOST value is empty, which
> causes the pg_basebackup and psql commands to fail because -h has no
> value.
>
> Why would the value for PRIMARY_NODE_HOST go missing while all other
> variables have values?
>
> + PRIMARY_NODE_PGDATA=/var/lib/pgsql/12/data
> + DEST_NODE_HOST=172.16.2.149
> + DEST_NODE_PGDATA=/var/lib/pgsql/12/data
> + PRIMARY_NODE_PORT=5432
> + DEST_NODE_ID=1
> + DEST_NODE_PORT=15432
> + PRIMARY_NODE_HOST=
> + PGHOME=/var/lib/pgsql/12
> + PGBIN=/usr/pgsql-12
> + ARCHIVEDIR=/var/lib/pgsql/12/archivedir
> + REPLUSER=repl
> + REPL_SLOT_NAME=172_16_2_149
> + POSTGRESQL_STARTUP_USER=postgres
> + SSH_KEY_FILE=id_rsa_pgpool
> + SSH_OPTIONS='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
> -i ~/.ssh/id_rsa_pgpool'
> + echo recovery_1st_stage: start: pg_basebackup for Standby node 1
> recovery_1st_stage: start: pg_basebackup for Standby node 1
> + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i
> '~/.ssh/id_rsa_pgpool' postgres@172.16.2.149 ls /tmp
> Warning: Permanently added '172.16.2.149' (ECDSA) to
> the list of known hosts.
> + '[' 0 -ne 0 ']'
> ++ /usr/pgsql-12/bin/initdb -V
> ++ awk '{print $3}'
> ++ sed 's/\([0-9]*\)[a-zA-Z].*/\1/'
> ++ sed 's/\..*//'
> + PGVERSION=12
> + '[' 12 -ge 12 ']'
> + RECOVERYCONF=/var/lib/pgsql/12/data/myrecovery.conf
> + /usr/pgsql-12/bin/psql -h -p 5432 postgres -c 'SELECT
> pg_create_physical_replication_slot('\''172_16_2_149'\'');'
> + '[' 2 -ne 0 ']'
> + echo ERROR: recovery_1st_stage: create replication slot '"172_16_2_149"'
> failed. You may need to create replication slot manually.
> ERROR: recovery_1st_stage: create replication slot "172_16_2_149" failed.
> You may need to create replication slot manually.
> ++ echo /var/lib/pgsql/12/data/myrecovery.conf
> ++ sed -e 's/\//\\\//g'
> ++ echo /var/lib/pgsql/12/data/myrecovery.conf
> ++ sed -e 's/\//\\\//g'
> + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i
> '~/.ssh/id_rsa_pgpool' postgres@172.16.2.149 '
>
> set -o errexit
>
> [ -d "/var/lib/pgsql/12/data" ] && rm -rf /var/lib/pgsql/12/data
> [ -d "/var/lib/pgsql/12/archivedir" ] && rm -rf
> /var/lib/pgsql/12/archivedir/*
>
> /usr/pgsql-12/bin/pg_basebackup -h -U repl -p 5432 -D
> /var/lib/pgsql/12/data -X stream
>
> cat > /var/lib/pgsql/12/data/myrecovery.conf << EOT
> primary_conninfo = '\''host= port=5432 user=repl application_name=
> 172.16.2.149
> passfile='\'''\''/var/lib/pgsql/.pgpass'\'''\'''\''
> recovery_target_timeline = '\''latest'\''
> restore_command = '\''scp -o StrictHostKeyChecking=no -o
> UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa_pgpool
> :/var/lib/pgsql/12/archivedir/%f %p'\''
> primary_slot_name = '\''172_16_2_149'\''
> EOT
>
> if [ 12 -ge 12 ]; then
> sed -i -e "\$ainclude_if_exists =
> '\''\/var\/lib\/pgsql\/12\/data\/myrecovery.conf'\''" -e
> "/^include_if_exists =
> '\''\/var\/lib\/pgsql\/12\/data\/myrecovery.conf'\''/d"
> /var/lib/pgsql/12/data/postgresql.conf
> touch /var/lib/pgsql/12/data/standby.signal
> else
> echo "standby_mode = '\''on'\''" >>
> /var/lib/pgsql/12/data/myrecovery.conf
> fi
> sed -i -e "s/#*port = .*/port = 15432/" -e "s@#*archive_command =
> .*@archive_command = '\''cp \"%p\"
> \"/var/lib/pgsql/12/archivedir/%f\"'\''@"
> /var/lib/pgsql/12/data/postgresql.conf
> '
> Warning: Permanently added '172.16.2.149' (ECDSA) to
> the list of known hosts.
> pg_basebackup: error: too many command-line arguments (first is "repl")
> Try "pg_basebackup --help" for more information.
> + '[' 1 -ne 0 ']'
> + /usr/pgsql-12/bin/psql -h -p 5432 postgres -c 'SELECT
> pg_drop_replication_slot('\''172_16_2_149'\'');'
> + '[' 2 -ne 0 ']'
> + echo ERROR: recovery_1st_stage: drop replication slot '"172_16_2_149"'
> failed. You may need to drop replication slot manually.
> ERROR: recovery_1st_stage: drop replication slot "172_16_2_149" failed. You
> may need to drop replication slot manually.
> + echo ERROR: recovery_1st_stage: end: pg_basebackup failed. online
> recovery failed
> ERROR: recovery_1st_stage: end: pg_basebackup failed. online recovery failed
> + exit 1
> 2023-03-29 14:11:26.654 EDT [7708] ERROR:
> pgpool_recovery failed
> 2023-03-29 14:11:26.654 EDT [7708] STATEMENT:
> SELECT pgpool_recovery('recovery_1st_stage', '172.16.2.149',
> '/var/lib/pgsql/12/data', '5432', 1, '15432')
>
>
>
> --
> Thanks & Regards,
> Prasanna Venkadesh.
--
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS LLC
https://www.sraoss.co.jp/