[pgpool-general: 8251] Re: pcp_recovery_node command fails
Tatsuo Ishii
ishii at sraoss.co.jp
Sat Jun 25 22:09:36 JST 2022
Do you have the extension on template1 database as well? Pgpool-II
connects to template1 database while calling pgpool_recovery.
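
One way to run that check on every node is sketched below. It is only a sketch under stated assumptions: psql is on PATH, the postgres user can connect without a password prompt, and the helper names recovery_ext_version and check_recovery_ext are hypothetical (they are not part of Pgpool-II). It queries the pg_extension catalog in both template1 and postgres, so a node where the extension exists only in the postgres database stands out.

```shell
# Hypothetical helpers for illustration; not shipped with Pgpool-II.
# Assumes psql is on PATH and the postgres user can connect without a prompt.

# Print the installed pgpool_recovery version in one database on one host.
# Prints nothing if the extension is not installed in that database.
recovery_ext_version() {
    local host=$1 db=$2
    psql -h "$host" -U postgres -d "$db" -At \
        -c "SELECT extversion FROM pg_extension WHERE extname = 'pgpool_recovery'"
}

# Report the extension status in template1 and postgres for each host given.
check_recovery_ext() {
    local host db version
    for host in "$@"; do
        for db in template1 postgres; do
            version=$(recovery_ext_version "$host" "$db")
            echo "$host $db ${version:-MISSING}"
        done
    done
}

# Example call, using the hostnames from this thread:
# check_recovery_ext catvmdxcpg12a.ftc.hpeswlab.net catvmdxcpg12b.ftc.hpeswlab.net
```

If template1 is reported as MISSING on the primary, connecting to template1 there and running CREATE EXTENSION pgpool_recovery; would be the next thing to try.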
> I've gone back and run the \dx query on each of the nodes. Same result
>
>
> -----Original Message-----
> From: pgpool-general <pgpool-general-bounces at pgpool.net> On Behalf Of Tatsuo Ishii
> Sent: Friday, June 24, 2022 7:02 PM
> To: Todd Stein <todd.stein at microfocus.com>
> Cc: pgpool-general at pgpool.net
> Subject: [pgpool-general: 8249] Re: pcp_recovery_node command fails
>
> Where did you run the \dx command? You need to run \dx on the PostgreSQL primary node (probably catvmdxcpg12b.ftc.hpeswlab.net?).
>
>> Looks like I have the same extension.
>> postgres=# \dx pgpool_recovery
>>                           List of installed extensions
>>       Name       | Version | Schema |                Description
>> -----------------+---------+--------+-------------------------------------------
>>  pgpool_recovery | 1.4     | public | recovery functions for pgpool-II for V4.3
>> (1 row)
>>
>> postgres=#
>
> Where did you run the \dx command? You need to run \dx on the PostgreSQL primary node.
>
> I also suggest the following:
>
> - Share pgpool.conf so that people (including me) can confirm that your
> configuration and attempts are correct.
>
> - Disable watchdog for now. At this point there are too many
> possibilities for the problem (the extension is not installed,
> pgpool.conf is not correct, and/or there is an unknown problem with
> watchdog). Once you confirm the system works, you can enable
> watchdog and continue testing. Let's proceed step by step.
>
>> I want to run the pcp_recovery_node command:
>> /usr/bin/pcp_recovery_node -d -U postgres -h 16.78.121.246 -p 9898 -n 0
>>
>> AFAIK the first step (stage) in the pcp_recovery_node process is to run the following:
>> recovery_1st_stage_command = '/var/lib/pgsql/12/data/recovery_1st_stage'
>> then the pgpool_remote_start script is run.
>>
>> When the pcp_recovery_node command is run, the recovery_1st_stage script receives the following list of arguments:
>> PRIMARY_NODE_PGDATA=/var/lib/pgsql/12/data %R
>> DEST_NODE_HOST=catvmdxcpg12a.ftc.hpeswlab.net %h
>> DEST_NODE_PGDATA=/var/lib/pgsql/12/data %D
>> PRIMARY_NODE_PORT=5432 %r
>> DEST_NODE_ID=0 %d
>> DEST_NODE_PORT=5432 %p
>> PRIMARY_NODE_HOST=catvmdxcpg12b.ftc.hpeswlab.net %H
>>
>> When the pgpool_remote_start script is run, it receives the following list of arguments:
>> DEST_NODE_HOST=catvmdxcpg12a.ftc.hpeswlab.net %h
>> DEST_NODE_PGDATA=/var/lib/pgsql/12/data %D
>>
>> When I run /usr/bin/pcp_recovery_node, the following error is sent to stdout.
>> -bash-4.2$ /usr/bin/pcp_recovery_node -U postgres -h 16.78.121.246 -p 9898 -n 0
>> Password:
>> ERROR: executing recovery, execution of command failed at "1st stage"
>> DETAIL: command:"recovery_1st_stage"
>>
>> However, if I run the two scripts manually with the arguments, the process works.
>>
>> -bash-4.2$ $PGDATA/recovery_1st_stage /var/lib/pgsql/12/data catvmdxcpg12a.ftc.hpeswlab.net /var/lib/pgsql/12/data 5432 0 9898 catvmdxcpg12b.ftc.hpeswlab.net
>> + MAIN_NODE_PGDATA=/var/lib/pgsql/12/data
>> + DEST_NODE_HOST=catvmdxcpg12a.ftc.hpeswlab.net
>> + DEST_NODE_PGDATA=/var/lib/pgsql/12/data
>> + MAIN_NODE_PORT=5432
>> + DEST_NODE_ID=0
>> + DEST_NODE_PORT=9898
>> + MAIN_NODE_HOST=catvmdxcpg12b.ftc.hpeswlab.net
>> + PGHOME=/usr/pgsql-12
>> + ARCHIVEDIR=/var/lib/pgsql/archivedir
>> + REPLUSER=repl
>> + MAX_DURATION=60
>> + echo recovery_1st_stage: start: pg_basebackup for Standby node 0
>> recovery_1st_stage: start: pg_basebackup for Standby node 0 ...
>> ...
>> recovery_1st_stage: end: recovery_1st_stage is completed successfully
>> + exit 0
>>
>> Next, manually run pgpool_remote_start:
>> -bash-4.2$ $PGDATA/pgpool_remote_start catvmdxcpg12a.ftc.hpeswlab.net /var/lib/pgsql/12/data
>> + DEST_NODE_HOST=catvmdxcpg12a.ftc.hpeswlab.net
>> + DEST_NODE_PGDATA=/var/lib/pgsql/12/data
>> + PGHOME=/usr/pgsql-12
>> + echo pgpool_remote_start: start: remote start Standby node catvmdxcpg12a.ftc.hpeswlab.net
>> pgpool_remote_start: start: remote start Standby node catvmdxcpg12a.ftc.hpeswlab.net
>> + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@catvmdxcpg12a.ftc.hpeswlab.net -i /var/lib/pgsql/.ssh/id_rsa_pgpool ls /tmp
>> Warning: Permanently added 'catvmdxcpg12a.ftc.hpeswlab.net,16.78.126.184' (ECDSA) to the list of known hosts.
>> + '[' 0 -ne 0 ']'
>> + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@catvmdxcpg12a.ftc.hpeswlab.net -i /var/lib/pgsql/.ssh/id_rsa_pgpool '
>> /usr/pgsql-12/bin/pg_ctl -l /dev/null -w -D /var/lib/pgsql/12/data
>> start '
>> Warning: Permanently added 'catvmdxcpg12a.ftc.hpeswlab.net,16.78.126.184' (ECDSA) to the list of known hosts.
>> waiting for server to start.... done
>> server started
>> + '[' 0 -ne 0 ']'
>> + echo pgpool_remote_start: end: PostgreSQL on catvmdxcpg12a.ftc.hpeswlab.net is started successfully.
>> pgpool_remote_start: end: PostgreSQL on catvmdxcpg12a.ftc.hpeswlab.net is started successfully.
>> + exit 0
>> -bash-4.2$
>>
>> The postgres server did not start, at least according to systemctl status
>> postgresql-12; running pg_ctl status shows that it is running.
>>
>> Looking at the replication_delay for node 0 shows a value of 67109080.
>> All of the files in $PGDATA have a very recent time stamp indicating that pg_basebackup had run.
>
> You can check the standby PostgreSQL log to see what's going on.
>
>> Regards,
>>
>> Todd Stein
>>
>> -----Original Message-----
>> From: Tatsuo Ishii <ishii at sraoss.co.jp>
>> Sent: Thursday, June 23, 2022 7:46 PM
>> To: Todd Stein <todd.stein at microfocus.com>
>> Cc: jon.schewe at raytheon.com; pgpool-general at pgpool.net
>> Subject: Re: [pgpool-general: 8244] Re: pcp_recovery_node command
>> fails
>>
>>> Many responses recommended installing the pgpool_recovery extension; I had already done that as part of the install. My install was done with RPMs.
>>>
>>> ERROR: extension "pgpool_recovery" already exists
>>> 2022-06-23 16:29:25.782 EDT [21981] STATEMENT: CREATE EXTENSION pgpool_recovery;
>>>
>>> The recovery_1st_stage script came from a sample provided with the RPM version. The only thing I should need to do with it is to adjust the path of $PGHOME.
>>
>> It's apparent that the correct version of the pgpool_recovery extension
>> was not installed, or the pgpool_recovery extension was not installed at
>> all. You can check this with the following command, using psql on the
>> primary PostgreSQL:
>>
>> test=# \dx pgpool_recovery
>>                           List of installed extensions
>>       Name       | Version | Schema |                Description
>> -----------------+---------+--------+-------------------------------------------
>>  pgpool_recovery | 1.4     | public | recovery functions for pgpool-II for V4.3
>> (1 row)
>>
>>> Regards,
>>>
>>> Todd Stein
>>>
>>> -----Original Message-----
>>> From: Todd Stein
>>> Sent: Thursday, June 23, 2022 4:08 PM
>>> To: Jon SCHEWE <jon.schewe at raytheon.com>; pgpool-general at pgpool.net
>>> Subject: RE: pcp_recovery_node command fails
>>>
>>> This is the stdout:
>>> ERROR: executing recovery, execution of command failed at "1st stage"
>>> DETAIL: command:"recovery_1st_stage"
>>>
>>> The pgpool logs don't have much useful info. Even when I set them to debug, it's not very helpful.
>>>
>>> This seems to be a pretty common issue; lots of people post about it, but I've not seen a resolution yet.
>>>
>>> The postgres log is actually more useful:
>>> ERROR: function pgpool_recovery(unknown, unknown, unknown, unknown, integer, unknown, unknown) does not exist at character 8
>>> 2022-06-23 16:03:53.740 EDT [25708] HINT: No function matches the given name and argument types. You might need to add explicit type casts.
>>> 2022-06-23 16:03:53.740 EDT [25708] STATEMENT: SELECT pgpool_recovery('recovery_1st_stage', 'nodea', '/var/lib/pgsql/12/data', '5432', 0, '5432', 'nodeb')
>>>
>>>
>>> Regards,
>>>
>>> Todd Stein
>>>
>>> -----Original Message-----
>>> From: pgpool-general <pgpool-general-bounces at pgpool.net> On Behalf Of
>>> Jon SCHEWE
>>> Sent: Thursday, June 23, 2022 3:34 PM
>>> To: Todd Stein <todd.stein at microfocus.com>; pgpool-general at pgpool.net
>>> Subject: [pgpool-general: 8242] Re: pcp_recovery_node command fails
>>>
>>>> I'm trying to use pcp_recovery_node for online recovery in a pgpool/postgresql-12 cluster.
>>>>
>>>> My cluster has PostgreSQL 12.8 and pgpool 4.3.2 running on CentOS 7.9 Linux.
>>>>
>>>> I've tried so many things, I'll not go into those details just yet.
>>>>
>>>> To start with, here is the output of the pcp_recovery_node command:
>>>>
>>>> pcp_recovery_node -U postgres -h <VIP> -p 9898 -n 0
>>>> Password:
>>>> ERROR: executing recovery, execution of command failed at "1st stage"
>>>> DETAIL: command:"recovery_1st_stage"
>>>>
>>>
>>> Do you see anything in your logs about the errors? Usually this is either on stdout from the service or in /var/log/pgpool...
>>> I'm guessing that your recovery_1st_stage script either isn't defined or isn't doing what you expect.
>>> _______________________________________________
>>> pgpool-general mailing list
>>> pgpool-general at pgpool.net
>>> http://www.pgpool.net/mailman/listinfo/pgpool-general