[pgpool-general: 8648] Re: online recovery causes node to lose replication state.

Todd Stein todd.stein at microfocus.com
Fri Mar 17 04:24:50 JST 2023


Hi Tatsuo,
Please find the attached zip file containing the config files and logs you asked for.

Thank you in advance!


Regards,

Todd Stein

-----Original Message-----
From: pgpool-general <pgpool-general-bounces at pgpool.net> On Behalf Of Todd Stein
Sent: Thursday, March 16, 2023 7:36 AM
To: Tatsuo Ishii <ishii at sraoss.co.jp>
Cc: pgpool-general at pgpool.net
Subject: [pgpool-general: 8646] Re: online recovery causes node to lose replication state.

Yes, I'll send them.  It may take a little while to clean up the hostnames and addresses.

-----Original Message-----
From: pgpool-general <pgpool-general-bounces at pgpool.net> On Behalf Of Tatsuo Ishii
Sent: Wednesday, March 15, 2023 11:02 PM
To: Todd Stein <todd.stein at microfocus.com>
Cc: pgpool-general at pgpool.net
Subject: [pgpool-general: 8644] Re: online recovery causes node to lose replication state.

To investigate the issue we need pgpool.conf, postgresql.conf and the PostgreSQL logs. Can you share them? (Of course you can hide passwords and other sensitive information.)

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp/


> Hi All,
> I'm running a 3-node cluster based on PostgreSQL 14 and Pgpool-II 4.4.1, and I'm having an issue while testing failover scenarios.
> 
> At start of scenario:
> 
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:56:31
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:56:38
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 13:46:29
> (3 rows)
> 
> After stopping PostgreSQL on node 0 (a command sketch follows the table):
> 
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | down   | down      | 0.333333  | standby | unknown | 0          | false             | 0                 |                   |                        | 2023-03-15 12:34:05
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:34:05
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:34:09
> (3 rows)
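> 
> For reference, a sketch of how the node-0 instance (the primary at that point) was presumably stopped; the service name and data directory below are assumptions about this environment:
> 
>     # on catvmtspg04a (node 0)
>     sudo systemctl stop postgresql-14             # systemd-managed PGDG install (assumed)
>     # or, equivalently, with pg_ctl:
>     pg_ctl -D /var/lib/pgsql/14/data stop -m fast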
> 
> After running online recovery on node 0 (the related pgpool.conf settings are sketched after the table):
> pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 0
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:55:59
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:34:05
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:34:09
> (3 rows)
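> 
> pcp_recovery_node drives the online-recovery scripts configured in pgpool.conf on the current primary; a minimal sketch of the relevant settings (the script name follows the pgpool-II sample configuration and is an assumption about this setup):
> 
>     # pgpool.conf -- online recovery
>     recovery_user = 'postgres'
>     recovery_password = ''                                # or managed via pool_passwd
>     recovery_1st_stage_command = 'recovery_1st_stage'     # script in $PGDATA, typically runs pg_basebackup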
> 
> After promoting node 0 (with -s, switchover) so that it becomes primary again:
> pcp_promote_node -s -n 0 -U pgpool
> Note that I lose the replication_state and replication_sync_state on node 2; I would like to know why this happens (see the note after the table below).
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:56:31
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:56:38
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 |                   |                        | 2023-03-15 12:56:38
> (3 rows)
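> 
> The replication_state and replication_sync_state columns are filled in from pg_stat_replication on the primary, so after the switchover they stay empty for node 2 until that standby has re-attached to the new primary (normally via follow_primary_command). One way to check directly, assuming the postgres superuser can connect to the new primary:
> 
>     psql -h catvmtspg04a.domain.net -p 5432 -U postgres \
>       -c "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;"
> 
> If node 2 does not appear in that output, it is still pointed at the old primary and has to be re-based, which matches the pcp_recovery_node run below.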
> 
> My process to get the cluster back to normal. I don't understand why the recovery process fails at first; eventually it works (see the note after the output below).
> pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 2
> 
> ERROR:  recovery is checking if postmaster is started
> DETAIL:  postmaster on hostname:"catvmtspg04c.domain.net" database:"template1" user:"postgres" failed to start in 0 second
> 
> [postgres at catvmtspg04a pgpool-II]$ pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 2
> Password:
> ERROR:  recovery is checking if postmaster is started
> DETAIL:  postmaster on hostname:"catvmtspg04c.domain.net" database:"template1" user:"postgres" failed to start in 0 second
> 
> [postgres at catvmtspg04a pgpool-II]$ pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 2
> Password:
> pcp_recovery_node -- Command Successful
> Cluster is back to normal.
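> 
> A hedged reading of the "failed to start in 0 second" DETAIL above: that number appears to echo recovery_timeout from pgpool.conf, so a very small value makes pgpool give up almost immediately even though the postmaster does come up a little later. Something like the following gives the recovered node more time; the value is only an example:
> 
>     # pgpool.conf
>     recovery_timeout = 90     # seconds pgpool waits for the recovered postmaster to accept connections
> 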
> [postgres at catvmtspg04a pgpool-II]$ psql -h 10.78.121.3 -p 9999 -U pgpool postgres -c "show pool_nodes"
> Password for user pgpool:
> node_id |           hostname           | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0       | catvmtspg04a.domain.net | 5432 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2023-03-15 12:56:31
> 1       | catvmtspg04b.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 12:56:38
> 2       | catvmtspg04c.domain.net | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-03-15 13:46:29
> (3 rows)
> 
> 
> Regards,
> 
> Todd Stein
> OpsBridge Technical Success
> OpenText
> (Cell) +1 (941) 248-8752
> tstein2 at opentext.com
> 
_______________________________________________
pgpool-general mailing list
pgpool-general at pgpool.net
http://www.pgpool.net/mailman/listinfo/pgpool-general
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgpool_files.zip
Type: application/x-zip-compressed
Size: 5391578 bytes
Desc: pgpool_files.zip
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20230316/eade61fa/attachment.bin>

