[pgpool-general: 8644] Re: online recovery causes node to lose replication state.
Tatsuo Ishii
ishii at sraoss.co.jp
Thu Mar 16 12:01:32 JST 2023
To investigate the issue we need your pgpool.conf, postgresql.conf and
the PostgreSQL logs. Can you share them? (Of course you can redact
passwords and other sensitive information.)
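For example, something along these lines should capture the effective
settings with passwords scrubbed (the paths are only examples; adjust
them to your installation):

psql -U postgres -c "SELECT name, setting FROM pg_settings WHERE source <> 'default';" > pg_settings.txt
grep -iv password /etc/pgpool-II/pgpool.conf > pgpool.conf.redacted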
Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
> Hi All,
> I'm running a 3-node cluster based on PostgreSQL 14 and pgpool 4.4.1, and I'm having an issue while testing failover scenarios.
>
> At start of scenario:
>
> node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0 | catvmtspg04a.domain.net | 5432 | up | up | 0.333333 | primary | primary | 0 | true | 0 | | | 2023-03-15 12:56:31
> 1 | catvmtspg04b.domain.net | 5432 | up | up | 0.333333 | standby | standby | 0 | false | 0 | streaming | async | 2023-03-15 12:56:38
> 2 | catvmtspg04c.domain.net | 5432 | up | up | 0.333333 | standby | standby | 0 | false | 0 | streaming | async | 2023-03-15 13:46:29
> (3 rows)
>
> After stopping postgres on node0:
>
> node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0 | catvmtspg04a.domain.net | 5432 | down | down | 0.333333 | standby | unknown | 0 | false | 0 | | | 2023-03-15 12:34:05
> 1 | catvmtspg04b.domain.net | 5432 | up | up | 0.333333 | primary | primary | 0 | true | 0 | | | 2023-03-15 12:34:05
> 2 | catvmtspg04c.domain.net | 5432 | up | up | 0.333333 | standby | standby | 0 | false | 0 | streaming | async | 2023-03-15 12:34:09
> (3 rows)
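>
> For reference, pgpool fills the replication_state and replication_sync_state columns from pg_stat_replication on the primary, so the equivalent manual check on the new primary would be something like this (hostname per my setup):
>
> -- run on catvmtspg04b.domain.net, the current primary
> SELECT application_name, state, sync_state FROM pg_stat_replication;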
>
> After running recovery on node 0:
> pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 0
> node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0 | catvmtspg04a.domain.net | 5432 | up | up | 0.333333 | standby | standby | 0 | false | 0 | streaming | async | 2023-03-15 12:55:59
> 1 | catvmtspg04b.domain.net | 5432 | up | up | 0.333333 | primary | primary | 0 | true | 0 | | | 2023-03-15 12:34:05
> 2 | catvmtspg04c.domain.net | 5432 | up | up | 0.333333 | standby | standby | 0 | false | 0 | streaming | async | 2023-03-15 12:34:09
> (3 rows)
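>
> To double-check that node0 really came back as a streaming standby, I also run a quick sanity check on node0 itself (my own verification step, not part of the recovery script):
>
> -- run on catvmtspg04a.domain.net
> SELECT pg_is_in_recovery();                           -- expect 't' on a standby
> SELECT status, sender_host FROM pg_stat_wal_receiver; -- expect status 'streaming'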
>
> After promoting node0 (switchover with -s) so that it becomes primary again:
> pcp_promote_node -s -n 0 -U pgpool
> Note that I lose the replication_state and replication_sync_state on node2; I would like to know why this happens.
> node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0 | catvmtspg04a.domain.net | 5432 | up | up | 0.333333 | primary | primary | 0 | true | 0 | | | 2023-03-15 12:56:31
> 1 | catvmtspg04b.domain.net | 5432 | up | up | 0.333333 | standby | standby | 0 | false | 0 | streaming | async | 2023-03-15 12:56:38
> 2 | catvmtspg04c.domain.net | 5432 | up | up | 0.333333 | standby | standby | 0 | false | 0 | | | 2023-03-15 12:56:38
> (3 rows)
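>
> My working theory on the empty columns: if node2's walreceiver was not re-pointed at the new primary (e.g. by follow_primary_command), node2 will not show up in node0's pg_stat_replication, and pgpool then has nothing to display. A query like this on node2 should show where it is actually streaming from (just my guess, not confirmed):
>
> -- run on catvmtspg04c.domain.net
> SELECT status, sender_host, conninfo FROM pg_stat_wal_receiver;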
>
> Here is my process to get the cluster back to normal. I don't understand why the recovery process fails on the first attempts... eventually it works:
> pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 2
>
> ERROR: recovery is checking if postmaster is started
> DETAIL: postmaster on hostname:"catvmtspg04c.domain.net" database:"template1" user:"postgres" failed to start in 0 second
>
> [postgres at catvmtspg04a pgpool-II]$ pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 2
> Password:
> ERROR: recovery is checking if postmaster is started
> DETAIL: postmaster on hostname:"catvmtspg04c.domain.net" database:"template1" user:"postgres" failed to start in 0 second
>
> [postgres at catvmtspg04a pgpool-II]$ pcp_recovery_node -h 10.78.121.3 -p 9898 -U postgres -n 2
> Password:
> pcp_recovery_node -- Command Successful
> The cluster is back to normal.
> [postgres at catvmtspg04a pgpool-II]$ psql -h 10.78.121.3 -p 9999 -U pgpool postgres -c "show pool_nodes"
> Password for user pgpool:
> node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+------------------------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0 | catvmtspg04a.domain.net | 5432 | up | up | 0.333333 | primary | primary | 0 | true | 0 | | | 2023-03-15 12:56:31
> 1 | catvmtspg04b.domain.net | 5432 | up | up | 0.333333 | standby | standby | 0 | false | 0 | streaming | async | 2023-03-15 12:56:38
> 2 | catvmtspg04c.domain.net | 5432 | up | up | 0.333333 | standby | standby | 0 | false | 0 | streaming | async | 2023-03-15 13:46:29
> (3 rows)
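>
> One guess on the transient errors above: the DETAIL says "failed to start in 0 second", and if that number echoes recovery_timeout from pgpool.conf, raising it should give the postmaster time to come up before pgpool gives up. This is only a hunch on my part; the value below is just an example:
>
> # pgpool.conf (example value, not my current setting)
> recovery_timeout = 90    # seconds pgpool waits for the remote postmaster to start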
>
>
> Regards,
>
> Todd Stein
> OpsBridge Technical Success
> OpenText
> (Cell) +1 (941) 248-8752
> tstein2 at opentext.com
>