[pgpool-general: 8107] Re: Problems taking node offline
Bo Peng
pengbo at sraoss.co.jp
Wed Apr 27 11:37:44 JST 2022
Hello,
On Tue, 26 Apr 2022 15:01:15 +0000
Jon SCHEWE <jon.schewe at raytheon.com> wrote:
> >> I want to take a backend node offline and am having some trouble with it.
> >>
> >> I check the status of my nodes:
> >> template1=> show pool_nodes;
> >>  node_id |       hostname       | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> >> ---------+----------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >>  0       | psql-01.mgmt.bbn.com | 5432 | up     | 0.333333  | standby | 646198     | false             | 0                 | streaming         | sync                   | 2022-04-25 14:19:57
> >>  1       | psql-02.mgmt.bbn.com | 5432 | up     | 0.333333  | primary | 2115353    | true              | 0                 |                   |                        | 2022-04-25 14:16:24
> >>  2       | psql-03.mgmt.bbn.com | 5432 | up     | 0.333333  | standby | 2913       | false             | 0                 | streaming         | potential              | 2022-04-25 14:24:25
> >> (3 rows)
> >>
> >> I want to take psql-02 offline.
> >>
> >> pcp_detach_node -h psql.mgmt.bbn.com -p 9897 -U pgpool -g -n 1
> >> Password:
> >> pcp_detach_node -- Command Successful
> >>
> >>
> >> I check the status again:
> >> template1=> show pool_nodes;
> >>  node_id |       hostname       | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> >> ---------+----------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >>  0       | psql-01.mgmt.bbn.com | 5432 | up     | 0.333333  | standby | 718555     | true              | 0                 | streaming         | sync                   | 2022-04-25 14:19:57
> >>  1       | psql-02.mgmt.bbn.com | 5432 | up     | 0.333333  | primary | 2373454    | false             | 0                 |                   |                        | 2022-04-25 14:16:24
> >>  2       | psql-03.mgmt.bbn.com | 5432 | up     | 0.333333  | standby | 3310       | false             | 0                 | streaming         | potential              | 2022-04-25 14:24:25
> >> (3 rows)
> >>
> >>
> >> I still see psql-02 online. Why is that?
> >
> >Could you share pgpool.conf
>
> Yes, attached.
>
> > and full log after running pcp_detach_node?
>
> The only log messages are what I sent originally.
>
> >Which version of Pgpool-II are you using?
>
> 4.1.4
Thank you.
I suspect that the watchdog is not working properly.
When you run pcp_detach_node, failover_command and follow_master_command should be executed,
but I could not find any related messages in the log.
Could you check the watchdog status using the "pcp_watchdog_info" command?
Also, does this issue still occur if you disable the watchdog ("use_watchdog = off")?
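
For example, the watchdog check might look like the sketch below. It assumes
the same PCP host, port and user as in your pcp_detach_node command; the
variable names are only illustrative, and -v just asks for verbose output.

```shell
# Assumed values, copied from the pcp_detach_node command quoted in this thread.
PCP_HOST=psql.mgmt.bbn.com
PCP_PORT=9897
PCP_USER=pgpool

# Build the command once so it can be inspected before it is run.
WD_CMD="pcp_watchdog_info -h $PCP_HOST -p $PCP_PORT -U $PCP_USER -v"
echo "$WD_CMD"

# Run it only on a machine where the Pgpool-II PCP client tools are installed.
# It will prompt for the PCP password, as pcp_detach_node did.
if command -v pcp_watchdog_info >/dev/null 2>&1; then
    $WD_CMD
fi
```

To test without the watchdog, set "use_watchdog = off" in pgpool.conf. As far
as I know, that change only takes effect after Pgpool-II is restarted.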
> This morning I checked and 2 of the nodes are marked as down and the primary has changed. Perhaps the pcp command took some more time (hours) to complete?
>
> template1=> show pool_nodes;
>  node_id |       hostname       | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+----------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>  0       | psql-01.mgmt.bbn.com | 5432 | up     | 0.333333  | primary | 27047705   | true              | 0                 |                   |                        | 2022-04-26 00:19:21
>  1       | psql-02.mgmt.bbn.com | 5432 | down   | 0.333333  | standby | 15170253   | false             | 0                 |                   |                        | 2022-04-26 00:19:21
>  2       | psql-03.mgmt.bbn.com | 5432 | down   | 0.333333  | standby | 214556     | false             | 0                 | streaming         | sync                   | 2022-04-26 00:19:21
> (3 rows)
>
>
> >
> >> Log messages during the pcp command:
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 11674: LOG: new IPC connection received
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 11674: LOG: online recovery request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "psql-02.mgmt.bbn.com:9898 Linux psql-02"
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 11674: DETAIL: waiting for the reply...
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 13736: LOG: PCP process with pid: 20049 exit with SUCCESS.
> >> Apr 25 15:59:48 psql-02 pgpool[11672]: 2022-04-25 15:59:48: pid 13736: LOG: PCP process with pid: 20049 exits with status 0
> >>
> >>
> >> Apr 25 15:59:54 psql-02 pgpool[11672]: 2022-04-25 15:59:54: pid 11672: LOG: child process with pid: 19261 exits with status 256
> >> Apr 25 15:59:54 psql-02 pgpool[11672]: 2022-04-25 15:59:54: pid 11672: LOG: fork a new child process with pid: 20176
> >> Apr 25 15:59:54 psql-02 pgpool[11672]: 2022-04-25 15:59:54: pid 11672: LOG: child process with pid: 19006 exits with status 256
> >> Apr 25 15:59:54 psql-02 pgpool[11672]: 2022-04-25 15:59:54: pid 11672: LOG: fork a new child process with pid: 20178
--
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan
http://www.sraoss.co.jp/