[pgpool-hackers: 3375] Re: Behavior of resigned master watchdog node
Tatsuo Ishii
ishii at sraoss.co.jp
Thu Aug 8 17:28:24 JST 2019
Oops. I should have connected to the master Pgpool-II:
t-ishii$ psql -p 50004 -c "show pool_nodes" test
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | /tmp | 51000 | up | 0.333333 | primary | 0 | true | 0 | | | 2019-08-08 17:26:27
1 | /tmp | 51001 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-08 17:26:27
2 | /tmp | 51002 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-08 17:26:27
(3 rows)
Now node 0 is the primary, and everything works as I expected.
Sorry for the confusion.
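When it is not obvious which Pgpool-II instance is currently the watchdog master, it can be looked up first so that "show pool_nodes" is issued against the master. A minimal sketch, reusing the pcp_watchdog_info invocation shown further down in this thread:

$ pcp_watchdog_info -p 50001 -v -w | grep "Master Node Name"
Master Node Name : localhost:50004 Linux tishii-CFSV7-1

The Pgpool-II port of the master node (50004 here) is then the one to pass to psql, as done above.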
> Hi Usama,
>
> Thanks for the commit. Unfortunately the fix did not work for me.
>
> psql -p 50000 -c "show pool_nodes" test
> node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> ---------+----------+-------+------------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> 0 | /tmp | 51000 | quarantine | 0.333333 | standby | 0 | false | 0 | | | 2019-08-08 17:09:17
> 1 | /tmp | 51001 | up | 0.333333 | standby | 0 | false | 0 | | | 2019-08-08 17:09:00
> 2 | /tmp | 51002 | up | 0.333333 | standby | 0 | true | 0 | | | 2019-08-08 17:09:00
> (3 rows)
>
> There's no primary PostgreSQL node after the node was quarantined.
> Pgpool log attached.
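The missing primary can also be detected from a script. A minimal sketch, assuming psql's tuples-only unaligned output and that "role" is the sixth column of "show pool_nodes":

$ psql -p 50000 -tA -c "show pool_nodes" test | cut -d'|' -f6 | grep -cx primary
0

A count of 0 means no backend currently holds the primary role.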
>
> From: Muhammad Usama <m.usama at gmail.com>
> Subject: Re: Behavior of resigned master watchdog node
> Date: Thu, 8 Aug 2019 11:21:24 +0500
> Message-ID: <CAEJvTzUpYZHc=T8FvOB12N=gC_Qj961dZEpRZTggsLHz_gJqSQ at mail.gmail.com>
>
>> Hi Ishii-San
>>
>> Thanks for providing the log files along with the steps to reproduce the
>> issue. I have pushed the fix for it:
>>
>> https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commitdiff;h=2d7702c6961e8949e97992c23a42c8616af36d84
>>
>> Thanks
>> Best Regards
>> Muhammad Usama
>>
>>
>> On Fri, Aug 2, 2019 at 4:50 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>
>>> > Hi Ishii-San
>>> >
>>> > On Fri, Aug 2, 2019 at 10:25 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>> >
>>> >> Hi Usama,
>>> >>
>>> >> In your commit:
>>> >>
>>> >>
>>> https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=33df0d33df1ce701f07fecaeef5b87a2707c08f2
>>> >>
>>> >> "the master watchdog node should resign from master responsibilities
>>> >> if the primary backend node gets into quarantine state on that."
>>> >>
>>> >> While testing this feature, I noticed that the master watchdog node
>>> >> resigns from its master responsibilities as expected in this case. However,
>>> >> none of the standby nodes gets promoted to become the new primary. As a
>>> >> result, there's no primary node any more after the master watchdog resigns.
>>> >>
>>> >> This is not very good because users cannot write to the database while
>>> >> no primary node exists.
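A side note on why the backend ends up quarantined rather than failed over: this behavior is governed by the watchdog failover-consensus settings. A minimal pgpool.conf sketch with illustrative values (an assumption here, not taken from the reported setup):

failover_when_quorum_exists = on   # fail over only while the watchdog cluster holds quorum
failover_require_consensus = on    # require a majority of watchdog nodes to agree that a backend is down

With these enabled, a backend failure reported by a single Pgpool-II node is quarantined locally instead of triggering a cluster-wide failover.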
>>> >
>>> >
>>> > This is not the expected behavior at all. Can you please share the logs
>>> > or the steps to reproduce the issue?
>>> > I will look into this as a priority.
>>>
>>> Sure. Pgpool logs attached.
>>>
>>> Here are the steps to reproduce the issue.
>>>
>>> - Pgpool-II master branch head with patch [pgpool-hackers: 3361].
>>>
>>> 1) Create a cluster of 3 watchdog/3 PostgreSQL nodes: watchdog_setup -wn 3 -n 3
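For anyone replaying this: watchdog_setup also generates start/stop scripts for the whole cluster, so everything can be brought up before running the checks below. A minimal sketch, assuming the default layout it creates (the directory name depends on how it was invoked):

$ cd <cluster directory>
$ ./startall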
>>>
>>> $ psql -p 50000 -c "show pool_nodes" test
>>> node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>>> ---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>>> 0 | /tmp | 51000 | up | 0.333333 | primary | 0 | true | 0 | | | 2019-08-02 20:38:25
>>> 1 | /tmp | 51001 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-02 20:38:25
>>> 2 | /tmp | 51002 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-02 20:38:25
>>> (3 rows)
>>>
>>> $ pcp_watchdog_info -p 50001 -v -w
>>>
>>> Watchdog Cluster Information
>>> Total Nodes : 3
>>> Remote Nodes : 2
>>> Quorum state : QUORUM EXIST
>>> Alive Remote Nodes : 2
>>> VIP up on local node : YES
>>> Master Node Name : localhost:50000 Linux tishii-CFSV7-1
>>> Master Host Name : localhost
>>>
>>> Watchdog Node Information
>>> Node Name : localhost:50000 Linux tishii-CFSV7-1
>>> Host Name : localhost
>>> Delegate IP : Not_Set
>>> Pgpool port : 50000
>>> Watchdog port : 50002
>>> Node priority : 3
>>> Status : 4
>>> Status Name : MASTER
>>>
>>> Node Name : localhost:50004 Linux tishii-CFSV7-1
>>> Host Name : localhost
>>> Delegate IP : Not_Set
>>> Pgpool port : 50004
>>> Watchdog port : 50006
>>> Node priority : 2
>>> Status : 7
>>> Status Name : STANDBY
>>>
>>> Node Name : localhost:50008 Linux tishii-CFSV7-1
>>> Host Name : localhost
>>> Delegate IP : Not_Set
>>> Pgpool port : 50008
>>> Watchdog port : 50010
>>> Node priority : 1
>>> Status : 7
>>> Status Name : STANDBY
>>>
>>> 2) echo "0 down" > pgpool0/log
>>>
>>> (This simulates a failure of backend node 0 as seen by the Pgpool-II
>>> instance on port 50000.)
>>>
>>> 3) Check the result.
>>>
>>> $ psql -p 50000 -c "show pool_nodes" test
>>> node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>>> ---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>>> 0 | /tmp | 51000 | up | 0.333333 | standby | 0 | true | 0 | | | 2019-08-02 20:39:25
>>> 1 | /tmp | 51001 | up | 0.333333 | standby | 0 | false | 0 | | | 2019-08-02 20:39:25
>>> 2 | /tmp | 51002 | up | 0.333333 | standby | 0 | false | 0 | | | 2019-08-02 20:39:25
>>> (3 rows)
>>>
>>> $ pcp_watchdog_info -p 50001 -v -w
>>> Watchdog Cluster Information
>>> Total Nodes : 3
>>> Remote Nodes : 2
>>> Quorum state : QUORUM EXIST
>>> Alive Remote Nodes : 2
>>> VIP up on local node : NO
>>> Master Node Name : localhost:50004 Linux tishii-CFSV7-1
>>> Master Host Name : localhost
>>>
>>> Watchdog Node Information
>>> Node Name : localhost:50000 Linux tishii-CFSV7-1
>>> Host Name : localhost
>>> Delegate IP : Not_Set
>>> Pgpool port : 50000
>>> Watchdog port : 50002
>>> Node priority : 3
>>> Status : 7
>>> Status Name : STANDBY
>>>
>>> Node Name : localhost:50004 Linux tishii-CFSV7-1
>>> Host Name : localhost
>>> Delegate IP : Not_Set
>>> Pgpool port : 50004
>>> Watchdog port : 50006
>>> Node priority : 2
>>> Status : 4
>>> Status Name : MASTER
>>>
>>> Node Name : localhost:50008 Linux tishii-CFSV7-1
>>> Host Name : localhost
>>> Delegate IP : Not_Set
>>> Pgpool port : 50008
>>> Watchdog port : 50010
>>> Node priority : 1
>>> Status : 7
>>> Status Name : STANDBY
>>>
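To see the resulting state from all three Pgpool-II instances at once, the same role-column check as above can be run in a loop. A minimal sketch, assuming the ports from this setup:

$ for port in 50000 50004 50008; do
>   n=$(psql -p $port -tA -c "show pool_nodes" test | cut -d'|' -f6 | grep -cx primary)
>   echo "pgpool on port $port sees $n primary node(s)"
> done

In the broken state reported here, every instance should report 0 primary nodes.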