[pgpool-general: 7683] Re: Failover question
Bo Peng
pengbo at sraoss.co.jp
Fri Sep 3 11:24:22 JST 2021
> I found the issue: /bin/ip did not have the setuid bit set on node3. After setting the setuid bit on /bin/ip, failover to node3 works.
>
> root@pgtest-03:~# ll /bin/ip
> -rwxr-xr-x 1 root root 611960 Feb 13 2020 /bin/ip*
> root@pgtest-03:~# chmod u=rwxs,g=rx,o=rx /bin/ip
> root@pgtest-03:~# ll /bin/ip
> -rwsr-xr-x 1 root root 611960 Feb 13 2020 /bin/ip*
Great!
Yes, the setuid bit needs to be set on the /bin/ip command on every node.
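
The bit that "chmod u+s" sets (the "s" in -rwsr-xr-x above) is the setuid bit. It lets /bin/ip run with root privileges when the unprivileged pgpool user invokes it, which is why its absence produced "RTNETLINK answers: Operation not permitted". A minimal sketch for checking it on each node follows. The helper names has_setuid and report_setuid and the IP_CMD variable are made up for illustration; /bin/ip is the path from this thread and may differ on your system.

```shell
#!/bin/sh
# has_setuid PATH: succeed if PATH exists and has the setuid bit set.
# Uses the POSIX "test -u" primary.
has_setuid() {
    [ -u "$1" ]
}

# report_setuid PATH: print a one-line status for PATH.
report_setuid() {
    if has_setuid "$1"; then
        echo "setuid set: $1"
    else
        echo "setuid MISSING: $1"
    fi
}

# /bin/ip is the path used in this thread; override IP_CMD (a hypothetical
# variable) if your if_up_cmd / if_down_cmd call a different binary.
report_setuid "${IP_CMD:-/bin/ip}"
```

Running this as the pgpool user on all three nodes should make a node like pgtest-03 stand out before a failover test does.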
> On 9/1/21, 10:54 AM, "pgpool-general on behalf of Wolf Schwurack" <pgpool-general-bounces at pgpool.net on behalf of wolf at uen.org> wrote:
>
> Hello
>
> This shows pcp_watchdog_info after node1 is added back in
> postgres@pgtest-02:~$ pcp_watchdog_info -h localhost -U wolf
> Password:
> 3 YES pgtest-02:9999 Linux pgtest-02 pgtest-02
>
> pgtest-02:9999 Linux pgtest-02 pgtest-02 9999 9000 4 LEADER
> pgtest-01:9999 Linux pgtest-01 pgtest-01 9999 9000 7 STANDBY
> pgtest-03:9999 Linux pgtest-03 pgtest-03 9999 9000 7 STANDBY
>
> Here's the pgpool.log from node3 after shutdown of pgpool on node2
> 2021-09-01 10:43:38: pid 417478: LOG: adding watchdog node "pgtest-01:9999 Linux pgtest-01" to the standby list
> 2021-09-01 10:43:38: pid 417478: LOG: quorum found
> 2021-09-01 10:43:38: pid 417478: DETAIL: starting escalation process
> 2021-09-01 10:43:38: pid 417478: LOG: escalation process started with PID:554759
> 2021-09-01 10:43:38: pid 417478: LOG: signal_user1_to_parent_with_reason(3)
> 2021-09-01 10:43:38: pid 417474: LOG: Pgpool-II parent process received SIGUSR1
> 2021-09-01 10:43:38: pid 417474: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
> 2021-09-01 10:43:38: pid 417478: LOG: new IPC connection received
> 2021-09-01 10:43:38: pid 417474: LOG: watchdog cluster now holds the quorum
> 2021-09-01 10:43:38: pid 417474: DETAIL: updating the state of quarantine backend nodes
> 2021-09-01 10:43:38: pid 417478: LOG: new IPC connection received
> 2021-09-01 10:43:38: pid 554759: LOG: watchdog: escalation started
> RTNETLINK answers: Operation not permitted
> 2021-09-01 10:43:38: pid 554759: LOG: failed to acquire the delegate IP address
> 2021-09-01 10:43:38: pid 554759: DETAIL: 'if_up_cmd' failed
> 2021-09-01 10:43:38: pid 554759: WARNING: watchdog escalation failed to acquire delegate IP
>
> Here's pcp_watchdog_info on node3 after shutdown of pgpool on node2
> postgres@pgtest-03:~$ pcp_watchdog_info -h localhost -U wolf
> Password:
> 3 YES pgtest-03:9999 Linux pgtest-03 pgtest-03
>
> pgtest-03:9999 Linux pgtest-03 pgtest-03 9999 9000 4 LEADER
> pgtest-01:9999 Linux pgtest-01 pgtest-01 9999 9000 7 STANDBY
> pgtest-02:9999 Linux pgtest-02 pgtest-02 9999 9000 10 SHUTDOWN
>
> Here's pcp_watchdog_info on node3 after start of pgpool on node2
> postgres@pgtest-03:~$ pcp_watchdog_info -h localhost -U wolf
> Password:
> 3 YES pgtest-03:9999 Linux pgtest-03 pgtest-03
>
> pgtest-03:9999 Linux pgtest-03 pgtest-03 9999 9000 4 LEADER
> pgtest-01:9999 Linux pgtest-01 pgtest-01 9999 9000 7 STANDBY
> pgtest-02:9999 Linux pgtest-02 pgtest-02 9999 9000 7 STANDBY
>
> The watchdog IP is still not enabled. This line in the node3 log seems to be the issue; maybe a permission problem?
> RTNETLINK answers: Operation not permitted
>
> Wolf
>
> On 8/29/21, 9:18 PM, "Bo Peng" <pengbo at sraoss.co.jp> wrote:
>
> Hello,
>
> > Sorry, but you missed the part where node 1 was added back as a standby after the failover to node 2. At the point when I turn off pgpool on node 2, node 1 and node 3 are the standby nodes, so node 3 should take over the watchdog.
>
> I have tested Pgpool-II 4.2.4, but I could not reproduce this issue.
> Could you share the following information?
>
> - result of "pcp_watchdog_info" after adding back node1 as a standby
> - pgpool logs of node 1 and node 3 after turning off pgpool on node2.
>
>
> > Wolf
> >
> > On 8/27/21, 10:07 AM, "Bo Peng" <pengbo at sraoss.co.jp> wrote:
> >
> > Hello,
> >
> > > My question is why watchdog doesn’t come up on node 3. Pgpool.conf is set the same on all 3 nodes.
> >
> > If you shut down pgpool on node1 and node2, only one pgpool node is alive,
> > so the quorum does not exist.
> >
> > If you want to enable watchdog even if the quorum does not exist,
> > you need to enable the parameter "enable_consensus_with_half_votes".
> >
> > See more detail about "enable_consensus_with_half_votes":
> > https://www.pgpool.net/docs/latest/en/html/runtime-watchdog-config.html#GUC-ENABLE-CONSENSUS-WITH-HALF-VOTES
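
The quorum rule described above can be sketched as simple arithmetic. This is a simplified model based on one reading of the linked documentation, not Pgpool-II's actual code: a strict majority of watchdog nodes always holds the quorum, and exactly half holds it only when enable_consensus_with_half_votes is on. The function name has_quorum and its argument order are made up for illustration.

```shell
#!/bin/sh
# has_quorum ALIVE TOTAL HALF_VOTES
# Simplified model (an assumption, not Pgpool-II's implementation):
#   - more than half of the nodes alive: quorum exists;
#   - exactly half alive: quorum exists only if HALF_VOTES is "on".
has_quorum() {
    alive=$1
    total=$2
    half_votes=$3
    if [ $((alive * 2)) -gt "$total" ]; then
        return 0
    fi
    if [ "$half_votes" = "on" ] && [ $((alive * 2)) -eq "$total" ]; then
        return 0
    fi
    return 1
}

# The 3-node cluster from this thread with pgpool stopped on two nodes.
if has_quorum 1 3 off; then echo "quorum"; else echo "no quorum"; fi
```

Under this model the flag changes the outcome only when the alive count is exactly half the total, which requires an even number of nodes (for example 1 of 2). With two of three nodes alive, as in the node 1 + node 3 case, the quorum exists either way.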
> >
> > > I have a 3-node setup for pgpool/postgresql using watchdog. When testing the failover of pgpool, I turn off pgpool on node 1, which fails the watchdog over to node 2. Then I turn pgpool on node 1 back on, which sets node 1 as a standby node. Next I turn off pgpool on node 2, and the watchdog tries to fail over to node 3, but the watchdog IP never comes up on node 3 or on any of the nodes. So I turn off pgpool on node 3 and the watchdog fails over to node 1.
> > > My question is why watchdog doesn’t come up on node 3. Pgpool.conf is set the same on all 3 nodes.
> > >
> > > Here’s my output of show pool_nodes
> > >
> > >  node_id | hostname  | port | status | lb_weight | role    | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> > > ---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> > >  0       | pgtest-01 | 5432 | up     | 0.500000  | primary | 2003       | true              | 0                 |                   |                        | 2021-08-24 14:06:20
> > >  1       | pgtest-02 | 5432 | up     | 0.500000  | standby | 667        | false             | 0                 | streaming         | async                  | 2021-08-24 14:06:20
> > >  2       | pgtest-03 | 5432 | up     | 0.000000  | standby | 0          | false             | 0                 | streaming         | async                  | 2021-08-24 14:06:20
> > >
> > > Not sure if this is an issue, but lb_weight shows node 1 (pgtest-01) and node 2 (pgtest-02) as 0.500000 and node 3 (pgtest-03) as 0.000000
> > >
> > > In pgpool.conf I have backend_weight for each node set to 0.3
> > >
> > > Hosts = Ubuntu 20.04
> > > Pgpool = 4.2.4
> > > PostgreSQL = 12.8
> > >
> > >
> > > -- Wolf
> > >
> > >
> >
> >
> > --
> > Bo Peng <pengbo at sraoss.co.jp>
> > SRA OSS, Inc. Japan
> > http://www.sraoss.co.jp/
> >
>
>
> --
> Bo Peng <pengbo at sraoss.co.jp>
> SRA OSS, Inc. Japan
> http://www.sraoss.co.jp/
>
--
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan
http://www.sraoss.co.jp/