[pgpool-general: 8766] Re: Fwd: Pgpool II/Watchdog HA configuration Question:
Bo Peng
pengbo at sraoss.co.jp
Mon May 15 18:06:55 JST 2023
Hi,
If you have 2 pgpool nodes and when one pgpool goes down,
pgpool lose quorum and stop bringing up the delegate IP.
If you want to ensure that delegate IP is available even if
only one pgpool is running, you need to set:
enable_consensus_with_half_votes = on
However, you should aware that you may run the risk of split brain
by enabling this parameter.
https://www.pgpool.net/docs/latest/en/html/runtime-watchdog-config.html#guc-enable-consensus-with-half-votes
On Sun, 14 May 2023 17:46:54 -0400
KiSh USA <coffeewithkish at gmail.com> wrote:
> Hello Team,
>
>
>
> My PGPOOL II 2 node configuration works perfectly when both nodes are up, I
> can ping to my Delegate IP also connect clients through Delegate IP etc..
>
>
>
> But when I shutdown Master PGPOOL II node, and it fails over to STANDBY
> NODE (which is now New Master) am unable to ping/connect using delegate IP
>
> From remote client, I can ping locally in the new Master and connect, but
> not from clients,
>
> Basically, as soon as I stop master node, the delegated IP stops
> responding. As a result, databases are unavailable.
>
>
>
> Bcoz of this issue am unable to implement HA for PGPOOL II nodes, can you
> please advise…
>
>
>
> Thanks in advance.
>
>
>
> NOTES:
>
>
> PGPOOL II NODES:
>
>
>
> rn000110724 - 10.50.28.58 - MASTER NODE:
>
> rn000110733 – 10.201.36.72 – STANDBY NODE
>
> Delegate IP : 10.50.28.80
>
>
>
> sh-4.4# pcp_watchdog_info -h 10.50.28.58 -p 9898 -U pgpcp -v
>
> Password:
>
> Watchdog Cluster Information
>
> Total Nodes : 2
>
> Remote Nodes : 1
>
> Quorum state : QUORUM EXIST
>
> Alive Remote Nodes : 1
>
> VIP up on local node : YES
>
> Master Node Name : rn000110724:9999 Linux rn000110724
>
> Master Host Name : rn000110724
>
>
>
> Watchdog Node Information
>
> Node Name : rn000110724:9999 Linux rn000110724
>
> Host Name : rn000110724
>
> Delegate IP : 10.50.28.80
>
> Pgpool port : 9999
>
> Watchdog port : 9000
>
> Node priority : 0
>
> Status : 4
>
> Status Name : MASTER
>
>
>
> Node Name : rn000110733:9999 Linux rn000110733
>
> Host Name : rn000110733
>
> Delegate IP : 10.50.28.80
>
> Pgpool port : 9999
>
> Watchdog port : 9000
>
> Node priority : 0
>
> Status : 7
>
> Status Name : STANDBY
>
>
>
>
>
>
>
> sh-4.4# pcp_watchdog_info -h 10.201.36.72 -p 9898 -U pgpcp -v
>
> Password:
>
> Watchdog Cluster Information
>
> Total Nodes : 2
>
> Remote Nodes : 1
>
> Quorum state : QUORUM EXIST
>
> Alive Remote Nodes : 1
>
> VIP up on local node : NO
>
> Master Node Name : rn000110724:9999 Linux rn000110724
>
> Master Host Name : rn000110724
>
>
>
> Watchdog Node Information
>
> Node Name : rn000110733:9999 Linux rn000110733
>
> Host Name : rn000110733
>
> Delegate IP : 10.50.28.80
>
> Pgpool port : 9999
>
> Watchdog port : 9000
>
> Node priority : 0
>
> Status : 7
>
> Status Name : STANDBY
>
>
>
> Node Name : rn000110724:9999 Linux rn000110724
>
> Host Name : rn000110724
>
> Delegate IP : 10.50.28.80
>
> Pgpool port : 9999
>
> Watchdog port : 9000
>
> Node priority : 0
>
> Status : 4
>
> Status Name : MASTER
>
>
>
>
>
> rn000110724 - 10.50.28.58 - Master node:
>
> sh-4.4# ifconfig
>
> eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
>
> inet 10.50.28.58 netmask 255.255.252.0 broadcast 10.50.31.255
>
> ether 00:50:56:a8:27:eb txqueuelen 1000 (Ethernet)
>
> RX packets 768864 bytes 120999407 (115.3 MiB)
>
> RX errors 0 dropped 0 overruns 0 frame 0
>
> TX packets 343573 bytes 123312342 (117.5 MiB)
>
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
>
>
> eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
>
> inet 10.50.28.80 netmask 255.255.255.0 broadcast 0.0.0.0
>
> ether 00:50:56:a8:27:eb txqueuelen 1000 (Ethernet)
>
>
>
> lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
>
>
>
>
>
>
>
>
>
> Can ping/connect using Delegate IP from Remote host:
>
>
>
> /usr/bin/psql -h 10.50.28.80 -p 9999 -d postgres -U pgpool
>
> Password for user pgpool:
>
> psql (14.2)
>
> postgres=# \l
>
> List of databases
>
> Name | Owner | Encoding | Collate | Ctype | Access
> privileges
>
> ---------------+----------+----------+---------+-------+----------------------------
>
> postgres | postgres | UTF8 | C | C |
>
>
>
>
>
>
>
> postgres=# show pool_nodes ;
>
> node_id | hostname | port | status | lb_weight | role | select_cnt |
> load_balance_node | replication_delay | replication_state |
> replication_sync_s
>
> tate | last_status_change
>
> ---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+-------------------
>
> -----+---------------------
>
> 0 | rn000098071 | 5432 | up | 0.000000 | primary | 1 |
> true | 0 |
> |
>
> | 2023-05-14 10:31:43
>
> 1 | rn000098069 | 5432 | up | 1.000000 | standby | 0 |
> false | 0 |
> |
>
> | 2023-05-14 10:31:43
>
> (2 rows)
>
>
>
>
>
> postgres at rn000098071:/var/lib/pgsql
>
> $ ping 10.50.28.80
>
> PING 10.50.28.80 (10.50.28.80) 56(84) bytes of data.
>
> 64 bytes from 10.50.28.80: icmp_seq=1 ttl=55 time=0.728 ms
>
> 64 bytes from 10.50.28.80: icmp_seq=2 ttl=55 time=0.490 ms
>
> 64 bytes from 10.50.28.80: icmp_seq=3 ttl=55 time=0.376 ms
>
> 64 bytes from 10.50.28.80: icmp_seq=4 ttl=55 time=0.477 ms
>
>
>
>
>
>
>
> cat pgpool.conf
>
>
>
> #------------------------------------------------------------------------------
>
> # WATCHDOG
>
> #------------------------------------------------------------------------------
>
>
>
> # - Enabling -
>
>
>
> use_watchdog = *on * # Activates watchdog
>
> # (change requires restart)
>
>
>
> # -Connection to up stream servers -
>
>
>
> trusted_servers = ''
>
> # trusted server list which are used
>
> # to confirm network connection
>
> # (hostA,hostB,hostC,...)
>
> # (change requires restart)
>
> ping_path = '/bin'
>
> # ping command path
>
> # (change requires restart)
>
>
>
> # - Watchdog communication Settings -
>
>
>
> wd_hostname = '*rn000110724*'
>
> # Host name or IP address of this
> watchdog
>
> # (change requires restart)
>
> wd_port = *9000*
>
> # port number for watchdog service
>
> # (change requires restart)
>
> wd_priority = *0*
>
> # priority of this watchdog in leader
> election
>
> # (change requires restart)
>
>
>
> wd_authkey = ''
>
> # Authentication key for watchdog
> communication
>
> # (change requires restart)
>
>
>
> wd_ipc_socket_dir = '/var/run/postgresql'
>
> # Unix domain socket path for watchdog
> IPC socket
>
> # The Debian package defaults to
>
> # /var/run/postgresql
>
> # (change requires restart)
>
>
>
>
>
> # - Virtual IP control Setting -
>
>
>
> delegate_IP = *'10.50.28.80*'
>
> # delegate IP address
>
> # If this is empty, virtual IP never
> bring up.
>
> # (change requires restart)
>
> if_cmd_path = '/sbin'
>
> # path to the directory where
> if_up/down_cmd exists
>
> # If if_up/down_cmd starts with "/",
> if_cmd_path will be ignored.
>
> # (change requires restart)
>
> if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label
> eth0:0'
>
> # startup delegate IP command
>
> # (change requires restart)
>
> if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
>
> # shutdown delegate IP command
>
> # (change requires restart)
>
> arping_path = '/usr/sbin'
>
> # arping command path
>
> # If arping_cmd starts with "/",
> if_cmd_path will be ignored.
>
> # (change requires restart)
>
> arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
>
> # arping command
>
> # (change requires restart)
>
> ifconfig_path = '/etc/pgpool-II'
>
>
>
>
>
>
>
>
>
>
>
> # - Behaivor on escalation Setting -
>
>
>
> clear_memqcache_on_escalation = *on*
>
> # Clear all the query cache on shared
> memory
>
> # when standby pgpool escalate to
> active pgpool
>
> # (= virtual IP holder).
>
> # This should be off if client connects
> to pgpool
>
> # not using virtual IP.
>
> # (change requires restart)
>
> wd_escalation_command = '/etc/pgpool-II/escalation.sh'
>
> # Executes this command at escalation
> on new active pgpool.
>
> # (change requires restart)
>
> wd_de_escalation_command = ''
>
> # Executes this command when master
> pgpool resigns from being master.
>
> # (change requires restart)
>
>
>
> # - Watchdog consensus settings for failover -
>
>
>
> failover_when_quorum_exists = on
>
> # Only perform backend node failover
>
> # when the watchdog cluster holds the
> quorum
>
> # (change requires restart)
>
>
>
> failover_require_consensus = on
>
> # Perform failover when majority of
> Pgpool-II nodes
>
> # aggrees on the backend node status
> change
>
> # (change requires restart)
>
>
>
> allow_multiple_failover_requests_from_node = off
>
> # A Pgpool-II node can cast multiple
> votes
>
> # for building the consensus on failover
>
> # (change requires restart)
>
>
>
> #enable_consensus_with_half_votes = off
>
> enable_consensus_with_half_votes = on
>
> # apply majority rule for consensus and
> quorum computation
>
> # at 50% of votes in a cluster with
> even number of nodes.
>
> # when enabled the existence of quorum
> and consensus
>
> # on failover is resolved after
> receiving half of the
>
> # total votes in the cluster, otherwise
> both these
>
> # decisions require at least one more
> vote than
>
> # half of the total votes.
>
> # (change requires restart)
>
>
>
> # - Lifecheck Setting -
>
>
>
> # -- common --
>
>
>
> wd_monitoring_interfaces_list = '' # Comma separated list of interfaces
> names to monitor.
>
> # if any interface from the list is
> active the watchdog will
>
> # consider the network is fine
>
> # 'any' to enable monitoring on all
> interfaces except loopback
>
> # '' to disable monitoring
>
> # (change requires restart)
>
>
>
>
>
> wd_lifecheck_method = 'heartbeat'
>
> # Method of watchdog lifecheck
> ('heartbeat' or 'query' or 'external')
>
> # (change requires restart)
>
> wd_interval = 10
>
> # lifecheck interval (sec) > 0
>
> # (change requires restart)
>
>
>
> # -- heartbeat mode --
>
>
>
> wd_heartbeat_port = 9694
>
> # Port number for receiving heartbeat
> signal
>
> # (change requires restart)
>
> wd_heartbeat_keepalive = 2
>
> # Interval time of sending heartbeat
> signal (sec)
>
> # (change requires restart)
>
> wd_heartbeat_deadtime = 30
>
> # Deadtime interval for heartbeat
> signal (sec)
>
> # (change requires restart)
>
> #heartbeat_destination0 = 'host0_ip1'
>
> #heartbeat_destination0 = '10.201.36.72'# Host name or IP address of
> destination 0
>
> heartbeat_destination0 = '10.50.28.58' # for sending heartbeat signal.
>
> # (change requires restart)
>
> heartbeat_destination_port0 = 9694
>
> # Port number of destination 0 for
> sending
>
> # heartbeat signal. Usually this is the
>
> # same as wd_heartbeat_port.
>
> # (change requires restart)
>
> heartbeat_device0 = ''
>
> # Name of NIC device (such like 'eth0')
>
> # used for sending/receiving heartbeat
>
> # signal to/from destination 0.
>
> # This works only when this is not empty
>
> # and pgpool has root privilege.
>
> # (change requires restart)
>
>
>
>
>
> heartbeat_destination1 = *'10.201.36.72*'
>
> heartbeat_destination_port1 = 9694
>
> #heartbeat_device1 = ''
>
>
>
> # -- query mode --
>
>
>
> wd_life_point = 3
>
> # lifecheck retry times
>
> # (change requires restart)
>
> wd_lifecheck_query = 'SELECT 1'
>
> # lifecheck query to pgpool from
> watchdog
>
> # (change requires restart)
>
> wd_lifecheck_dbname = 'template1'
>
> # Database name connected for lifecheck
>
> # (change requires restart)
>
> wd_lifecheck_user = 'nobody'
>
> # watchdog user monitoring pgpools in
> lifecheck
>
> # (change requires restart)
>
> wd_lifecheck_password = ''
>
> # Password for watchdog user in
> lifecheck
>
> # Leaving it empty will make Pgpool-II
> to first look for the
>
> # Password in pool_passwd file before
> using the empty password
>
> # (change requires restart)
>
>
>
> # - Other pgpool Connection Settings -
>
> other_pgpool_hostname0 = '*rn000110733*'
>
> other_pgpool_port0 = 9999 # Port number for other pgpool 0
>
> # (change requires restart)
>
> other_wd_port0 = 9000
>
> # Port number for other watchdog 0
>
> # (change requires restart)
>
>
>
> #other_pgpool_hostname1 = 'host1'
>
> #other_pgpool_port1 = 5432
>
> #other_wd_port1 = 9000
>
> #other_wd_port0 = 9000
>
>
>
>
>
>
>
> PART II:
>
>
>
> PGPOOL II NODES:
>
>
>
> rn000110724 - 10.50.28.58 - MASTER NODE:
>
> rn000110733 – 10.201.36.72 – STANDBY NODE
>
> Delegate IP : 10.50.28.80
>
>
>
>
>
> *Shutdown PGPOOL II on current Master : *
>
> rn000110724 - 10.50.28.58 - MASTER NODE:
>
>
>
> LOG
>
> 2023-05-14 16:12:44: pid 79354: LOG: Watchdog is shutting down
>
> 2023-05-14 16:12:44: pid 113361: LOG: watchdog: de-escalation started
>
> 2023-05-14 16:12:44: pid 113361: LOG: successfully released the delegate
> IP:"10.50.28.80"
>
> 2023-05-14 16:12:44: pid 113361: DETAIL: 'if_down_cmd' returned with
> success
>
>
>
>
>
> rn000110733 – 10.201.36.72
>
> New Master LOG
>
>
>
> 2023-05-14 16:12:44: pid 88197: LOG: remote node "rn000110724:9999 Linux
> rn000110724" is shutting down
>
> 2023-05-14 16:12:44: pid 88197: LOG: watchdog cluster has lost the
> coordinator node
>
> 2023-05-14 16:12:44: pid 88197: LOG: removing the remote node
> "rn000110724:9999 Linux rn000110724" from watchdog cluster master
>
> 2023-05-14 16:12:44: pid 88197: LOG: We have lost the cluster master node
> "rn000110724:9999 Linux rn000110724"
>
> 2023-05-14 16:12:44: pid 88197: LOG: watchdog node state changed from
> [STANDBY] to [JOINING]
>
> 2023-05-14 16:12:48: pid 88197: LOG: watchdog node state changed from
> [JOINING] to [INITIALIZING]
>
> 2023-05-14 16:12:49: pid 88197: LOG: I am the only alive node in the
> watchdog cluster
>
> 2023-05-14 16:12:49: pid 88197: HINT: skipping stand for coordinator state
>
> 2023-05-14 16:12:49: pid 88197: LOG: watchdog node state changed from
> [INITIALIZING] to [MASTER]
>
> 2023-05-14 16:12:49: pid 88197: LOG: I am announcing my self as
> master/coordinator watchdog node
>
> 2023-05-14 16:12:53: pid 88197: LOG: I am the cluster leader node
>
> 2023-05-14 16:12:57: pid 118646: LOG*: successfully acquired the delegate
> IP:"10.50.28.80"*
>
> 2023-05-14 16:13:22: pid 88197: LOG: remote node "rn000110724:9999 Linux
> rn000110724" is shutting down
>
> 2023-05-14 16:13:32: pid 88197: LOG: new IPC connection received
>
>
>
>
>
> rn000110733 – 10.201.36.72 -- Acquired delegate IP
>
>
>
> sh-4.4# ifconfig
>
> eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
>
> inet 10.201.36.72 netmask 255.255.252.0 broadcast 10.201.39.255
>
> ether 00:50:56:9c:12:9d txqueuelen 1000 (Ethernet)
>
> RX packets 652871 bytes 115519655 (110.1 MiB)
>
> RX errors 0 dropped 505 overruns 0 frame 0
>
> TX packets 286480 bytes 125140983 (119.3 MiB)
>
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
>
>
> eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
>
> inet *10.50.28.80 netmask 255.255.255.0 broadcast 0.0.0.0*
>
> ether 00:50:56:9c:12:9d txqueuelen 1000 (Ethernet)
>
>
>
>
>
>
>
> rn000110733 – 10.201.36.72 - New Master:
>
>
>
> sh-4.4# pcp_watchdog_info -h 10.201.36.72 -p 9898 -U pgpcp -v
>
> Password:
>
> Watchdog Cluster Information
>
> Total Nodes : 2
>
> Remote Nodes : 1
>
> Quorum state : QUORUM IS ON THE EDGE
>
> Alive Remote Nodes : 0
>
> VIP up on local node : YES
>
> Master Node Name : rn000110733:9999 Linux rn000110733
>
> Master Host Name : rn000110733
>
>
>
> Watchdog Node Information
>
> Node Name : rn000110733:9999 Linux rn000110733
>
> Host Name : rn000110733
>
> Delegate IP : 10.50.28.80
>
> Pgpool port : 9999
>
> Watchdog port : 9000
>
> Node priority : 0
>
> Status : 4
>
> Status Name : MASTER
>
>
>
> Node Name : rn000110724:9999 Linux rn000110724
>
> Host Name : rn000110724
>
> Delegate IP : 10.50.28.80
>
> Pgpool port : 9999
>
> Watchdog port : 9000
>
> Node priority : 0
>
> Status : 10
>
> Status Name : SHUTDOWN
>
>
>
>
>
>
>
> Can ping and connect *locally *from new Master PGPOOL Node:
>
>
>
>
>
> sh-4.4# ping 10.50.28.80
>
> PING 10.50.28.80 (10.50.28.80) 56(84) bytes of data.
>
> 64 bytes from 10.50.28.80: icmp_seq=1 ttl=64 time=0.023 ms
>
> 64 bytes from 10.50.28.80: icmp_seq=2 ttl=64 time=0.021 ms
>
>
>
>
>
> sh-4.4# /usr/bin/psql -h *10.50.28.80* -p 9999 -d postgres -U pgpool
>
> Password for user pgpool:
>
> postgres=#
>
>
>
>
>
> But from Client host unable to ping:
>
>
>
> $ /usr/bin/psql -h 10.50.28.80 -p 9999 -d postgres -U pgpool
>
> psql: error: connection to server at "10.50.28.80", port 9999 failed: No
> route to host
>
> Is the server running on that host and accepting TCP/IP connections?
>
>
>
>
>
> postgres at rn000098071:/var/lib/pgsql
>
> $ ping 10.50.28.80
>
> PING 10.50.28.80 (10.50.28.80) 56(84) bytes of data.
>
> From 10.50.28.1 icmp_seq=1 Destination Host Unreachable
>
> From 10.50.28.1 icmp_seq=3 Destination Host Unreachable
>
> From 10.50.28.1 icmp_seq=2 Destination Host Unreachable
>
>
>
>
>
>
>
>
>
>
>
>
>
> *Thank you!*
>
> Kishore
--
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS LLC
https://www.sraoss.co.jp/
More information about the pgpool-general
mailing list