[pgpool-general: 8763] Fwd: Pgpool II/Watchdog HA configuration Question:
KiSh USA
coffeewithkish at gmail.com
Mon May 15 06:46:54 JST 2023
Hello Team,
My PGPOOL II 2 node configuration works perfectly when both nodes are up, I
can ping to my Delegate IP also connect clients through Delegate IP etc..
But when I shutdown Master PGPOOL II node, and it fails over to STANDBY
NODE (which is now New Master) am unable to ping/connect using delegate IP
>From remote client, I can ping locally in the new Master and connect, but
not from clients,
Basically, as soon as I stop master node, the delegated IP stops
responding. As a result, databases are unavailable.
Bcoz of this issue am unable to implement HA for PGPOOL II nodes, can you
please advise…
Thanks in advance.
NOTES:
PGPOOL II NODES:
rn000110724 - 10.50.28.58 - MASTER NODE:
rn000110733 – 10.201.36.72 – STANDBY NODE
Delegate IP : 10.50.28.80
sh-4.4# pcp_watchdog_info -h 10.50.28.58 -p 9898 -U pgpcp -v
Password:
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : YES
Master Node Name : rn000110724:9999 Linux rn000110724
Master Host Name : rn000110724
Watchdog Node Information
Node Name : rn000110724:9999 Linux rn000110724
Host Name : rn000110724
Delegate IP : 10.50.28.80
Pgpool port : 9999
Watchdog port : 9000
Node priority : 0
Status : 4
Status Name : MASTER
Node Name : rn000110733:9999 Linux rn000110733
Host Name : rn000110733
Delegate IP : 10.50.28.80
Pgpool port : 9999
Watchdog port : 9000
Node priority : 0
Status : 7
Status Name : STANDBY
sh-4.4# pcp_watchdog_info -h 10.201.36.72 -p 9898 -U pgpcp -v
Password:
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM EXIST
Alive Remote Nodes : 1
VIP up on local node : NO
Master Node Name : rn000110724:9999 Linux rn000110724
Master Host Name : rn000110724
Watchdog Node Information
Node Name : rn000110733:9999 Linux rn000110733
Host Name : rn000110733
Delegate IP : 10.50.28.80
Pgpool port : 9999
Watchdog port : 9000
Node priority : 0
Status : 7
Status Name : STANDBY
Node Name : rn000110724:9999 Linux rn000110724
Host Name : rn000110724
Delegate IP : 10.50.28.80
Pgpool port : 9999
Watchdog port : 9000
Node priority : 0
Status : 4
Status Name : MASTER
rn000110724 - 10.50.28.58 - Master node:
sh-4.4# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.50.28.58 netmask 255.255.252.0 broadcast 10.50.31.255
ether 00:50:56:a8:27:eb txqueuelen 1000 (Ethernet)
RX packets 768864 bytes 120999407 (115.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 343573 bytes 123312342 (117.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.50.28.80 netmask 255.255.255.0 broadcast 0.0.0.0
ether 00:50:56:a8:27:eb txqueuelen 1000 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
Can ping/connect using Delegate IP from Remote host:
/usr/bin/psql -h 10.50.28.80 -p 9999 -d postgres -U pgpool
Password for user pgpool:
psql (14.2)
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access
privileges
---------------+----------+----------+---------+-------+----------------------------
postgres | postgres | UTF8 | C | C |
postgres=# show pool_nodes ;
node_id | hostname | port | status | lb_weight | role | select_cnt |
load_balance_node | replication_delay | replication_state |
replication_sync_s
tate | last_status_change
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+-------------------
-----+---------------------
0 | rn000098071 | 5432 | up | 0.000000 | primary | 1 |
true | 0 |
|
| 2023-05-14 10:31:43
1 | rn000098069 | 5432 | up | 1.000000 | standby | 0 |
false | 0 |
|
| 2023-05-14 10:31:43
(2 rows)
postgres at rn000098071:/var/lib/pgsql
$ ping 10.50.28.80
PING 10.50.28.80 (10.50.28.80) 56(84) bytes of data.
64 bytes from 10.50.28.80: icmp_seq=1 ttl=55 time=0.728 ms
64 bytes from 10.50.28.80: icmp_seq=2 ttl=55 time=0.490 ms
64 bytes from 10.50.28.80: icmp_seq=3 ttl=55 time=0.376 ms
64 bytes from 10.50.28.80: icmp_seq=4 ttl=55 time=0.477 ms
cat pgpool.conf
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
# - Enabling -
use_watchdog = *on * # Activates watchdog
# (change requires restart)
# -Connection to up stream servers -
trusted_servers = ''
# trusted server list which are used
# to confirm network connection
# (hostA,hostB,hostC,...)
# (change requires restart)
ping_path = '/bin'
# ping command path
# (change requires restart)
# - Watchdog communication Settings -
wd_hostname = '*rn000110724*'
# Host name or IP address of this
watchdog
# (change requires restart)
wd_port = *9000*
# port number for watchdog service
# (change requires restart)
wd_priority = *0*
# priority of this watchdog in leader
election
# (change requires restart)
wd_authkey = ''
# Authentication key for watchdog
communication
# (change requires restart)
wd_ipc_socket_dir = '/var/run/postgresql'
# Unix domain socket path for watchdog
IPC socket
# The Debian package defaults to
# /var/run/postgresql
# (change requires restart)
# - Virtual IP control Setting -
delegate_IP = *'10.50.28.80*'
# delegate IP address
# If this is empty, virtual IP never
bring up.
# (change requires restart)
if_cmd_path = '/sbin'
# path to the directory where
if_up/down_cmd exists
# If if_up/down_cmd starts with "/",
if_cmd_path will be ignored.
# (change requires restart)
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label
eth0:0'
# startup delegate IP command
# (change requires restart)
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
# shutdown delegate IP command
# (change requires restart)
arping_path = '/usr/sbin'
# arping command path
# If arping_cmd starts with "/",
if_cmd_path will be ignored.
# (change requires restart)
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
# arping command
# (change requires restart)
ifconfig_path = '/etc/pgpool-II'
# - Behaivor on escalation Setting -
clear_memqcache_on_escalation = *on*
# Clear all the query cache on shared
memory
# when standby pgpool escalate to
active pgpool
# (= virtual IP holder).
# This should be off if client connects
to pgpool
# not using virtual IP.
# (change requires restart)
wd_escalation_command = '/etc/pgpool-II/escalation.sh'
# Executes this command at escalation
on new active pgpool.
# (change requires restart)
wd_de_escalation_command = ''
# Executes this command when master
pgpool resigns from being master.
# (change requires restart)
# - Watchdog consensus settings for failover -
failover_when_quorum_exists = on
# Only perform backend node failover
# when the watchdog cluster holds the
quorum
# (change requires restart)
failover_require_consensus = on
# Perform failover when majority of
Pgpool-II nodes
# aggrees on the backend node status
change
# (change requires restart)
allow_multiple_failover_requests_from_node = off
# A Pgpool-II node can cast multiple
votes
# for building the consensus on failover
# (change requires restart)
#enable_consensus_with_half_votes = off
enable_consensus_with_half_votes = on
# apply majority rule for consensus and
quorum computation
# at 50% of votes in a cluster with
even number of nodes.
# when enabled the existence of quorum
and consensus
# on failover is resolved after
receiving half of the
# total votes in the cluster, otherwise
both these
# decisions require at least one more
vote than
# half of the total votes.
# (change requires restart)
# - Lifecheck Setting -
# -- common --
wd_monitoring_interfaces_list = '' # Comma separated list of interfaces
names to monitor.
# if any interface from the list is
active the watchdog will
# consider the network is fine
# 'any' to enable monitoring on all
interfaces except loopback
# '' to disable monitoring
# (change requires restart)
wd_lifecheck_method = 'heartbeat'
# Method of watchdog lifecheck
('heartbeat' or 'query' or 'external')
# (change requires restart)
wd_interval = 10
# lifecheck interval (sec) > 0
# (change requires restart)
# -- heartbeat mode --
wd_heartbeat_port = 9694
# Port number for receiving heartbeat
signal
# (change requires restart)
wd_heartbeat_keepalive = 2
# Interval time of sending heartbeat
signal (sec)
# (change requires restart)
wd_heartbeat_deadtime = 30
# Deadtime interval for heartbeat
signal (sec)
# (change requires restart)
#heartbeat_destination0 = 'host0_ip1'
#heartbeat_destination0 = '10.201.36.72'# Host name or IP address of
destination 0
heartbeat_destination0 = '10.50.28.58' # for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for
sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
heartbeat_device0 = ''
# Name of NIC device (such like 'eth0')
# used for sending/receiving heartbeat
# signal to/from destination 0.
# This works only when this is not empty
# and pgpool has root privilege.
# (change requires restart)
heartbeat_destination1 = *'10.201.36.72*'
heartbeat_destination_port1 = 9694
#heartbeat_device1 = ''
# -- query mode --
wd_life_point = 3
# lifecheck retry times
# (change requires restart)
wd_lifecheck_query = 'SELECT 1'
# lifecheck query to pgpool from
watchdog
# (change requires restart)
wd_lifecheck_dbname = 'template1'
# Database name connected for lifecheck
# (change requires restart)
wd_lifecheck_user = 'nobody'
# watchdog user monitoring pgpools in
lifecheck
# (change requires restart)
wd_lifecheck_password = ''
# Password for watchdog user in
lifecheck
# Leaving it empty will make Pgpool-II
to first look for the
# Password in pool_passwd file before
using the empty password
# (change requires restart)
# - Other pgpool Connection Settings -
other_pgpool_hostname0 = '*rn000110733*'
other_pgpool_port0 = 9999 # Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
#other_pgpool_hostname1 = 'host1'
#other_pgpool_port1 = 5432
#other_wd_port1 = 9000
#other_wd_port0 = 9000
PART II:
PGPOOL II NODES:
rn000110724 - 10.50.28.58 - MASTER NODE:
rn000110733 – 10.201.36.72 – STANDBY NODE
Delegate IP : 10.50.28.80
*Shutdown PGPOOL II on current Master : *
rn000110724 - 10.50.28.58 - MASTER NODE:
LOG
2023-05-14 16:12:44: pid 79354: LOG: Watchdog is shutting down
2023-05-14 16:12:44: pid 113361: LOG: watchdog: de-escalation started
2023-05-14 16:12:44: pid 113361: LOG: successfully released the delegate
IP:"10.50.28.80"
2023-05-14 16:12:44: pid 113361: DETAIL: 'if_down_cmd' returned with
success
rn000110733 – 10.201.36.72
New Master LOG
2023-05-14 16:12:44: pid 88197: LOG: remote node "rn000110724:9999 Linux
rn000110724" is shutting down
2023-05-14 16:12:44: pid 88197: LOG: watchdog cluster has lost the
coordinator node
2023-05-14 16:12:44: pid 88197: LOG: removing the remote node
"rn000110724:9999 Linux rn000110724" from watchdog cluster master
2023-05-14 16:12:44: pid 88197: LOG: We have lost the cluster master node
"rn000110724:9999 Linux rn000110724"
2023-05-14 16:12:44: pid 88197: LOG: watchdog node state changed from
[STANDBY] to [JOINING]
2023-05-14 16:12:48: pid 88197: LOG: watchdog node state changed from
[JOINING] to [INITIALIZING]
2023-05-14 16:12:49: pid 88197: LOG: I am the only alive node in the
watchdog cluster
2023-05-14 16:12:49: pid 88197: HINT: skipping stand for coordinator state
2023-05-14 16:12:49: pid 88197: LOG: watchdog node state changed from
[INITIALIZING] to [MASTER]
2023-05-14 16:12:49: pid 88197: LOG: I am announcing my self as
master/coordinator watchdog node
2023-05-14 16:12:53: pid 88197: LOG: I am the cluster leader node
2023-05-14 16:12:57: pid 118646: LOG*: successfully acquired the delegate
IP:"10.50.28.80"*
2023-05-14 16:13:22: pid 88197: LOG: remote node "rn000110724:9999 Linux
rn000110724" is shutting down
2023-05-14 16:13:32: pid 88197: LOG: new IPC connection received
rn000110733 – 10.201.36.72 -- Acquired delegate IP
sh-4.4# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.201.36.72 netmask 255.255.252.0 broadcast 10.201.39.255
ether 00:50:56:9c:12:9d txqueuelen 1000 (Ethernet)
RX packets 652871 bytes 115519655 (110.1 MiB)
RX errors 0 dropped 505 overruns 0 frame 0
TX packets 286480 bytes 125140983 (119.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet *10.50.28.80 netmask 255.255.255.0 broadcast 0.0.0.0*
ether 00:50:56:9c:12:9d txqueuelen 1000 (Ethernet)
rn000110733 – 10.201.36.72 - New Master:
sh-4.4# pcp_watchdog_info -h 10.201.36.72 -p 9898 -U pgpcp -v
Password:
Watchdog Cluster Information
Total Nodes : 2
Remote Nodes : 1
Quorum state : QUORUM IS ON THE EDGE
Alive Remote Nodes : 0
VIP up on local node : YES
Master Node Name : rn000110733:9999 Linux rn000110733
Master Host Name : rn000110733
Watchdog Node Information
Node Name : rn000110733:9999 Linux rn000110733
Host Name : rn000110733
Delegate IP : 10.50.28.80
Pgpool port : 9999
Watchdog port : 9000
Node priority : 0
Status : 4
Status Name : MASTER
Node Name : rn000110724:9999 Linux rn000110724
Host Name : rn000110724
Delegate IP : 10.50.28.80
Pgpool port : 9999
Watchdog port : 9000
Node priority : 0
Status : 10
Status Name : SHUTDOWN
Can ping and connect *locally *from new Master PGPOOL Node:
sh-4.4# ping 10.50.28.80
PING 10.50.28.80 (10.50.28.80) 56(84) bytes of data.
64 bytes from 10.50.28.80: icmp_seq=1 ttl=64 time=0.023 ms
64 bytes from 10.50.28.80: icmp_seq=2 ttl=64 time=0.021 ms
sh-4.4# /usr/bin/psql -h *10.50.28.80* -p 9999 -d postgres -U pgpool
Password for user pgpool:
postgres=#
But from Client host unable to ping:
$ /usr/bin/psql -h 10.50.28.80 -p 9999 -d postgres -U pgpool
psql: error: connection to server at "10.50.28.80", port 9999 failed: No
route to host
Is the server running on that host and accepting TCP/IP connections?
postgres at rn000098071:/var/lib/pgsql
$ ping 10.50.28.80
PING 10.50.28.80 (10.50.28.80) 56(84) bytes of data.
>From 10.50.28.1 icmp_seq=1 Destination Host Unreachable
>From 10.50.28.1 icmp_seq=3 Destination Host Unreachable
>From 10.50.28.1 icmp_seq=2 Destination Host Unreachable
*Thank you!*
Kishore
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20230514/8f2518e6/attachment.htm>
More information about the pgpool-general
mailing list