[pgpool-general: 8763] Fwd: Pgpool II/Watchdog HA configuration Question:

KiSh USA coffeewithkish at gmail.com
Mon May 15 06:46:54 JST 2023


Hello Team,



My two-node Pgpool-II configuration works perfectly when both nodes are up: I
can ping the delegate IP and connect clients through it.



But when I shut down the master Pgpool-II node and it fails over to the
standby node (which becomes the new master), I am unable to ping or connect
using the delegate IP.

I can ping and connect locally on the new master, but not from remote clients.

Basically, as soon as I stop the master node, the delegate IP stops
responding to clients. As a result, the databases are unavailable.



Because of this issue, I am unable to implement HA for the Pgpool-II nodes.
Can you please advise?

Thanks in advance.



NOTES:

Pgpool-II nodes:

rn000110724 - 10.50.28.58  - master node
rn000110733 - 10.201.36.72 - standby node
Delegate IP: 10.50.28.80

sh-4.4# pcp_watchdog_info -h 10.50.28.58 -p 9898 -U pgpcp -v
Password:
Watchdog Cluster Information
Total Nodes          : 2
Remote Nodes         : 1
Quorum state         : QUORUM EXIST
Alive Remote Nodes   : 1
VIP up on local node : YES
Master Node Name     : rn000110724:9999 Linux rn000110724
Master Host Name     : rn000110724

Watchdog Node Information
Node Name      : rn000110724:9999 Linux rn000110724
Host Name      : rn000110724
Delegate IP    : 10.50.28.80
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 0
Status         : 4
Status Name    : MASTER

Node Name      : rn000110733:9999 Linux rn000110733
Host Name      : rn000110733
Delegate IP    : 10.50.28.80
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 0
Status         : 7
Status Name    : STANDBY

sh-4.4# pcp_watchdog_info -h 10.201.36.72 -p 9898 -U pgpcp -v
Password:
Watchdog Cluster Information
Total Nodes          : 2
Remote Nodes         : 1
Quorum state         : QUORUM EXIST
Alive Remote Nodes   : 1
VIP up on local node : NO
Master Node Name     : rn000110724:9999 Linux rn000110724
Master Host Name     : rn000110724

Watchdog Node Information
Node Name      : rn000110733:9999 Linux rn000110733
Host Name      : rn000110733
Delegate IP    : 10.50.28.80
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 0
Status         : 7
Status Name    : STANDBY

Node Name      : rn000110724:9999 Linux rn000110724
Host Name      : rn000110724
Delegate IP    : 10.50.28.80
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 0
Status         : 4
Status Name    : MASTER
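
Comparing these outputs field by field is easier when the cluster-level values (such as "VIP up on local node" or "Quorum state") are extracted by a script. A minimal sketch, assuming the "Name : value" layout shown above; `wd_field` is a hypothetical helper, not part of Pgpool-II:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pgpool-II): print the value of one
# "Name : value" field from `pcp_watchdog_info -v` output read on stdin.
# It splits on the FIRST colon only, so values such as
# "rn000110724:9999 Linux rn000110724" stay intact.
# It prints only the first match, so it suits cluster-level fields.
wd_field() {
  awk -v key="$1" '
    {
      i = index($0, ":")
      if (i == 0) next                 # skip lines without a colon
      name = substr($0, 1, i - 1)
      val  = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name) # trim surrounding blanks
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == key) { print val; exit }
    }'
}
```

For instance, piping `pcp_watchdog_info -h 10.201.36.72 -p 9898 -U pgpcp -v` into `wd_field "VIP up on local node"` would be expected to print NO for the output above. Because it stops at the first match, per-node fields such as "Status Name" return only the first node's value.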

rn000110724 - 10.50.28.58 - master node:

sh-4.4# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.50.28.58  netmask 255.255.252.0  broadcast 10.50.31.255
        ether 00:50:56:a8:27:eb  txqueuelen 1000  (Ethernet)
        RX packets 768864  bytes 120999407 (115.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 343573  bytes 123312342 (117.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.50.28.80  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 00:50:56:a8:27:eb  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536

Can ping/connect using the delegate IP from a remote host:

/usr/bin/psql -h 10.50.28.80 -p 9999 -d postgres -U pgpool
Password for user pgpool:
psql (14.2)

postgres=# \l
                                 List of databases
     Name      |  Owner   | Encoding | Collate | Ctype |     Access privileges
---------------+----------+----------+---------+-------+----------------------------
 postgres      | postgres | UTF8     | C       | C     |

postgres=# show pool_nodes ;
 node_id |  hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | rn000098071 | 5432 | up     | 0.000000  | primary | 1          | true              | 0                 |                   |                        | 2023-05-14 10:31:43
 1       | rn000098069 | 5432 | up     | 1.000000  | standby | 0          | false             | 0                 |                   |                        | 2023-05-14 10:31:43
(2 rows)

postgres@rn000098071:/var/lib/pgsql
$ ping 10.50.28.80
PING 10.50.28.80 (10.50.28.80) 56(84) bytes of data.
64 bytes from 10.50.28.80: icmp_seq=1 ttl=55 time=0.728 ms
64 bytes from 10.50.28.80: icmp_seq=2 ttl=55 time=0.490 ms
64 bytes from 10.50.28.80: icmp_seq=3 ttl=55 time=0.376 ms
64 bytes from 10.50.28.80: icmp_seq=4 ttl=55 time=0.477 ms

cat pgpool.conf

#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on                   # Activates watchdog
                                    # (change requires restart)

# -Connection to up stream servers -

trusted_servers = ''
                                    # trusted server list which are used
                                    # to confirm network connection
                                    # (hostA,hostB,hostC,...)
                                    # (change requires restart)
ping_path = '/bin'
                                    # ping command path
                                    # (change requires restart)

# - Watchdog communication Settings -

wd_hostname = 'rn000110724'
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
wd_priority = 0
                                    # priority of this watchdog in leader election
                                    # (change requires restart)

wd_authkey = ''
                                    # Authentication key for watchdog communication
                                    # (change requires restart)

wd_ipc_socket_dir = '/var/run/postgresql'
                                    # Unix domain socket path for watchdog IPC socket
                                    # The Debian package defaults to
                                    # /var/run/postgresql
                                    # (change requires restart)

# - Virtual IP control Setting -

delegate_IP = '10.50.28.80'
                                    # delegate IP address
                                    # If this is empty, virtual IP is never brought up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # If if_up/down_cmd starts with "/", if_cmd_path will be ignored.
                                    # (change requires restart)
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
                                    # startup delegate IP command
                                    # (change requires restart)
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
                                    # shutdown delegate IP command
                                    # (change requires restart)
arping_path = '/usr/sbin'
                                    # arping command path
                                    # If arping_cmd starts with "/", arping_path will be ignored.
                                    # (change requires restart)
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
                                    # arping command
                                    # (change requires restart)
ifconfig_path = '/etc/pgpool-II'
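
The if_up_cmd above adds the delegate IP with a fixed /24 prefix, which is why eth0:0 reports netmask 255.255.255.0 in the ifconfig output, while eth0 on both nodes uses 255.255.252.0. A minimal sketch for translating a dotted-quad netmask into a prefix length so the two can be compared; the function names are hypothetical and the mask is assumed to be valid and contiguous:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Pgpool-II) for comparing netmask notations.

# Convert a dotted-quad IPv4 address or netmask to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # intentional word splitting on "." into four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Count the set bits of a dotted-quad netmask (255.255.252.0 -> 22).
# Assumes a contiguous mask; the input is not validated.
netmask_to_prefix() {
  local m bits=0
  m=$(ip_to_int "$1")
  while [ "$m" -gt 0 ]; do
    bits=$(( bits + (m & 1) ))
    m=$(( m >> 1 ))
  done
  echo "$bits"
}

# Netmask of eth0 on both nodes, as reported by ifconfig in this post.
echo "eth0 prefix: /$(netmask_to_prefix 255.255.252.0)"
# Netmask that eth0:0 reports after if_up_cmd adds the VIP with /24.
echo "eth0:0 prefix: /$(netmask_to_prefix 255.255.255.0)"
```

Run as-is, it prints `/22` for eth0 and `/24` for eth0:0.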

# - Behavior on escalation Setting -

clear_memqcache_on_escalation = on
                                    # Clear all the query cache on shared memory
                                    # when standby pgpool escalates to active pgpool
                                    # (= virtual IP holder).
                                    # This should be off if client connects to pgpool
                                    # not using virtual IP.
                                    # (change requires restart)
wd_escalation_command = '/etc/pgpool-II/escalation.sh'
                                    # Executes this command at escalation on new active pgpool.
                                    # (change requires restart)
wd_de_escalation_command = ''
                                    # Executes this command when master pgpool resigns from being master.
                                    # (change requires restart)

# - Watchdog consensus settings for failover -

failover_when_quorum_exists = on
                                    # Only perform backend node failover
                                    # when the watchdog cluster holds the quorum
                                    # (change requires restart)

failover_require_consensus = on
                                    # Perform failover when majority of Pgpool-II nodes
                                    # agrees on the backend node status change
                                    # (change requires restart)

allow_multiple_failover_requests_from_node = off
                                    # A Pgpool-II node can cast multiple votes
                                    # for building the consensus on failover
                                    # (change requires restart)

#enable_consensus_with_half_votes = off
enable_consensus_with_half_votes = on
                                    # apply majority rule for consensus and quorum computation
                                    # at 50% of votes in a cluster with even number of nodes.
                                    # when enabled the existence of quorum and consensus
                                    # on failover is resolved after receiving half of the
                                    # total votes in the cluster, otherwise both these
                                    # decisions require at least one more vote than
                                    # half of the total votes.
                                    # (change requires restart)

# - Lifecheck Setting -

# -- common --

wd_monitoring_interfaces_list = ''  # Comma separated list of interfaces names to monitor.
                                    # if any interface from the list is active the watchdog will
                                    # consider the network is fine
                                    # 'any' to enable monitoring on all interfaces except loopback
                                    # '' to disable monitoring
                                    # (change requires restart)

wd_lifecheck_method = 'heartbeat'
                                    # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
                                    # (change requires restart)
wd_interval = 10
                                    # lifecheck interval (sec) > 0
                                    # (change requires restart)

# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
#heartbeat_destination0 = 'host0_ip1'
#heartbeat_destination0 = '10.201.36.72' # Host name or IP address of destination 0
heartbeat_destination0 = '10.50.28.58'   # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = ''
                                    # Name of NIC device (such like 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)

heartbeat_destination1 = '10.201.36.72'
heartbeat_destination_port1 = 9694
#heartbeat_device1 = ''

# -- query mode --

wd_life_point = 3
                                    # lifecheck retry times
                                    # (change requires restart)
wd_lifecheck_query = 'SELECT 1'
                                    # lifecheck query to pgpool from watchdog
                                    # (change requires restart)
wd_lifecheck_dbname = 'template1'
                                    # Database name connected for lifecheck
                                    # (change requires restart)
wd_lifecheck_user = 'nobody'
                                    # watchdog user monitoring pgpools in lifecheck
                                    # (change requires restart)
wd_lifecheck_password = ''
                                    # Password for watchdog user in lifecheck
                                    # Leaving it empty will make Pgpool-II to first look for the
                                    # Password in pool_passwd file before using the empty password
                                    # (change requires restart)

# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'rn000110733'
other_pgpool_port0 = 9999           # Port number for other pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for other watchdog 0
                                    # (change requires restart)

#other_pgpool_hostname1 = 'host1'
#other_pgpool_port1 = 5432
#other_wd_port1 = 9000
#other_wd_port0 = 9000


PART II:

Pgpool-II nodes:

rn000110724 - 10.50.28.58  - master node
rn000110733 - 10.201.36.72 - standby node
Delegate IP: 10.50.28.80


Shut down Pgpool-II on the current master:

rn000110724 - 10.50.28.58 - master node:

Log:

2023-05-14 16:12:44: pid 79354: LOG:  Watchdog is shutting down
2023-05-14 16:12:44: pid 113361: LOG:  watchdog: de-escalation started
2023-05-14 16:12:44: pid 113361: LOG:  successfully released the delegate IP:"10.50.28.80"
2023-05-14 16:12:44: pid 113361: DETAIL:  'if_down_cmd' returned with success


rn000110733 - 10.201.36.72 - new master log:

2023-05-14 16:12:44: pid 88197: LOG:  remote node "rn000110724:9999 Linux rn000110724" is shutting down
2023-05-14 16:12:44: pid 88197: LOG:  watchdog cluster has lost the coordinator node
2023-05-14 16:12:44: pid 88197: LOG:  removing the remote node "rn000110724:9999 Linux rn000110724" from watchdog cluster master
2023-05-14 16:12:44: pid 88197: LOG:  We have lost the cluster master node "rn000110724:9999 Linux rn000110724"
2023-05-14 16:12:44: pid 88197: LOG:  watchdog node state changed from [STANDBY] to [JOINING]
2023-05-14 16:12:48: pid 88197: LOG:  watchdog node state changed from [JOINING] to [INITIALIZING]
2023-05-14 16:12:49: pid 88197: LOG:  I am the only alive node in the watchdog cluster
2023-05-14 16:12:49: pid 88197: HINT:  skipping stand for coordinator state
2023-05-14 16:12:49: pid 88197: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]
2023-05-14 16:12:49: pid 88197: LOG:  I am announcing my self as master/coordinator watchdog node
2023-05-14 16:12:53: pid 88197: LOG:  I am the cluster leader node
2023-05-14 16:12:57: pid 118646: LOG:  successfully acquired the delegate IP:"10.50.28.80"
2023-05-14 16:13:22: pid 88197: LOG:  remote node "rn000110724:9999 Linux rn000110724" is shutting down
2023-05-14 16:13:32: pid 88197: LOG:  new IPC connection received
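
According to these timestamps, the old master announced its shutdown at 16:12:44, this node became MASTER at 16:12:49 and acquired the delegate IP at 16:12:57, so the VIP did move. To pull the same timeline out of the log when repeating the test, a filter along these lines may help. It is a sketch that assumes the message texts quoted in this post (other Pgpool-II versions may word them differently); `wd_timeline` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pgpool-II): print watchdog state changes
# and delegate IP events from one or more Pgpool-II log files as
# "timestamp | message". The patterns match the messages quoted in this post.
wd_timeline() {
  grep -hE 'watchdog node state changed|delegate IP|lost the cluster master' "$@" \
    | sed -E 's/: pid [0-9]+: LOG: +/ | /'   # drop the pid and LOG prefix
}
```

Call it with the path of the Pgpool-II log file; where that file lives depends on the local logging configuration.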

rn000110733 - 10.201.36.72 - acquired the delegate IP:

sh-4.4# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.201.36.72  netmask 255.255.252.0  broadcast 10.201.39.255
        ether 00:50:56:9c:12:9d  txqueuelen 1000  (Ethernet)
        RX packets 652871  bytes 115519655 (110.1 MiB)
        RX errors 0  dropped 505  overruns 0  frame 0
        TX packets 286480  bytes 125140983 (119.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.50.28.80  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 00:50:56:9c:12:9d  txqueuelen 1000  (Ethernet)
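
Applying the 255.255.252.0 netmask from the ifconfig output puts the two nodes in different networks, 10.50.28.0/22 and 10.201.36.0/22, and the delegate IP 10.50.28.80 belongs only to the first. A VIP moved with `ip addr add` and announced with `arping` is generally reachable from other hosts only when its new holder sits on the same layer-2 segment as the gateway for that subnet, which would be consistent with the "Destination Host Unreachable" replies from 10.50.28.1. A minimal sketch of the subnet comparison; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Pgpool-II): check whether two IPv4
# addresses fall in the same subnet for a given dotted-quad netmask.

# Convert a dotted-quad IPv4 address or netmask to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # intentional word splitting on "." into four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the network address of IP under NETMASK, in dotted-quad form.
network_of() {
  local n
  n=$(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# same_subnet IP_A IP_B NETMASK -> prints "yes" or "no"
same_subnet() {
  if [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]; then
    echo yes
  else
    echo no
  fi
}

vip=10.50.28.80
mask=255.255.252.0   # eth0 netmask on both nodes, per ifconfig above
for node in 10.50.28.58 10.201.36.72; do
  echo "$node (net $(network_of "$node" "$mask")): same subnet as VIP? $(same_subnet "$vip" "$node" "$mask")"
done
```

With the addresses from this post, 10.50.28.58 is reported as sharing the VIP's subnet and 10.201.36.72 is not.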

rn000110733 - 10.201.36.72 - new master:

sh-4.4# pcp_watchdog_info -h 10.201.36.72 -p 9898 -U pgpcp -v
Password:
Watchdog Cluster Information
Total Nodes          : 2
Remote Nodes         : 1
Quorum state         : QUORUM IS ON THE EDGE
Alive Remote Nodes   : 0
VIP up on local node : YES
Master Node Name     : rn000110733:9999 Linux rn000110733
Master Host Name     : rn000110733

Watchdog Node Information
Node Name      : rn000110733:9999 Linux rn000110733
Host Name      : rn000110733
Delegate IP    : 10.50.28.80
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 0
Status         : 4
Status Name    : MASTER

Node Name      : rn000110724:9999 Linux rn000110724
Host Name      : rn000110724
Delegate IP    : 10.50.28.80
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 0
Status         : 10
Status Name    : SHUTDOWN

Can ping and connect locally from the new master Pgpool-II node:

sh-4.4# ping 10.50.28.80
PING 10.50.28.80 (10.50.28.80) 56(84) bytes of data.
64 bytes from 10.50.28.80: icmp_seq=1 ttl=64 time=0.023 ms
64 bytes from 10.50.28.80: icmp_seq=2 ttl=64 time=0.021 ms

sh-4.4# /usr/bin/psql -h 10.50.28.80 -p 9999 -d postgres -U pgpool
Password for user pgpool:
postgres=#

But from a client host, I am unable to connect or ping:

$ /usr/bin/psql -h 10.50.28.80 -p 9999 -d postgres -U pgpool
psql: error: connection to server at "10.50.28.80", port 9999 failed: No route to host
        Is the server running on that host and accepting TCP/IP connections?

postgres@rn000098071:/var/lib/pgsql
$ ping 10.50.28.80
PING 10.50.28.80 (10.50.28.80) 56(84) bytes of data.
From 10.50.28.1 icmp_seq=1 Destination Host Unreachable
From 10.50.28.1 icmp_seq=3 Destination Host Unreachable
From 10.50.28.1 icmp_seq=2 Destination Host Unreachable

Thank you!

Kishore


More information about the pgpool-general mailing list