[pgpool-general: 9134] Error when disconnecting the network on postgres and pgpool standby node

Wed Jun 19 18:17:25 JST 2024

The postgres and pgpool configuration is the following one:
Node 0 is primary postgresql server (backend 0) and standby pgpool_node,
Node 1 is the standby postgresql (backend 1) and standby pgpool_node
Node 2 is LEADER of pgpool with delegate_IP UP and it does not have postgresql.
WatchDog is enabled with hearthbeat

Postgres works perfectly, reads, writes and works with pgpool connecting from port 9999.

Detail of the failure:
When the network of the Node 1 is TURNED DOWN (VMWARE disconnect network), the cluster pgpool delays connection to the database through the delegate_IP and port 9999. We have checked it and it takes 1-2 min to ask for the password.
psql -h delegate_IP -p 9999 -U prueba -d prueba
after 1-2 min prompt password

while the network is DOWN on node 1, direct access to node 0 postgresql through port 5432 is correct, the replication between the nodes becomes asynchronous and postgres operation is as expected, but access through IP delegated by port 9999 is still delayed

When the network is TURNED ON again, the pgpool service continues delaying connections until the service gets restarted on Node 2 (PGPOOL LEADER).

PGPOOL STATE NETWORK DOWN
3 3 YES nodo2.dominio.com:9999 Linux nodo2.dominio.com nodo2.dominio.com
nodo0.dominio.com:9999 Linux nodo0.dominio.com nodo0.dominio.com 9999 90000 7 STANDBY 0 MEMBER
nodo1.dominio.com:9999 Linux nodo1.dominio.com nodo1.dominio.com 9999 90000 8 LOST 0 MEMBER
nodo2.dominio.com:9999 Linux nodo2.dominio.com nodo2.dominio.com 9999 90000 4 LEADER 0 MEMBER

config pgpool
delegate_IP = '172.27.37.56'
if_cmd_path = '/sbin'
if_up_cmd = '/bin/sudo /sbin/ip addr add $_IP_$/24 dev ens192 label ens192:0'
if_down_cmd = '/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens192'
arping_path = '/usr/sbin'
arping_cmd = '/bin/sudo /sbin/arping -U $_IP_$ -w 1 -I ens192'

load_balance_mode = off
connection_cache = on

backend_hostname0 = 'nodo0.dominio.com'
backend_port0 = 5432
backend_data_directory0 = '/postgresql/14/data/mdona-cpdm-pre-cluster07'
backend_weight0 = 1
backend_application_name0 = 'nodo0'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'nodo1.dominio.com'
backend_port1 = 5432
backend_data_directory1 = '/postgresql/14/data/mdona-cpdm-pre-cluster07'
backend_weight1 = 1
backend_application_name1 = 'nodo1'
backend_flag1 = 'ALLOW_TO_FAILOVER'

use_watchdog = on
wd_escalation_command = '/etc/pgpool-II/escalation.sh'
hostname0 = 'nodo0.dominio.com'
pgpool_port0 = 9999
wd_port0 = 9000
heartbeat_hostname0 = 'nodo0.dominio.com'
heartbeat_port0 = 9694
hostname1 = 'nodo1.dominio.com'
pgpool_port1 = 9999
wd_port1 = 9000
heartbeat_hostname1 = 'nodo1.dominio.com
heartbeat_port1 = 9694
hostname2 = 'nodo2.dominio.com'
pgpool_port2 = 9999
wd_port2 = 9000
heartbeat_hostname2 = 'nodo2.dominio.com'
heartbeat_port2 = 9694

process_management_mode = dynamic
process_management_strategy = gentle
min_spare_children = 10
max_spare_children = 20

health_check_period = 5
health_check_timeout = 20
health_check_user = 'pgpool2'
health_check_password = ''
health_check_database = ''
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 10000

num_init_children = 500
max_pool = 2

child_life_time = 120
child_max_connections = 0
connection_life_time = 120
client_idle_limit = 0

backend_clustering_mode = 'streaming_replication'
listen_addresses = '*'
port = 9999
socket_dir = '/tmp'

Hernan Pezzani

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20240619/b943c285/attachment.htm>