[pgpool-general: 7425] Problems with watchdog
Francesco Mazzi
fmazzi at comune.genova.it
Mon Mar 1 21:50:22 JST 2021
Hello, I'm testing watchdog with 2 instances of pgpool 4.2.2 , package pgpool-II-pg12 from pgpool repository, centos 7.
I have problems in starting pgpool, in both nodes, even if I start only one, it starts and it becomes leader but then it dies. I see some errors in log, here is the complete log of one node.
systemd: Started Pgpool-II.
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: health_check_stats_shared_memory_size: requested size: 12288
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: memory cache initialized
pgpool: 2021-03-01 11:54:14: pid 1868: DETAIL: memcache blocks :64
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: allocating (136555376) bytes of shared memory segment
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: allocating shared memory segment of size: 136555376
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: health_check_stats_shared_memory_size: requested size: 12288
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: health_check_stats_shared_memory_size: requested size: 12288
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: memory cache initialized
pgpool: 2021-03-01 11:54:14: pid 1868: DETAIL: memcache blocks :64
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: pool_discard_oid_maps: discarded memqcache oid maps
pgpool: 2021-03-01 11:54:14: pid 1868: LOG: waiting for watchdog to initialize
pgpool: 2021-03-01 11:54:14: pid 1873: LOG: setting the local watchdog node name to "vm-lxpgproxy1:5432 Linux vm-lxpgproxy1"
pgpool: 2021-03-01 11:54:14: pid 1873: LOG: watchdog cluster is configured with 1 remote nodes
pgpool: 2021-03-01 11:54:14: pid 1873: LOG: watchdog remote node:0 on vm-lxpgproxy2:9000
pgpool: 2021-03-01 11:54:14: pid 1873: LOG: watchdog node state changed from [DEAD] to [LOADING]
pgpool: 2021-03-01 11:54:19: pid 1873: LOG: watchdog node state changed from [LOADING] to [JOINING]
pgpool: 2021-03-01 11:54:23: pid 1873: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
pgpool: 2021-03-01 11:54:24: pid 1873: LOG: I am the only alive node in the watchdog cluster
pgpool: 2021-03-01 11:54:24: pid 1873: HINT: skipping stand for coordinator state
pgpool: 2021-03-01 11:54:24: pid 1873: LOG: watchdog node state changed from [INITIALIZING] to [LEADER]
pgpool: 2021-03-01 11:54:24: pid 1873: LOG: I am announcing my self as leader/coordinator watchdog node
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: I am the cluster leader node
pgpool: 2021-03-01 11:54:28: pid 1873: DETAIL: our declare coordinator message is accepted by all nodes
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: setting the local node "vm-lxpgproxy1:5432 Linux vm-lxpgproxy1" as watchdog cluste
r leader
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: I am the cluster leader node. Starting escalation process
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: escalation process started with PID:1888
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: watchdog process is initialized
pgpool: 2021-03-01 11:54:28: pid 1868: DETAIL: watchdog messaging data version: 1.2
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: new IPC connection received
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: authentication failed
pgpool: 2021-03-01 11:54:28: pid 1873: DETAIL: IPC command contains an invalid data
pgpool: 2021-03-01 11:54:28: pid 1868: WARNING: get runtime variable value from watchdog failed
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: failed to get current state of local watchdog node
pgpool: 2021-03-01 11:54:28: pid 1868: DETAIL: get runtime variable value from watchdog returned no data
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: Setting up socket for 0.0.0.0:5432
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: Setting up socket for :::5432
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: find_primary_node_repeatedly: waiting for finding a primary node
pgpool: 2021-03-01 11:54:28: pid 1888: LOG: watchdog: escalation started
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: new IPC connection received
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: authentication failed
pgpool: 2021-03-01 11:54:28: pid 1873: DETAIL: IPC command contains an invalid data
pgpool: 2021-03-01 11:54:28: pid 1889: FATAL: get node list command reply contains no data
pgpool: + PGPOOLS=(vm-lxpgproxy2)
pgpool: + VIP=192.168.153.230
pgpool: + DEVICE=ens32
pgpool: + for pgpool in '"${PGPOOLS[@]}"'
pgpool: + '[' vm-lxpgproxy1 = vm-lxpgproxy2 ']'
pgpool: + ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres at vm-lxpgproxy2 -i /var/lib/pgsql/.ssh/id_rsa '
pgpool: /usr/bin/sudo /sbin/ip addr del 192.168.153.230/22 dev ens32
pgpool: '
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: find_primary_node: standby node is 0
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: find_primary_node: primary node is 1
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: find_primary_node: standby node is 2
pgpool: 2021-03-01 11:54:28: pid 1928: LOG: process started
pgpool: 2021-03-01 11:54:28: pid 1925: LOG: process started
pgpool: 2021-03-01 11:54:28: pid 1926: LOG: process started
pgpool: 2021-03-01 11:54:28: pid 1927: LOG: process started
pgpool: 2021-03-01 11:54:28: pid 1924: LOG: PCP process: 1924 started
pgpool: Warning: Permanently added 'vm-lxpgproxy2,192.168.153.237' (ECDSA) to the list of known hosts.
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: pgpool-II successfully started. version 4.2.2 (chichiriboshi)
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: node status[0]: 2
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: node status[1]: 1
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: node status[2]: 2
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: watchdog lifecheck process with pid: 1889 exits with status 768
pgpool: 2021-03-01 11:54:28: pid 1868: FATAL: watchdog lifecheck process exit with fatal error. exiting pgpool-II
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: new IPC connection received
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: authentication failed
pgpool: 2021-03-01 11:54:28: pid 1873: DETAIL: IPC command contains an invalid data
pgpool: 2021-03-01 11:54:28: pid 1929: ERROR: get node list command reply contains no data
pgpool: 2021-03-01 11:54:28: pid 1868: LOG: shutting down
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: Watchdog is shutting down
pgpool: 2021-03-01 11:54:28: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation
pgpool: RTNETLINK answers: Cannot assign requested address
pgpool: + exit 0
pgpool: 2021-03-01 11:54:28: pid 1888: LOG: watchdog escalation successful
pgpool: 2021-03-01 11:54:29: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation
pgpool: 2021-03-01 11:54:30: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation
pgpool: 2021-03-01 11:54:31: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation
pgpool: 2021-03-01 11:54:32: pid 1873: LOG: waiting for escalation process to exit before starting de-escalation
pgpool: 2021-03-01 11:54:33: pid 1888: LOG: successfully acquired the delegate IP:"192.168.153.230"
pgpool: 2021-03-01 11:54:33: pid 1888: DETAIL: 'if_up_cmd' returned with success
pgpool: 2021-03-01 11:54:33: pid 1873: LOG: escalation process does not exited in time
pgpool: 2021-03-01 11:54:33: pid 1873: DETAIL: starting the de-escalation anyway
pgpool: 2021-03-01 11:54:33: pid 1942: LOG: watchdog: de-escalation started
pgpool: 2021-03-01 11:54:33: pid 1942: LOG: successfully released the delegate IP:"192.168.153.230"
pgpool: 2021-03-01 11:54:33: pid 1942: DETAIL: 'if_down_cmd' returned with success
systemd: pgpool.service: main process exited, code=exited, status=1/FAILURE
systemd: Unit pgpool.service entered failed state.
systemd: pgpool.service failed.
Any ideas?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20210301/6924bae7/attachment.htm>
More information about the pgpool-general
mailing list