[pgpool-general: 7738] Re: Fwd: watch_dog cluster down since "system has lost the network"
Bo Peng
pengbo at sraoss.co.jp
Mon Oct 4 12:45:33 JST 2021
Hello,
Sorry for the late response.
> Hi,
>
> I have two PG servers and three watch_dog nodes to set up a PG HA
> environment.
>
> - OS: Ubuntu 20.04
> - PG version: 12.8
> - Pgpool version: 4.1.4
>
> - PG primary: 192.168.1.122
> - PG slave: 192.168.1.121
> - Watch_dog node0: 192.168.1.122
> - Watch_dog node1: 192.168.1.121
> - Watch_dog node2: 192.168.1.101
>
>
> The HA environment works fine, but after 3-4 hours two watch_dog nodes go
> down, leaving only one watch_dog node (192.168.1.101) running. The
> watch_dog leader's log shows the error below, although the network IP
> 192.168.1.122 is alive.
>
> 2021-09-20 15:53:37: pid 1900172: WARNING: network IP is removed and system has no IP is assigned
> 2021-09-20 15:53:37: pid 1900172: DETAIL: changing the state to in network trouble
> 2021-09-20 15:53:37: pid 1900172: DEBUG: removing all watchdog nodes from the standby list
I think it may be caused by a temporary network problem.
Does this issue occur every time?
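If you would like to confirm whether the address really disappears for a
moment, one option is to record the IPv4 addresses assigned to the host at
short intervals and compare the timestamps with the pgpool log. Below is only
a minimal sketch, not part of pgpool-II. It assumes Python 3 and the iproute2
`ip` command (the same one your if_down_cmd uses) are available; the function
names are hypothetical, and treating "no non-loopback IPv4 address" as the
failure condition is my assumption based on the warning text, not pgpool's
exact internal check.

```python
# Hypothetical diagnostic helper, NOT part of pgpool-II.
# Assumes Python 3 and the iproute2 `ip` command are installed.
import ipaddress
import subprocess
import time
from datetime import datetime
from typing import Dict, List, Optional

Addrs = Dict[str, List[ipaddress.IPv4Interface]]


def parse_ipv4_addrs(ip_output: str) -> Addrs:
    """Parse `ip -o -4 addr show` output into {interface: [addresses]}."""
    result: Addrs = {}
    for line in ip_output.splitlines():
        tokens = line.split()
        if "inet" not in tokens:
            continue
        ifname = tokens[1]  # second field is the interface name
        addr = ipaddress.IPv4Interface(tokens[tokens.index("inet") + 1])
        result.setdefault(ifname, []).append(addr)
    return result


def has_non_loopback_ip(addrs: Addrs) -> bool:
    """True if any non-loopback IPv4 address is assigned.

    Assumption: this approximates the "system has no IP is assigned"
    condition from the pgpool log; it is not pgpool's own check.
    """
    return any(not a.ip.is_loopback for lst in addrs.values() for a in lst)


def format_addrs(addrs: Addrs) -> str:
    """Render addresses as 'iface=addr/prefix, ...' sorted by interface."""
    parts = [f"{name}={a}" for name, lst in sorted(addrs.items()) for a in lst]
    return ", ".join(parts) or "(none)"


def monitor(interval: float = 1.0) -> None:
    """Poll the address list and print a timestamped line on every change.

    Runs until interrupted (Ctrl-C).
    """
    previous: Optional[Addrs] = None
    while True:
        out = subprocess.run(
            ["ip", "-o", "-4", "addr", "show"],
            capture_output=True, text=True, check=True,
        ).stdout
        current = parse_ipv4_addrs(out)
        if current != previous:
            status = "OK" if has_non_loopback_ip(current) else "NO IP ASSIGNED"
            print(f"{datetime.now().isoformat()} {status}: {format_addrs(current)}")
            previous = current
        time.sleep(interval)
```

For example, you could leave `monitor()` running on 192.168.1.122 until the
problem happens again. If a "NO IP ASSIGNED" line appears at the same time as
the "network IP is removed" warning, it would suggest that something on the
host changed the network configuration, rather than a problem inside pgpool.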
> 2021-09-20 15:53:37: pid 1900172: DETAIL: standby list contains 1 nodes
> 2021-09-20 15:53:37: pid 1900172: DEBUG: Removing all failover objects
> 2021-09-20 15:53:37: pid 1900172: LOG: watchdog node state changed from [MASTER] to [IN NETWORK TROUBLE]
> 2021-09-20 15:53:37: pid 1900172: DEBUG: STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = IN NETWORK TROUBLE
> 2021-09-20 15:53:37: pid 1900172: FATAL: system has lost the network
> 2021-09-20 15:53:37: pid 1900172: LOG: Watchdog is shutting down
> 2021-09-20 15:53:37: pid 1900172: DEBUG: sending packet, watchdog node:[192.168.1.101:9999 Linux dell-PowerEdge-R740] command id:[1113] type:[INFORM I AM GOING DOWN] state:[IN NETWORK TROUBLE]
> 2021-09-20 15:53:37: pid 1900172: DEBUG: sending watchdog packet to socket:8, type:[X], command ID:1113, data Length:0
> 2021-09-20 15:53:37: pid 1933141: LOG: watchdog: de-escalation started
> 2021-09-20 15:53:37: pid 1933141: DEBUG: watchdog exec interface up/down command: '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens2f0' succeeded
> 2021-09-20 15:53:37: pid 1933141: LOG: successfully released the delegate IP:"192.168.1.129"
> 2021-09-20 15:53:37: pid 1933141: DETAIL: 'if_down_cmd' returned with success
> 2021-09-20 15:53:37: pid 1900168: DEBUG: reaper handler
> 2021-09-20 15:53:37: pid 1900168: DEBUG: watchdog child process with pid: 1900172 exit with FATAL ERROR. pgpool-II will be shutdown
> 2021-09-20 15:53:37: pid 1900168: LOG: watchdog child process with pid: 1900172 exits with status 768
> 2021-09-20 15:53:37: pid 1900168: FATAL: watchdog child process exit with fatal error. exiting pgpool-II
> 2021-09-20 15:53:37: pid 1933148: LOG: setting the local watchdog node name to "192.168.1.122:9999 Linux dell-PowerEdge-R740"
> 2021-09-20 15:53:37: pid 1933148: LOG: watchdog cluster is configured with 2 remote nodes
> 2021-09-20 15:53:37: pid 1933148: LOG: watchdog remote node:0 on 192.168.1.121:9000
> 2021-09-20 15:53:37: pid 1933148: LOG: watchdog remote node:1 on 192.168.1.101:9000
> 2021-09-20 15:53:37: pid 1933148: LOG: interface monitoring is disabled in watchdog
> 2021-09-20 15:53:37: pid 1933148: INFO: IPC socket path: "/tmp/.s.PGPOOLWD_CMD.9000"
> 2021-09-20 15:53:37: pid 1933148: LOG: watchdog node state changed from [DEAD] to [LOADING]
> 2021-09-20 15:53:37: pid 1933148: DEBUG: STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOADING
> 2021-09-20 15:53:37: pid 1933148: DEBUG: error in outbound connection to 192.168.1.121:9000
> 2021-09-20 15:53:37: pid 1933148: DETAIL: Connection refused
> 2021-09-20 15:53:37: pid 1933148: LOG: new outbound connection to 192.168.1.101:9000
> 2021-09-20 15:53:37: pid 1900189: DEBUG: lifecheck child receives shutdown request signal 2, forwarding to all children
> 2021-09-20 15:53:37: pid 1900189: DEBUG: lifecheck child receives fast shutdown request
> 2021-09-20 15:53:37: pid 1933148: LOG: Watchdog is shutting down
>
> Please refer to the pgpool.conf and running log on each server. Any advice
> on how to fix it?
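Also, in the log after the restart, the connection to 192.168.1.121:9000 was
refused. "Connection refused" usually means that nothing was listening on that
port at that moment (or that a firewall rejected the connection), so it may be
worth checking the watchdog port on each node while the problem is occurring.
Below is a minimal sketch, not part of pgpool-II. The node list assumes
wd_port is 9000 on all three nodes (your log shows 9000 for the two remote
nodes), and the function names are hypothetical.

```python
# Hypothetical diagnostic helper, NOT part of pgpool-II.
import socket
from typing import List, Tuple

# Assumption: wd_port is 9000 on every node, as in the log above.
WATCHDOG_NODES: List[Tuple[str, int]] = [
    ("192.168.1.122", 9000),
    ("192.168.1.121", 9000),
    ("192.168.1.101", 9000),
]


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection to host:port and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: usually nothing listens on the port.
        return "refused"
    except OSError:
        # Timeouts (socket.timeout is an OSError subclass) and routing errors.
        return "timeout/unreachable"


def report(nodes: List[Tuple[str, int]] = WATCHDOG_NODES) -> None:
    """Print the reachability of each watchdog node's wd_port."""
    for host, port in nodes:
        print(f"{host}:{port} -> {check_port(host, port)}")
```

Running `report()` from each of the three hosts while the cluster is in the
bad state would show which watchdog processes are actually accepting
connections.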
--
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan
http://www.sraoss.co.jp/