[pgpool-hackers: 4270] Re: heartbeat behavior
Tatsuo Ishii
ishii at sraoss.co.jp
Sat Jan 21 20:46:41 JST 2023
Oops. I attached the wrong patch.
Trying again...
From: Tatsuo Ishii <ishii at sraoss.co.jp>
Subject: [pgpool-hackers: 4269] heartbeat behavior
Date: Sat, 21 Jan 2023 20:38:58 +0900 (JST)
Message-ID: <20230121.203858.1235299725568995991.t-ishii at sranhm.sra.co.jp>
> Hi Usama,
>
> I wanted to test the heartbeat of Pgpool-II and created a test support
> tool, which makes a heartbeat sender stop sending heartbeat packets to
> a specified pgpool node. Attached is the patch to implement that.
>
> While testing heartbeat using the tool, I found an interesting
> behavior of the watchdog.
>
> 1) create a 3-node pgpool cluster
>
> $ watchdog_setup -wn 3
>
> 2) start the cluster and wait for the lifekeeper process to start.
>
> 3) create a "heartbeat_sender_control" file to prevent the heartbeat sender
> on pgpool1 from sending heartbeat packets to pgpool2.
>
> $ echo 2 > pgpool1/log/heartbeat_sender_control
>
> 4) create a "heartbeat_sender_control" file to prevent the heartbeat sender
> on pgpool2 from sending heartbeat packets to pgpool1.
>
> $ echo 1 > pgpool2/log/heartbeat_sender_control
>
> At this point pgpool1 sends heartbeat to pgpool0 but not to
> pgpool2. Likewise, pgpool2 sends heartbeat to pgpool0 but not to
> pgpool1.
>
> 5) wait until life_check reports that the node is in "NODE DEAD" status.
>
> Here is a pgpool log from pgpool1.
>
> 2023-01-21 20:14:13.541: life_check pid 598177: LOG: informing the node status change to watchdog
> 2023-01-21 20:14:13.541: life_check pid 598177: DETAIL: node id :2 status = "NODE DEAD" message:"No heartbeat signal from node"
> 2023-01-21 20:14:13.541: watchdog pid 598125: LOG: received node status change ipc message
> 2023-01-21 20:14:13.541: watchdog pid 598125: DETAIL: No heartbeat signal from node
> 2023-01-21 20:14:13.541: watchdog pid 598125: LOG: remote node "localhost:50008 Linux tishii-CFSV9-2" is lost
> 2023-01-21 20:14:13.542: watchdog pid 598125: LOG: new watchdog node connection is received from "127.0.0.1:44185"
> 2023-01-21 20:14:13.542: watchdog pid 598125: LOG: new outbound connection to localhost:50010
> 2023-01-21 20:14:13.542: watchdog pid 598125: LOG: new node joined the cluster hostname:"localhost" port:50010 pgpool_port:50008
> 2023-01-21 20:14:13.542: watchdog pid 598125: DETAIL: Pgpool-II version:"4.5devel" watchdog messaging version: 1.2
> 2023-01-21 20:14:13.542: watchdog pid 598125: LOG: The newly joined node:"localhost:50008 Linux tishii-CFSV9-2" had left the cluster because it was lost
> 2023-01-21 20:14:13.542: watchdog pid 598125: DETAIL: lost reason was "REPORTED BY LIFECHECK" and startup time diff = 0
>
> As I expected, "localhost:50008" (that is pgpool2) left the cluster. So far so good.
>
> Then a strange thing happened:
>
> 2023-01-21 20:14:13.542: watchdog pid 598125: LOG: remote node "localhost:50008 Linux tishii-CFSV9-2" became reachable again
> 2023-01-21 20:14:13.542: watchdog pid 598125: DETAIL: requesting the node info
> 2023-01-21 20:14:13.542: watchdog pid 598125: LOG: remote node "localhost:50008 Linux tishii-CFSV9-2" is reporting that it has found us again
>
> Why did pgpool2 come back even though life check continues to report
> that the node is dead? It seems the life check report has been ignored.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
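
For anyone repeating steps 3) and 4) above, the two echo commands can
be wrapped in a small shell helper. This is only a sketch:
block_heartbeat is a made-up name, not something in Pgpool-II or in
the patch. It assumes the directory layout created by watchdog_setup
(pgpoolN/log/) and that the patched heartbeat sender takes the target
node id from the "heartbeat_sender_control" file, as in the commands
quoted above.

```shell
#!/bin/sh
# Sketch only: block_heartbeat is a hypothetical helper, not part of Pgpool-II.
# Assumes the watchdog_setup layout (pgpoolN/log/) and the
# "heartbeat_sender_control" file introduced by the test patch.

# block_heartbeat BASE_DIR FROM_NODE TO_NODE
# Ask the heartbeat sender on pgpool<FROM_NODE> to stop sending heartbeat
# packets to pgpool<TO_NODE> by writing the target node id into the
# control file under that node's log directory.
block_heartbeat() {
    base_dir=$1
    from_node=$2
    to_node=$3
    log_dir="$base_dir/pgpool$from_node/log"

    # Refuse to create the file if the node directory does not exist.
    if [ ! -d "$log_dir" ]; then
        echo "no such directory: $log_dir" >&2
        return 1
    fi

    echo "$to_node" > "$log_dir/heartbeat_sender_control"
}
```

Run from the directory where watchdog_setup was executed, steps 3) and
4) then become:

$ block_heartbeat . 1 2
$ block_heartbeat . 2 1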
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_test.patch
Type: text/x-patch
Size: 4039 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20230121/a9d12975/attachment.bin>