[pgpool-hackers: 4509] Re: Watchdog and IPv6

Kwangwon Seo keiseo at protonmail.com
Mon Aug 19 14:18:29 JST 2024


Thanks for all your considerate reviewing and attention. (really appreciate it)

I had started fixing it just after receiving the first mail. 

I also want to reply with actual patch files with some progress so that was the reason of big delay in responding to the mail. sorry about that.

anyway, about some progress.

-- following advices.
indentation, %m, compile warning, and so on..
Yes, I checked again so as try to not to make a mistake again...

-- about regression tests
028.watchdog_enable_consensus_with_half_votes
yes. timeout.. your tip is exactly the point. so I could fix that.

036.trusted_servers...
yes. It's covered now.

004, 011, 012, 013, 014, 015 is covered both localhost and ::1.

-- some stories (you can ignore this)
Recently, I reinstalled my devices and just installed ssh library without actual executable binary.
regress 28 executes pgpool_recovery() and inside the script, starting with the ssh command.
That was one of the reasons for spending some time.

And the other reason was, I assigned the wrong port number. what a silly me.
It was just one line. but I spent too much time trying to find the reason for it. C APIs and Memorycontext mechanism. that's far away from the actual reason.

-- about attachment (patch files)
whitespaces checked and created with -v option. This is the 4th submission.

v4-0001-feature-ipv6-for-watchdog.patch
Actual patch file. some detailed messages for additional supplementary inside. It's just for an explanation so you can erase that message.

v4-0002-test-localhost-to-1.patch
It's just for test purposes. replace address from 'localhost' to '::1' on the regression tests. You can ignore this patch file.


well, anyway, It's not finished until actually finished.

So how about this patch? you can say anything at ease..

---

Regards
Kwangwon Seo

On Thursday, August 8th, 2024 at 10:35 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> > > I think you should give a version number to your patch using "-v"
> > > option of git format-patch command.
> > >
> > > I have quick looked through the patch.
> > >
> > > t-ishii$ git apply ~/0001-feature-ipv6-for-watchdog.patch
> > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:343: trailing whitespace.
> > >
> > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:355: trailing whitespace.
> > >
> > > warning: 2 lines add whitespace errors.
> > >
> > > Please remove unnecessary white spaces.
> > >
> > > Also I got compiler warnings:
> > >
> > > In file included from watchdog.c:51:
> > > watchdog.c: In function 。watchdog_main「:
> > > ../../src/include/utils/elog.h:199:4: warning: 。port「 may be used uninitialized in this function [-Wmaybe-uninitialized]
> > > 199 | errfinish rest; \
> > > | ^~~~~~~~~
> > > watchdog.c:3439:8: note: 。port「 was declared here
> > > 3439 | int port;
> > > | ^~~~
> > >
> > > This is caused by following code snipet:
> > >
> > > + switch(ss.ss_family)
> > > + {
> > > + case AF_INET:
> > > + inet_ntop(AF_INET, &((struct sockaddr_in*) &ss)->sin_addr, conn->addr, addrlen);
> > > + port = ntohs(((struct sockaddr_in*)(&ss))->sin_port);
> > > + break;
> > > +
> > > + case AF_INET6:
> > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
> > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
> > > + break;
> > > + }
> > > +
> > > + ereport(LOG,
> > > + (errmsg("new watchdog node connection is received from \"%s:%d\"",conn->addr, port)));
> > > +
> > >
> > > Here the switch statement does not have "default" and if ss.ss_family
> > > is not either AF_INET or AF_INET6, the value of "port" will be
> > > undefined. That's why the compiler complains. I think you should have
> > > "default" something like:
> > >
> > > + case AF_INET6:
> > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
> > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
> > > + break;
> > > +
> > > default:
> > > ereport(ERROR,
> > > (errmsg("unknown ss_family")));
> > > break;
> > >
> > > BTW the indentations after CASE should be added. See other CASE
> > > statements in pgpool.
> > >
> > > + else if (sel_ret == -1)
> > > + {
> > > + ereport(ERROR,
> > > + (errmsg("failed to get socket data from socket list")
> > > + ,errdetail("select() failed with errno %d", errno)));
> > >
> > > You can use %m" here without using errno. See codes using %m in pgpool.
> > >
> > > + }
> > > + else
> > > + {
> > > + /* unreachable. */
> > > + Assert(FALSE);
> > > + }
> > >
> > > Assert cannot be used in pgpool for now.
> > >
> > > Regarding the regression test, I got a timeout error with
> > > 028.watchdog_enable_consensus_with_half_votes test. Have you
> > > succeeded?
> >
> > In addition, 036.trusted_servers also timed out here.
>
>
> I am looking into this a little bit. The time out was caused by below
> in the test script:
>
> function wait_for_watchdog_startup
> {
> while :
> do
> grep "lifecheck started" $log >/dev/null
>
> if [ $? = 0 ];then
> break;
> fi
> sleep 1
> done
> }
>
> It looks for "lifecheck started" in the pgpool log. The message is
> supposed to be issued by lifecheck_main().
>
> /* wait until ready to go */
> while (WD_OK != is_wd_lifecheck_ready())
> {
> sleep(pool_config->wd_interval * 10);
>
> }
>
> ereport(LOG,
> (errmsg("watchdog: lifecheck started")));
>
> Problem is, is_wd_lifecheck_ready() does not return WD_OK. In is_wd_lifecheck_ready(),
>
> ereport(DEBUG1,
> (errmsg("watchdog checking life check is ready"),
> errdetail("pgpool:%d at \"%s:%d\" has not send the heartbeat signal yet",
> i, node->hostName, node->pgpoolPort)));
>
>
> So I set log_min_messages to debug1 and got following in the log file.
>
> 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
> 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:1 at "localhost:50004" has not send the heartbeat signal yet
> 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
> 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:2 at "localhost:50008" has not send the heartbeat signal yet
>
> It seems other watchdog's lifecheck process do not send the hearbeat
> signal, or hearbeat receiver could not received the signal?
>
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v4-0002-test-localhost-to-1.patch
Type: text/x-patch
Size: 15548 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20240819/0e7a6a36/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v4-0001-feature-ipv6-for-watchdog.patch
Type: text/x-patch
Size: 30784 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20240819/0e7a6a36/attachment-0003.bin>


More information about the pgpool-hackers mailing list