[pgpool-hackers: 4512] Re: Watchdog and IPv6

Tatsuo Ishii ishii at postgresql.org
Fri Aug 23 15:25:44 JST 2024


Hi Kwangwon,

Thank you for the quick update. I think your patch is getting closer
to commitable state.

Usama,

As you are the authority of watchdog, I would like to ask you to
review the patch.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

From: Kwangwon Seo <keiseo at protonmail.com>
Subject: Re: [pgpool-hackers: 4502] Re: Watchdog and IPv6
Date: Fri, 23 Aug 2024 02:27:16 +0000
Message-ID: <NZs60H0XR7JUEi50CZG8W3ijdSb5KrTn3lryKQFC-5UASWAiCMW6v5_hYU99B9Op4SkDYRsuraulgOrGa8fIlPpZNaqREPr-iqxjcozx7YE=@protonmail.com>

> ohhh
> 
> What a good news! glad to hear that!!
> 
> details below.
> 
> 
> -- conventions
> 
> Yes. following the coding convention is an expression of politeness.
> 
> and.. postgresql put a comma at the end and pgpool also does the same.
> 
> 
> -- about development guidance
> 
> LINK: https://www.pgpool.net/mediawiki/index.php/Development 
> 
> according to this development guidance, It looks like.. SGML documentation is also needed for the committable patch.
> 
> 
> -- patch attachment
> 
> Just in case, I submit the 5th patch which only changes the comma location(v5-0001).
> 
> And the other patch(v5-0002) is about documentation updates.
> 
> but I'm not sure about whether this is accptable or not.
> 
> If those things are not fit the situation, you can ignore this.
> 
> 
> 
> 
> Regards
> Kwangwon Seo
> 
> Bitnine Global Inc.
> www.bitnine.net
> 
> On Thursday, August 22nd, 2024 at 5:13 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> 
>> Hi Kwangwon,
>> 
>> Thank you for the updated patches.
>> 
>> > Thanks for all your considerate reviewing and attention. (really appreciate it)
>> > 
>> > I had started fixing it just after receiving the first mail.
>> > 
>> > I also want to reply with actual patch files with some progress so that was the reason of big delay in responding to the mail. sorry about that.
>> > 
>> > anyway, about some progress.
>> > 
>> > -- following advices.
>> > indentation, %m, compile warning, and so on..
>> > Yes, I checked again so as try to not to make a mistake again...
>> 
>> > -- about regression tests
>> > 028.watchdog_enable_consensus_with_half_votes
>> > yes. timeout.. your tip is exactly the point. so I could fix that.
>> 
>> 
>> Great. I have succeeded all regression tests including 028 this time.
>> 
>> > 036.trusted_servers...
>> > yes. It's covered now.
>> 
>> 
>> This suceeded too.
>> 
>> > 004, 011, 012, 013, 014, 015 is covered both localhost and ::1.
>> > 
>> > -- some stories (you can ignore this)
>> > Recently, I reinstalled my devices and just installed ssh library without actual executable binary.
>> > regress 28 executes pgpool_recovery() and inside the script, starting with the ssh command.
>> > That was one of the reasons for spending some time.
>> > 
>> > And the other reason was, I assigned the wrong port number. what a silly me.
>> > It was just one line. but I spent too much time trying to find the reason for it. C APIs and Memorycontext mechanism. that's far away from the actual reason.
>> > 
>> > -- about attachment (patch files)
>> > whitespaces checked and created with -v option. This is the 4th submission.
>> > 
>> > v4-0001-feature-ipv6-for-watchdog.patch
>> > Actual patch file. some detailed messages for additional supplementary inside. It's just for an explanation so you can erase that message.
>> > 
>> > v4-0002-test-localhost-to-1.patch
>> > It's just for test purposes. replace address from 'localhost' to '::1' on the regression tests. You can ignore this patch file.
>> 
>> 
>> I have tested the patch. It worked well here.
>> 
>> > well, anyway, It's not finished until actually finished.
>> > 
>> > So how about this patch? you can say anything at ease..
>> 
>> 
>> I forgot to say in the last email. It's very subtle point...
>> 
>> + ereport(ERROR,
>> + (errmsg("failed to get socket data from heartbeat receive socket list")
>> + ,errdetail("select() failed with reason %m")));
>> + }
>> + else
>> + {
>> + ereport(ERROR,
>> + (errmsg("failed to get socket data from heartbeat receive socket list")
>> + ,errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
>> 
>> We usually place "," at the end of the line, not the beginning of the
>> line. I mean following style is better.
>> 
>> + (errmsg("failed to get socket data from heartbeat receive socket list"),
>> + errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
>> 
>> > ---
>> > 
>> > Regards
>> > Kwangwon Seo
>> > 
>> > On Thursday, August 8th, 2024 at 10:35 AM, Tatsuo Ishii ishii at postgresql.org wrote:
>> > 
>> > > > > I think you should give a version number to your patch using "-v"
>> > > > > option of git format-patch command.
>> > > > > 
>> > > > > I have quick looked through the patch.
>> > > > > 
>> > > > > t-ishii$ git apply ~/0001-feature-ipv6-for-watchdog.patch
>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:343: trailing whitespace.
>> > > > > 
>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:355: trailing whitespace.
>> > > > > 
>> > > > > warning: 2 lines add whitespace errors.
>> > > > > 
>> > > > > Please remove unnecessary white spaces.
>> > > > > 
>> > > > > Also I got compiler warnings:
>> > > > > 
>> > > > > In file included from watchdog.c:51:
>> > > > > watchdog.c: In function 。watchdog_main「:
>> > > > > ../../src/include/utils/elog.h:199:4: warning: 。port「 may be used uninitialized in this function [-Wmaybe-uninitialized]
>> > > > > 199 | errfinish rest; \
>> > > > > | ^~~~~~~~~
>> > > > > watchdog.c:3439:8: note: 。port「 was declared here
>> > > > > 3439 | int port;
>> > > > > | ^~~~
>> > > > > 
>> > > > > This is caused by following code snipet:
>> > > > > 
>> > > > > + switch(ss.ss_family)
>> > > > > + {
>> > > > > + case AF_INET:
>> > > > > + inet_ntop(AF_INET, &((struct sockaddr_in*) &ss)->sin_addr, conn->addr, addrlen);
>> > > > > + port = ntohs(((struct sockaddr_in*)(&ss))->sin_port);
>> > > > > + break;
>> > > > > +
>> > > > > + case AF_INET6:
>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>> > > > > + break;
>> > > > > + }
>> > > > > +
>> > > > > + ereport(LOG,
>> > > > > + (errmsg("new watchdog node connection is received from \"%s:%d\"",conn->addr, port)));
>> > > > > +
>> > > > > 
>> > > > > Here the switch statement does not have "default" and if ss.ss_family
>> > > > > is not either AF_INET or AF_INET6, the value of "port" will be
>> > > > > undefined. That's why the compiler complains. I think you should have
>> > > > > "default" something like:
>> > > > > 
>> > > > > + case AF_INET6:
>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>> > > > > + break;
>> > > > > +
>> > > > > default:
>> > > > > ereport(ERROR,
>> > > > > (errmsg("unknown ss_family")));
>> > > > > break;
>> > > > > 
>> > > > > BTW the indentations after CASE should be added. See other CASE
>> > > > > statements in pgpool.
>> > > > > 
>> > > > > + else if (sel_ret == -1)
>> > > > > + {
>> > > > > + ereport(ERROR,
>> > > > > + (errmsg("failed to get socket data from socket list")
>> > > > > + ,errdetail("select() failed with errno %d", errno)));
>> > > > > 
>> > > > > You can use %m" here without using errno. See codes using %m in pgpool.
>> > > > > 
>> > > > > + }
>> > > > > + else
>> > > > > + {
>> > > > > + /* unreachable. */
>> > > > > + Assert(FALSE);
>> > > > > + }
>> > > > > 
>> > > > > Assert cannot be used in pgpool for now.
>> > > > > 
>> > > > > Regarding the regression test, I got a timeout error with
>> > > > > 028.watchdog_enable_consensus_with_half_votes test. Have you
>> > > > > succeeded?
>> > > > 
>> > > > In addition, 036.trusted_servers also timed out here.
>> > > 
>> > > I am looking into this a little bit. The time out was caused by below
>> > > in the test script:
>> > > 
>> > > function wait_for_watchdog_startup
>> > > {
>> > > while :
>> > > do
>> > > grep "lifecheck started" $log >/dev/null
>> > > 
>> > > if [ $? = 0 ];then
>> > > break;
>> > > fi
>> > > sleep 1
>> > > done
>> > > }
>> > > 
>> > > It looks for "lifecheck started" in the pgpool log. The message is
>> > > supposed to be issued by lifecheck_main().
>> > > 
>> > > /* wait until ready to go */
>> > > while (WD_OK != is_wd_lifecheck_ready())
>> > > {
>> > > sleep(pool_config->wd_interval * 10);
>> > > 
>> > > }
>> > > 
>> > > ereport(LOG,
>> > > (errmsg("watchdog: lifecheck started")));
>> > > 
>> > > Problem is, is_wd_lifecheck_ready() does not return WD_OK. In is_wd_lifecheck_ready(),
>> > > 
>> > > ereport(DEBUG1,
>> > > (errmsg("watchdog checking life check is ready"),
>> > > errdetail("pgpool:%d at \"%s:%d\" has not send the heartbeat signal yet",
>> > > i, node->hostName, node->pgpoolPort)));
>> > > 
>> > > So I set log_min_messages to debug1 and got following in the log file.
>> > > 
>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:1 at "localhost:50004" has not send the heartbeat signal yet
>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:2 at "localhost:50008" has not send the heartbeat signal yet
>> > > 
>> > > It seems other watchdog's lifecheck process do not send the hearbeat
>> > > signal, or hearbeat receiver could not received the signal?
>> > > 
>> > > Best reagards,
>> > > --
>> > > Tatsuo Ishii
>> > > SRA OSS K.K.
>> > > English: http://www.sraoss.co.jp/index_en/
>> > > Japanese:http://www.sraoss.co.jp


More information about the pgpool-hackers mailing list