<div dir="ltr"><div>Hi Kwangwon,</div><div><br></div><div>Thanks for the patch. Overall, it looks good and accomplishes what it's supposed to do.<br><br>I just have a couple of minor suggestions regarding the error messages in the `wd_create_hb_recv_socket` and `wd_create_recv_socket` functions, where you’re raising an error if there’s no local address available:<br><br>ereport(ERROR, (errmsg("no sockets can be created because no available local address with port:%d", port),<br> errdetail("getaddrinfo() result is empty")));<br><br>It might be helpful to specify the module that’s attempting to create the socket.</div><div>For example, the above error message in wd_create_recv_socket() could be modified as follows:</div><div><br>ereport(ERROR,<br> (errmsg("failed to create watchdog receive socket"),<br> errdetail("getaddrinfo() result is empty: no sockets can be created because no available local address with port:%d", port))));<br><br>Other than that, the patch looks great.<br></div><div><br></div><div>Thanks</div><div>Best regards</div><div>Muhammad Usama</div><div><br></div><div><br></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Aug 24, 2024 at 9:43 AM Tatsuo Ishii <<a href="mailto:ishii@postgresql.org">ishii@postgresql.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Kwangwon,<br>
<br>
Thank you for the quick update. I think your patch is getting closer<br>
to commitable state.<br>
<br>
Usama,<br>
<br>
As you are the authority of watchdog, I would like to ask you to<br>
review the patch.<br>
<br>
Best reagards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS K.K.<br>
English: <a href="http://www.sraoss.co.jp/index_en/" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en/</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
<br>
From: Kwangwon Seo <<a href="mailto:keiseo@protonmail.com" target="_blank">keiseo@protonmail.com</a>><br>
Subject: Re: [pgpool-hackers: 4502] Re: Watchdog and IPv6<br>
Date: Fri, 23 Aug 2024 02:27:16 +0000<br>
Message-ID: <NZs60H0XR7JUEi50CZG8W3ijdSb5KrTn3lryKQFC-5UASWAiCMW6v5_hYU99B9Op4SkDYRsuraulgOrGa8fIlPpZNaqREPr-iqxjcozx7YE=@<a href="http://protonmail.com" rel="noreferrer" target="_blank">protonmail.com</a>><br>
<br>
> ohhh<br>
> <br>
> What a good news! glad to hear that!!<br>
> <br>
> details below.<br>
> <br>
> <br>
> -- conventions<br>
> <br>
> Yes. following the coding convention is an expression of politeness.<br>
> <br>
> and.. postgresql put a comma at the end and pgpool also does the same.<br>
> <br>
> <br>
> -- about development guidance<br>
> <br>
> LINK: <a href="https://www.pgpool.net/mediawiki/index.php/Development" rel="noreferrer" target="_blank">https://www.pgpool.net/mediawiki/index.php/Development</a> <br>
> <br>
> according to this development guidance, It looks like.. SGML documentation is also needed for the committable patch.<br>
> <br>
> <br>
> -- patch attachment<br>
> <br>
> Just in case, I submit the 5th patch which only changes the comma location(v5-0001).<br>
> <br>
> And the other patch(v5-0002) is about documentation updates.<br>
> <br>
> but I'm not sure about whether this is accptable or not.<br>
> <br>
> If those things are not fit the situation, you can ignore this.<br>
> <br>
> <br>
> <br>
> <br>
> Regards<br>
> Kwangwon Seo<br>
> <br>
> Bitnine Global Inc.<br>
> <a href="http://www.bitnine.net" rel="noreferrer" target="_blank">www.bitnine.net</a><br>
> <br>
> On Thursday, August 22nd, 2024 at 5:13 PM, Tatsuo Ishii <<a href="mailto:ishii@postgresql.org" target="_blank">ishii@postgresql.org</a>> wrote:<br>
> <br>
>> Hi Kwangwon,<br>
>> <br>
>> Thank you for the updated patches.<br>
>> <br>
>> > Thanks for all your considerate reviewing and attention. (really appreciate it)<br>
>> > <br>
>> > I had started fixing it just after receiving the first mail.<br>
>> > <br>
>> > I also want to reply with actual patch files with some progress so that was the reason of big delay in responding to the mail. sorry about that.<br>
>> > <br>
>> > anyway, about some progress.<br>
>> > <br>
>> > -- following advices.<br>
>> > indentation, %m, compile warning, and so on..<br>
>> > Yes, I checked again so as try to not to make a mistake again...<br>
>> <br>
>> > -- about regression tests<br>
>> > 028.watchdog_enable_consensus_with_half_votes<br>
>> > yes. timeout.. your tip is exactly the point. so I could fix that.<br>
>> <br>
>> <br>
>> Great. I have succeeded all regression tests including 028 this time.<br>
>> <br>
>> > 036.trusted_servers...<br>
>> > yes. It's covered now.<br>
>> <br>
>> <br>
>> This suceeded too.<br>
>> <br>
>> > 004, 011, 012, 013, 014, 015 is covered both localhost and ::1.<br>
>> > <br>
>> > -- some stories (you can ignore this)<br>
>> > Recently, I reinstalled my devices and just installed ssh library without actual executable binary.<br>
>> > regress 28 executes pgpool_recovery() and inside the script, starting with the ssh command.<br>
>> > That was one of the reasons for spending some time.<br>
>> > <br>
>> > And the other reason was, I assigned the wrong port number. what a silly me.<br>
>> > It was just one line. but I spent too much time trying to find the reason for it. C APIs and Memorycontext mechanism. that's far away from the actual reason.<br>
>> > <br>
>> > -- about attachment (patch files)<br>
>> > whitespaces checked and created with -v option. This is the 4th submission.<br>
>> > <br>
>> > v4-0001-feature-ipv6-for-watchdog.patch<br>
>> > Actual patch file. some detailed messages for additional supplementary inside. It's just for an explanation so you can erase that message.<br>
>> > <br>
>> > v4-0002-test-localhost-to-1.patch<br>
>> > It's just for test purposes. replace address from 'localhost' to '::1' on the regression tests. You can ignore this patch file.<br>
>> <br>
>> <br>
>> I have tested the patch. It worked well here.<br>
>> <br>
>> > well, anyway, It's not finished until actually finished.<br>
>> > <br>
>> > So how about this patch? you can say anything at ease..<br>
>> <br>
>> <br>
>> I forgot to say in the last email. It's very subtle point...<br>
>> <br>
>> + ereport(ERROR,<br>
>> + (errmsg("failed to get socket data from heartbeat receive socket list")<br>
>> + ,errdetail("select() failed with reason %m")));<br>
>> + }<br>
>> + else<br>
>> + {<br>
>> + ereport(ERROR,<br>
>> + (errmsg("failed to get socket data from heartbeat receive socket list")<br>
>> + ,errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));<br>
>> <br>
>> We usually place "," at the end of the line, not the beginning of the<br>
>> line. I mean following style is better.<br>
>> <br>
>> + (errmsg("failed to get socket data from heartbeat receive socket list"),<br>
>> + errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));<br>
>> <br>
>> > ---<br>
>> > <br>
>> > Regards<br>
>> > Kwangwon Seo<br>
>> > <br>
>> > On Thursday, August 8th, 2024 at 10:35 AM, Tatsuo Ishii <a href="mailto:ishii@postgresql.org" target="_blank">ishii@postgresql.org</a> wrote:<br>
>> > <br>
>> > > > > I think you should give a version number to your patch using "-v"<br>
>> > > > > option of git format-patch command.<br>
>> > > > > <br>
>> > > > > I have quick looked through the patch.<br>
>> > > > > <br>
>> > > > > t-ishii$ git apply ~/0001-feature-ipv6-for-watchdog.patch<br>
>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:343: trailing whitespace.<br>
>> > > > > <br>
>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:355: trailing whitespace.<br>
>> > > > > <br>
>> > > > > warning: 2 lines add whitespace errors.<br>
>> > > > > <br>
>> > > > > Please remove unnecessary white spaces.<br>
>> > > > > <br>
>> > > > > Also I got compiler warnings:<br>
>> > > > > <br>
>> > > > > In file included from watchdog.c:51:<br>
>> > > > > watchdog.c: In function 。watchdog_main「:<br>
>> > > > > ../../src/include/utils/elog.h:199:4: warning: 。port「 may be used uninitialized in this function [-Wmaybe-uninitialized]<br>
>> > > > > 199 | errfinish rest; \<br>
>> > > > > | ^~~~~~~~~<br>
>> > > > > watchdog.c:3439:8: note: 。port「 was declared here<br>
>> > > > > 3439 | int port;<br>
>> > > > > | ^~~~<br>
>> > > > > <br>
>> > > > > This is caused by following code snipet:<br>
>> > > > > <br>
>> > > > > + switch(ss.ss_family)<br>
>> > > > > + {<br>
>> > > > > + case AF_INET:<br>
>> > > > > + inet_ntop(AF_INET, &((struct sockaddr_in*) &ss)->sin_addr, conn->addr, addrlen);<br>
>> > > > > + port = ntohs(((struct sockaddr_in*)(&ss))->sin_port);<br>
>> > > > > + break;<br>
>> > > > > +<br>
>> > > > > + case AF_INET6:<br>
>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);<br>
>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);<br>
>> > > > > + break;<br>
>> > > > > + }<br>
>> > > > > +<br>
>> > > > > + ereport(LOG,<br>
>> > > > > + (errmsg("new watchdog node connection is received from \"%s:%d\"",conn->addr, port)));<br>
>> > > > > +<br>
>> > > > > <br>
>> > > > > Here the switch statement does not have "default" and if ss.ss_family<br>
>> > > > > is not either AF_INET or AF_INET6, the value of "port" will be<br>
>> > > > > undefined. That's why the compiler complains. I think you should have<br>
>> > > > > "default" something like:<br>
>> > > > > <br>
>> > > > > + case AF_INET6:<br>
>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);<br>
>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);<br>
>> > > > > + break;<br>
>> > > > > +<br>
>> > > > > default:<br>
>> > > > > ereport(ERROR,<br>
>> > > > > (errmsg("unknown ss_family")));<br>
>> > > > > break;<br>
>> > > > > <br>
>> > > > > BTW the indentations after CASE should be added. See other CASE<br>
>> > > > > statements in pgpool.<br>
>> > > > > <br>
>> > > > > + else if (sel_ret == -1)<br>
>> > > > > + {<br>
>> > > > > + ereport(ERROR,<br>
>> > > > > + (errmsg("failed to get socket data from socket list")<br>
>> > > > > + ,errdetail("select() failed with errno %d", errno)));<br>
>> > > > > <br>
>> > > > > You can use %m" here without using errno. See codes using %m in pgpool.<br>
>> > > > > <br>
>> > > > > + }<br>
>> > > > > + else<br>
>> > > > > + {<br>
>> > > > > + /* unreachable. */<br>
>> > > > > + Assert(FALSE);<br>
>> > > > > + }<br>
>> > > > > <br>
>> > > > > Assert cannot be used in pgpool for now.<br>
>> > > > > <br>
>> > > > > Regarding the regression test, I got a timeout error with<br>
>> > > > > 028.watchdog_enable_consensus_with_half_votes test. Have you<br>
>> > > > > succeeded?<br>
>> > > > <br>
>> > > > In addition, 036.trusted_servers also timed out here.<br>
>> > > <br>
>> > > I am looking into this a little bit. The time out was caused by below<br>
>> > > in the test script:<br>
>> > > <br>
>> > > function wait_for_watchdog_startup<br>
>> > > {<br>
>> > > while :<br>
>> > > do<br>
>> > > grep "lifecheck started" $log >/dev/null<br>
>> > > <br>
>> > > if [ $? = 0 ];then<br>
>> > > break;<br>
>> > > fi<br>
>> > > sleep 1<br>
>> > > done<br>
>> > > }<br>
>> > > <br>
>> > > It looks for "lifecheck started" in the pgpool log. The message is<br>
>> > > supposed to be issued by lifecheck_main().<br>
>> > > <br>
>> > > /* wait until ready to go */<br>
>> > > while (WD_OK != is_wd_lifecheck_ready())<br>
>> > > {<br>
>> > > sleep(pool_config->wd_interval * 10);<br>
>> > > <br>
>> > > }<br>
>> > > <br>
>> > > ereport(LOG,<br>
>> > > (errmsg("watchdog: lifecheck started")));<br>
>> > > <br>
>> > > Problem is, is_wd_lifecheck_ready() does not return WD_OK. In is_wd_lifecheck_ready(),<br>
>> > > <br>
>> > > ereport(DEBUG1,<br>
>> > > (errmsg("watchdog checking life check is ready"),<br>
>> > > errdetail("pgpool:%d at \"%s:%d\" has not send the heartbeat signal yet",<br>
>> > > i, node->hostName, node->pgpoolPort)));<br>
>> > > <br>
>> > > So I set log_min_messages to debug1 and got following in the log file.<br>
>> > > <br>
>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready<br>
>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:1 at "localhost:50004" has not send the heartbeat signal yet<br>
>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready<br>
>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:2 at "localhost:50008" has not send the heartbeat signal yet<br>
>> > > <br>
>> > > It seems other watchdog's lifecheck process do not send the hearbeat<br>
>> > > signal, or hearbeat receiver could not received the signal?<br>
>> > > <br>
>> > > Best reagards,<br>
>> > > --<br>
>> > > Tatsuo Ishii<br>
>> > > SRA OSS K.K.<br>
>> > > English: <a href="http://www.sraoss.co.jp/index_en/" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en/</a><br>
>> > > Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
_______________________________________________<br>
pgpool-hackers mailing list<br>
<a href="mailto:pgpool-hackers@pgpool.net" target="_blank">pgpool-hackers@pgpool.net</a><br>
<a href="http://www.pgpool.net/mailman/listinfo/pgpool-hackers" rel="noreferrer" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-hackers</a><br>
</blockquote></div></div>