[pgpool-hackers: 4514] Re: Watchdog and IPv6

Kwangwon Seo keiseo at protonmail.com
Tue Aug 27 09:43:36 JST 2024


HI Usama.

ohhh, my pleasure. Thanks for the reviewing.

Yes.. That will be good. Thanks for the tip. That things will be applied in the patch.

Then.. Tatsuo san,

I created the patch files in accordance with Usama's review opinion.

-- patch files

It's 6th patch submission.

It is applied the Usama's suggesion.

Other things are same with 5th patch files.

-----------

Regards

Kwangwon Seo

On Monday, August 26th, 2024 at 5:17 PM, Muhammad Usama <m.usama at gmail.com> wrote:

> Hi Kwangwon,
>
> Thanks for the patch. Overall, it looks good and accomplishes what it's supposed to do.
>
> I just have a couple of minor suggestions regarding the error messages in the `wd_create_hb_recv_socket` and `wd_create_recv_socket` functions, where you’re raising an error if there’s no local address available:
>
> ereport(ERROR, (errmsg("no sockets can be created because no available local address with port:%d", port),
> errdetail("getaddrinfo() result is empty")));
>
> It might be helpful to specify the module that’s attempting to create the socket.
> For example, the above error message in wd_create_recv_socket() could be modified as follows:
>
> ereport(ERROR,
> (errmsg("failed to create watchdog receive socket"),
> errdetail("getaddrinfo() result is empty: no sockets can be created because no available local address with port:%d", port))));
>
> Other than that, the patch looks great.
>
> Thanks
> Best regards
> Muhammad Usama
>
> On Sat, Aug 24, 2024 at 9:43 AM Tatsuo Ishii <ishii at postgresql.org> wrote:
>
>> Hi Kwangwon,
>>
>> Thank you for the quick update. I think your patch is getting closer
>> to commitable state.
>>
>> Usama,
>>
>> As you are the authority of watchdog, I would like to ask you to
>> review the patch.
>>
>> Best reagards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>
>> From: Kwangwon Seo <keiseo at protonmail.com>
>> Subject: Re: [pgpool-hackers: 4502] Re: Watchdog and IPv6
>> Date: Fri, 23 Aug 2024 02:27:16 +0000
>> Message-ID: <NZs60H0XR7JUEi50CZG8W3ijdSb5KrTn3lryKQFC-5UASWAiCMW6v5_hYU99B9Op4SkDYRsuraulgOrGa8fIlPpZNaqREPr-iqxjcozx7YE=@protonmail.com>
>>
>>> ohhh
>>>
>>> What a good news! glad to hear that!!
>>>
>>> details below.
>>>
>>>
>>> -- conventions
>>>
>>> Yes. following the coding convention is an expression of politeness.
>>>
>>> and.. postgresql put a comma at the end and pgpool also does the same.
>>>
>>>
>>> -- about development guidance
>>>
>>> LINK: https://www.pgpool.net/mediawiki/index.php/Development
>>>
>>> according to this development guidance, It looks like.. SGML documentation is also needed for the committable patch.
>>>
>>>
>>> -- patch attachment
>>>
>>> Just in case, I submit the 5th patch which only changes the comma location(v5-0001).
>>>
>>> And the other patch(v5-0002) is about documentation updates.
>>>
>>> but I'm not sure about whether this is accptable or not.
>>>
>>> If those things are not fit the situation, you can ignore this.
>>>
>>>
>>>
>>>
>>> Regards
>>> Kwangwon Seo
>>>
>>> Bitnine Global Inc.
>>> www.bitnine.net
>>>
>>> On Thursday, August 22nd, 2024 at 5:13 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>
>>>> Hi Kwangwon,
>>>>
>>>> Thank you for the updated patches.
>>>>
>>>> > Thanks for all your considerate reviewing and attention. (really appreciate it)
>>>> >
>>>> > I had started fixing it just after receiving the first mail.
>>>> >
>>>> > I also want to reply with actual patch files with some progress so that was the reason of big delay in responding to the mail. sorry about that.
>>>> >
>>>> > anyway, about some progress.
>>>> >
>>>> > -- following advices.
>>>> > indentation, %m, compile warning, and so on..
>>>> > Yes, I checked again so as try to not to make a mistake again...
>>>>
>>>> > -- about regression tests
>>>> > 028.watchdog_enable_consensus_with_half_votes
>>>> > yes. timeout.. your tip is exactly the point. so I could fix that.
>>>>
>>>>
>>>> Great. I have succeeded all regression tests including 028 this time.
>>>>
>>>> > 036.trusted_servers...
>>>> > yes. It's covered now.
>>>>
>>>>
>>>> This suceeded too.
>>>>
>>>> > 004, 011, 012, 013, 014, 015 is covered both localhost and ::1.
>>>> >
>>>> > -- some stories (you can ignore this)
>>>> > Recently, I reinstalled my devices and just installed ssh library without actual executable binary.
>>>> > regress 28 executes pgpool_recovery() and inside the script, starting with the ssh command.
>>>> > That was one of the reasons for spending some time.
>>>> >
>>>> > And the other reason was, I assigned the wrong port number. what a silly me.
>>>> > It was just one line. but I spent too much time trying to find the reason for it. C APIs and Memorycontext mechanism. that's far away from the actual reason.
>>>> >
>>>> > -- about attachment (patch files)
>>>> > whitespaces checked and created with -v option. This is the 4th submission.
>>>> >
>>>> > v4-0001-feature-ipv6-for-watchdog.patch
>>>> > Actual patch file. some detailed messages for additional supplementary inside. It's just for an explanation so you can erase that message.
>>>> >
>>>> > v4-0002-test-localhost-to-1.patch
>>>> > It's just for test purposes. replace address from 'localhost' to '::1' on the regression tests. You can ignore this patch file.
>>>>
>>>>
>>>> I have tested the patch. It worked well here.
>>>>
>>>> > well, anyway, It's not finished until actually finished.
>>>> >
>>>> > So how about this patch? you can say anything at ease..
>>>>
>>>>
>>>> I forgot to say in the last email. It's very subtle point...
>>>>
>>>> + ereport(ERROR,
>>>> + (errmsg("failed to get socket data from heartbeat receive socket list")
>>>> + ,errdetail("select() failed with reason %m")));
>>>> + }
>>>> + else
>>>> + {
>>>> + ereport(ERROR,
>>>> + (errmsg("failed to get socket data from heartbeat receive socket list")
>>>> + ,errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
>>>>
>>>> We usually place "," at the end of the line, not the beginning of the
>>>> line. I mean following style is better.
>>>>
>>>> + (errmsg("failed to get socket data from heartbeat receive socket list"),
>>>> + errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
>>>>
>>>> > ---
>>>> >
>>>> > Regards
>>>> > Kwangwon Seo
>>>> >
>>>> > On Thursday, August 8th, 2024 at 10:35 AM, Tatsuo Ishii ishii at postgresql.org wrote:
>>>> >
>>>> > > > > I think you should give a version number to your patch using "-v"
>>>> > > > > option of git format-patch command.
>>>> > > > >
>>>> > > > > I have quick looked through the patch.
>>>> > > > >
>>>> > > > > t-ishii$ git apply ~/0001-feature-ipv6-for-watchdog.patch
>>>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:343: trailing whitespace.
>>>> > > > >
>>>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:355: trailing whitespace.
>>>> > > > >
>>>> > > > > warning: 2 lines add whitespace errors.
>>>> > > > >
>>>> > > > > Please remove unnecessary white spaces.
>>>> > > > >
>>>> > > > > Also I got compiler warnings:
>>>> > > > >
>>>> > > > > In file included from watchdog.c:51:
>>>> > > > > watchdog.c: In function 。watchdog_main「:
>>>> > > > > ../../src/include/utils/elog.h:199:4: warning: 。port「 may be used uninitialized in this function [-Wmaybe-uninitialized]
>>>> > > > > 199 | errfinish rest; \
>>>> > > > > | ^~~~~~~~~
>>>> > > > > watchdog.c:3439:8: note: 。port「 was declared here
>>>> > > > > 3439 | int port;
>>>> > > > > | ^~~~
>>>> > > > >
>>>> > > > > This is caused by following code snipet:
>>>> > > > >
>>>> > > > > + switch(ss.ss_family)
>>>> > > > > + {
>>>> > > > > + case AF_INET:
>>>> > > > > + inet_ntop(AF_INET, &((struct sockaddr_in*) &ss)->sin_addr, conn->addr, addrlen);
>>>> > > > > + port = ntohs(((struct sockaddr_in*)(&ss))->sin_port);
>>>> > > > > + break;
>>>> > > > > +
>>>> > > > > + case AF_INET6:
>>>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>>>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>>>> > > > > + break;
>>>> > > > > + }
>>>> > > > > +
>>>> > > > > + ereport(LOG,
>>>> > > > > + (errmsg("new watchdog node connection is received from \"%s:%d\"",conn->addr, port)));
>>>> > > > > +
>>>> > > > >
>>>> > > > > Here the switch statement does not have "default" and if ss.ss_family
>>>> > > > > is not either AF_INET or AF_INET6, the value of "port" will be
>>>> > > > > undefined. That's why the compiler complains. I think you should have
>>>> > > > > "default" something like:
>>>> > > > >
>>>> > > > > + case AF_INET6:
>>>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>>>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>>>> > > > > + break;
>>>> > > > > +
>>>> > > > > default:
>>>> > > > > ereport(ERROR,
>>>> > > > > (errmsg("unknown ss_family")));
>>>> > > > > break;
>>>> > > > >
>>>> > > > > BTW the indentations after CASE should be added. See other CASE
>>>> > > > > statements in pgpool.
>>>> > > > >
>>>> > > > > + else if (sel_ret == -1)
>>>> > > > > + {
>>>> > > > > + ereport(ERROR,
>>>> > > > > + (errmsg("failed to get socket data from socket list")
>>>> > > > > + ,errdetail("select() failed with errno %d", errno)));
>>>> > > > >
>>>> > > > > You can use %m" here without using errno. See codes using %m in pgpool.
>>>> > > > >
>>>> > > > > + }
>>>> > > > > + else
>>>> > > > > + {
>>>> > > > > + /* unreachable. */
>>>> > > > > + Assert(FALSE);
>>>> > > > > + }
>>>> > > > >
>>>> > > > > Assert cannot be used in pgpool for now.
>>>> > > > >
>>>> > > > > Regarding the regression test, I got a timeout error with
>>>> > > > > 028.watchdog_enable_consensus_with_half_votes test. Have you
>>>> > > > > succeeded?
>>>> > > >
>>>> > > > In addition, 036.trusted_servers also timed out here.
>>>> > >
>>>> > > I am looking into this a little bit. The time out was caused by below
>>>> > > in the test script:
>>>> > >
>>>> > > function wait_for_watchdog_startup
>>>> > > {
>>>> > > while :
>>>> > > do
>>>> > > grep "lifecheck started" $log >/dev/null
>>>> > >
>>>> > > if [ $? = 0 ];then
>>>> > > break;
>>>> > > fi
>>>> > > sleep 1
>>>> > > done
>>>> > > }
>>>> > >
>>>> > > It looks for "lifecheck started" in the pgpool log. The message is
>>>> > > supposed to be issued by lifecheck_main().
>>>> > >
>>>> > > /* wait until ready to go */
>>>> > > while (WD_OK != is_wd_lifecheck_ready())
>>>> > > {
>>>> > > sleep(pool_config->wd_interval * 10);
>>>> > >
>>>> > > }
>>>> > >
>>>> > > ereport(LOG,
>>>> > > (errmsg("watchdog: lifecheck started")));
>>>> > >
>>>> > > Problem is, is_wd_lifecheck_ready() does not return WD_OK. In is_wd_lifecheck_ready(),
>>>> > >
>>>> > > ereport(DEBUG1,
>>>> > > (errmsg("watchdog checking life check is ready"),
>>>> > > errdetail("pgpool:%d at \"%s:%d\" has not send the heartbeat signal yet",
>>>> > > i, node->hostName, node->pgpoolPort)));
>>>> > >
>>>> > > So I set log_min_messages to debug1 and got following in the log file.
>>>> > >
>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:1 at "localhost:50004" has not send the heartbeat signal yet
>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:2 at "localhost:50008" has not send the heartbeat signal yet
>>>> > >
>>>> > > It seems other watchdog's lifecheck process do not send the hearbeat
>>>> > > signal, or hearbeat receiver could not received the signal?
>>>> > >
>>>> > > Best reagards,
>>>> > > --
>>>> > > Tatsuo Ishii
>>>> > > SRA OSS K.K.
>>>> > > English: http://www.sraoss.co.jp/index_en/
>>>> > > Japanese:http://www.sraoss.co.jp
>> _______________________________________________
>> pgpool-hackers mailing list
>> pgpool-hackers at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20240827/526f8a08/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v6-0002-Doc-mention-about-watchdog-ipv6-supporting.patch
Type: text/x-patch
Size: 3719 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20240827/526f8a08/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v6-0001-feature-ipv6-for-watchdog.patch
Type: text/x-patch
Size: 30878 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20240827/526f8a08/attachment-0003.bin>


More information about the pgpool-hackers mailing list