[pgpool-hackers: 4516] Re: Watchdog and IPv6

Tatsuo Ishii ishii at postgresql.org
Tue Aug 27 11:24:46 JST 2024


I have updated the TODO list.

https://pgpool.net/mediawiki/index.php/TODO

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> Hi Kwangwon,
> 
> I have pushed the v6 patch to master branch. The feature will appear
> in the next major release of Pgpool-II (supposed to be v4.6).
> 
> Thank you for the great contribution!
> 
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
> 
>> HI Usama.
>> 
>> ohhh, my pleasure. Thanks for the reviewing.
>> 
>> Yes.. That will be good. Thanks for the tip. That things will be applied in the patch.
>> 
>> Then.. Tatsuo san,
>> 
>> I created the patch files in accordance with Usama's review opinion.
>> 
>> -- patch files
>> 
>> It's 6th patch submission.
>> 
>> It is applied the Usama's suggesion.
>> 
>> Other things are same with 5th patch files.
>> 
>> -----------
>> 
>> Regards
>> 
>> Kwangwon Seo
>> 
>> On Monday, August 26th, 2024 at 5:17 PM, Muhammad Usama <m.usama at gmail.com> wrote:
>> 
>>> Hi Kwangwon,
>>>
>>> Thanks for the patch. Overall, it looks good and accomplishes what it's supposed to do.
>>>
>>> I just have a couple of minor suggestions regarding the error messages in the `wd_create_hb_recv_socket` and `wd_create_recv_socket` functions, where you’re raising an error if there’s no local address available:
>>>
>>> ereport(ERROR, (errmsg("no sockets can be created because no available local address with port:%d", port),
>>> errdetail("getaddrinfo() result is empty")));
>>>
>>> It might be helpful to specify the module that’s attempting to create the socket.
>>> For example, the above error message in wd_create_recv_socket() could be modified as follows:
>>>
>>> ereport(ERROR,
>>> (errmsg("failed to create watchdog receive socket"),
>>> errdetail("getaddrinfo() result is empty: no sockets can be created because no available local address with port:%d", port))));
>>>
>>> Other than that, the patch looks great.
>>>
>>> Thanks
>>> Best regards
>>> Muhammad Usama
>>>
>>> On Sat, Aug 24, 2024 at 9:43 AM Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>
>>>> Hi Kwangwon,
>>>>
>>>> Thank you for the quick update. I think your patch is getting closer
>>>> to commitable state.
>>>>
>>>> Usama,
>>>>
>>>> As you are the authority of watchdog, I would like to ask you to
>>>> review the patch.
>>>>
>>>> Best reagards,
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS K.K.
>>>> English: http://www.sraoss.co.jp/index_en/
>>>> Japanese:http://www.sraoss.co.jp
>>>>
>>>> From: Kwangwon Seo <keiseo at protonmail.com>
>>>> Subject: Re: [pgpool-hackers: 4502] Re: Watchdog and IPv6
>>>> Date: Fri, 23 Aug 2024 02:27:16 +0000
>>>> Message-ID: <NZs60H0XR7JUEi50CZG8W3ijdSb5KrTn3lryKQFC-5UASWAiCMW6v5_hYU99B9Op4SkDYRsuraulgOrGa8fIlPpZNaqREPr-iqxjcozx7YE=@protonmail.com>
>>>>
>>>>> ohhh
>>>>>
>>>>> What a good news! glad to hear that!!
>>>>>
>>>>> details below.
>>>>>
>>>>>
>>>>> -- conventions
>>>>>
>>>>> Yes. following the coding convention is an expression of politeness.
>>>>>
>>>>> and.. postgresql put a comma at the end and pgpool also does the same.
>>>>>
>>>>>
>>>>> -- about development guidance
>>>>>
>>>>> LINK: https://www.pgpool.net/mediawiki/index.php/Development
>>>>>
>>>>> according to this development guidance, It looks like.. SGML documentation is also needed for the committable patch.
>>>>>
>>>>>
>>>>> -- patch attachment
>>>>>
>>>>> Just in case, I submit the 5th patch which only changes the comma location(v5-0001).
>>>>>
>>>>> And the other patch(v5-0002) is about documentation updates.
>>>>>
>>>>> but I'm not sure about whether this is accptable or not.
>>>>>
>>>>> If those things are not fit the situation, you can ignore this.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>> Kwangwon Seo
>>>>>
>>>>> Bitnine Global Inc.
>>>>> www.bitnine.net
>>>>>
>>>>> On Thursday, August 22nd, 2024 at 5:13 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>>
>>>>>> Hi Kwangwon,
>>>>>>
>>>>>> Thank you for the updated patches.
>>>>>>
>>>>>> > Thanks for all your considerate reviewing and attention. (really appreciate it)
>>>>>> >
>>>>>> > I had started fixing it just after receiving the first mail.
>>>>>> >
>>>>>> > I also want to reply with actual patch files with some progress so that was the reason of big delay in responding to the mail. sorry about that.
>>>>>> >
>>>>>> > anyway, about some progress.
>>>>>> >
>>>>>> > -- following advices.
>>>>>> > indentation, %m, compile warning, and so on..
>>>>>> > Yes, I checked again so as try to not to make a mistake again...
>>>>>>
>>>>>> > -- about regression tests
>>>>>> > 028.watchdog_enable_consensus_with_half_votes
>>>>>> > yes. timeout.. your tip is exactly the point. so I could fix that.
>>>>>>
>>>>>>
>>>>>> Great. I have succeeded all regression tests including 028 this time.
>>>>>>
>>>>>> > 036.trusted_servers...
>>>>>> > yes. It's covered now.
>>>>>>
>>>>>>
>>>>>> This suceeded too.
>>>>>>
>>>>>> > 004, 011, 012, 013, 014, 015 is covered both localhost and ::1.
>>>>>> >
>>>>>> > -- some stories (you can ignore this)
>>>>>> > Recently, I reinstalled my devices and just installed ssh library without actual executable binary.
>>>>>> > regress 28 executes pgpool_recovery() and inside the script, starting with the ssh command.
>>>>>> > That was one of the reasons for spending some time.
>>>>>> >
>>>>>> > And the other reason was, I assigned the wrong port number. what a silly me.
>>>>>> > It was just one line. but I spent too much time trying to find the reason for it. C APIs and Memorycontext mechanism. that's far away from the actual reason.
>>>>>> >
>>>>>> > -- about attachment (patch files)
>>>>>> > whitespaces checked and created with -v option. This is the 4th submission.
>>>>>> >
>>>>>> > v4-0001-feature-ipv6-for-watchdog.patch
>>>>>> > Actual patch file. some detailed messages for additional supplementary inside. It's just for an explanation so you can erase that message.
>>>>>> >
>>>>>> > v4-0002-test-localhost-to-1.patch
>>>>>> > It's just for test purposes. replace address from 'localhost' to '::1' on the regression tests. You can ignore this patch file.
>>>>>>
>>>>>>
>>>>>> I have tested the patch. It worked well here.
>>>>>>
>>>>>> > well, anyway, It's not finished until actually finished.
>>>>>> >
>>>>>> > So how about this patch? you can say anything at ease..
>>>>>>
>>>>>>
>>>>>> I forgot to say in the last email. It's very subtle point...
>>>>>>
>>>>>> + ereport(ERROR,
>>>>>> + (errmsg("failed to get socket data from heartbeat receive socket list")
>>>>>> + ,errdetail("select() failed with reason %m")));
>>>>>> + }
>>>>>> + else
>>>>>> + {
>>>>>> + ereport(ERROR,
>>>>>> + (errmsg("failed to get socket data from heartbeat receive socket list")
>>>>>> + ,errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
>>>>>>
>>>>>> We usually place "," at the end of the line, not the beginning of the
>>>>>> line. I mean following style is better.
>>>>>>
>>>>>> + (errmsg("failed to get socket data from heartbeat receive socket list"),
>>>>>> + errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
>>>>>>
>>>>>> > ---
>>>>>> >
>>>>>> > Regards
>>>>>> > Kwangwon Seo
>>>>>> >
>>>>>> > On Thursday, August 8th, 2024 at 10:35 AM, Tatsuo Ishii ishii at postgresql.org wrote:
>>>>>> >
>>>>>> > > > > I think you should give a version number to your patch using "-v"
>>>>>> > > > > option of git format-patch command.
>>>>>> > > > >
>>>>>> > > > > I have quick looked through the patch.
>>>>>> > > > >
>>>>>> > > > > t-ishii$ git apply ~/0001-feature-ipv6-for-watchdog.patch
>>>>>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:343: trailing whitespace.
>>>>>> > > > >
>>>>>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:355: trailing whitespace.
>>>>>> > > > >
>>>>>> > > > > warning: 2 lines add whitespace errors.
>>>>>> > > > >
>>>>>> > > > > Please remove unnecessary white spaces.
>>>>>> > > > >
>>>>>> > > > > Also I got compiler warnings:
>>>>>> > > > >
>>>>>> > > > > In file included from watchdog.c:51:
>>>>>> > > > > watchdog.c: In function 。watchdog_main「:
>>>>>> > > > > ../../src/include/utils/elog.h:199:4: warning: 。port「 may be used uninitialized in this function [-Wmaybe-uninitialized]
>>>>>> > > > > 199 | errfinish rest; \
>>>>>> > > > > | ^~~~~~~~~
>>>>>> > > > > watchdog.c:3439:8: note: 。port「 was declared here
>>>>>> > > > > 3439 | int port;
>>>>>> > > > > | ^~~~
>>>>>> > > > >
>>>>>> > > > > This is caused by following code snipet:
>>>>>> > > > >
>>>>>> > > > > + switch(ss.ss_family)
>>>>>> > > > > + {
>>>>>> > > > > + case AF_INET:
>>>>>> > > > > + inet_ntop(AF_INET, &((struct sockaddr_in*) &ss)->sin_addr, conn->addr, addrlen);
>>>>>> > > > > + port = ntohs(((struct sockaddr_in*)(&ss))->sin_port);
>>>>>> > > > > + break;
>>>>>> > > > > +
>>>>>> > > > > + case AF_INET6:
>>>>>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>>>>>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>>>>>> > > > > + break;
>>>>>> > > > > + }
>>>>>> > > > > +
>>>>>> > > > > + ereport(LOG,
>>>>>> > > > > + (errmsg("new watchdog node connection is received from \"%s:%d\"",conn->addr, port)));
>>>>>> > > > > +
>>>>>> > > > >
>>>>>> > > > > Here the switch statement does not have "default" and if ss.ss_family
>>>>>> > > > > is not either AF_INET or AF_INET6, the value of "port" will be
>>>>>> > > > > undefined. That's why the compiler complains. I think you should have
>>>>>> > > > > "default" something like:
>>>>>> > > > >
>>>>>> > > > > + case AF_INET6:
>>>>>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>>>>>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>>>>>> > > > > + break;
>>>>>> > > > > +
>>>>>> > > > > default:
>>>>>> > > > > ereport(ERROR,
>>>>>> > > > > (errmsg("unknown ss_family")));
>>>>>> > > > > break;
>>>>>> > > > >
>>>>>> > > > > BTW the indentations after CASE should be added. See other CASE
>>>>>> > > > > statements in pgpool.
>>>>>> > > > >
>>>>>> > > > > + else if (sel_ret == -1)
>>>>>> > > > > + {
>>>>>> > > > > + ereport(ERROR,
>>>>>> > > > > + (errmsg("failed to get socket data from socket list")
>>>>>> > > > > + ,errdetail("select() failed with errno %d", errno)));
>>>>>> > > > >
>>>>>> > > > > You can use %m" here without using errno. See codes using %m in pgpool.
>>>>>> > > > >
>>>>>> > > > > + }
>>>>>> > > > > + else
>>>>>> > > > > + {
>>>>>> > > > > + /* unreachable. */
>>>>>> > > > > + Assert(FALSE);
>>>>>> > > > > + }
>>>>>> > > > >
>>>>>> > > > > Assert cannot be used in pgpool for now.
>>>>>> > > > >
>>>>>> > > > > Regarding the regression test, I got a timeout error with
>>>>>> > > > > 028.watchdog_enable_consensus_with_half_votes test. Have you
>>>>>> > > > > succeeded?
>>>>>> > > >
>>>>>> > > > In addition, 036.trusted_servers also timed out here.
>>>>>> > >
>>>>>> > > I am looking into this a little bit. The time out was caused by below
>>>>>> > > in the test script:
>>>>>> > >
>>>>>> > > function wait_for_watchdog_startup
>>>>>> > > {
>>>>>> > > while :
>>>>>> > > do
>>>>>> > > grep "lifecheck started" $log >/dev/null
>>>>>> > >
>>>>>> > > if [ $? = 0 ];then
>>>>>> > > break;
>>>>>> > > fi
>>>>>> > > sleep 1
>>>>>> > > done
>>>>>> > > }
>>>>>> > >
>>>>>> > > It looks for "lifecheck started" in the pgpool log. The message is
>>>>>> > > supposed to be issued by lifecheck_main().
>>>>>> > >
>>>>>> > > /* wait until ready to go */
>>>>>> > > while (WD_OK != is_wd_lifecheck_ready())
>>>>>> > > {
>>>>>> > > sleep(pool_config->wd_interval * 10);
>>>>>> > >
>>>>>> > > }
>>>>>> > >
>>>>>> > > ereport(LOG,
>>>>>> > > (errmsg("watchdog: lifecheck started")));
>>>>>> > >
>>>>>> > > Problem is, is_wd_lifecheck_ready() does not return WD_OK. In is_wd_lifecheck_ready(),
>>>>>> > >
>>>>>> > > ereport(DEBUG1,
>>>>>> > > (errmsg("watchdog checking life check is ready"),
>>>>>> > > errdetail("pgpool:%d at \"%s:%d\" has not send the heartbeat signal yet",
>>>>>> > > i, node->hostName, node->pgpoolPort)));
>>>>>> > >
>>>>>> > > So I set log_min_messages to debug1 and got following in the log file.
>>>>>> > >
>>>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
>>>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:1 at "localhost:50004" has not send the heartbeat signal yet
>>>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
>>>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:2 at "localhost:50008" has not send the heartbeat signal yet
>>>>>> > >
>>>>>> > > It seems other watchdog's lifecheck process do not send the hearbeat
>>>>>> > > signal, or hearbeat receiver could not received the signal?
>>>>>> > >
>>>>>> > > Best reagards,
>>>>>> > > --
>>>>>> > > Tatsuo Ishii
>>>>>> > > SRA OSS K.K.
>>>>>> > > English: http://www.sraoss.co.jp/index_en/
>>>>>> > > Japanese:http://www.sraoss.co.jp
>>>> _______________________________________________
>>>> pgpool-hackers mailing list
>>>> pgpool-hackers at pgpool.net
>>>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers


More information about the pgpool-hackers mailing list