[pgpool-hackers: 4515] Re: Watchdog and IPv6

Tatsuo Ishii ishii at postgresql.org
Tue Aug 27 11:04:17 JST 2024


Hi Kwangwon,

I have pushed the v6 patch to master branch. The feature will appear
in the next major release of Pgpool-II (supposed to be v4.6).

Thank you for the great contribution!

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> HI Usama.
> 
> ohhh, my pleasure. Thanks for the reviewing.
> 
> Yes.. That will be good. Thanks for the tip. That things will be applied in the patch.
> 
> Then.. Tatsuo san,
> 
> I created the patch files in accordance with Usama's review opinion.
> 
> -- patch files
> 
> It's 6th patch submission.
> 
> It is applied the Usama's suggesion.
> 
> Other things are same with 5th patch files.
> 
> -----------
> 
> Regards
> 
> Kwangwon Seo
> 
> On Monday, August 26th, 2024 at 5:17 PM, Muhammad Usama <m.usama at gmail.com> wrote:
> 
>> Hi Kwangwon,
>>
>> Thanks for the patch. Overall, it looks good and accomplishes what it's supposed to do.
>>
>> I just have a couple of minor suggestions regarding the error messages in the `wd_create_hb_recv_socket` and `wd_create_recv_socket` functions, where you’re raising an error if there’s no local address available:
>>
>> ereport(ERROR, (errmsg("no sockets can be created because no available local address with port:%d", port),
>> errdetail("getaddrinfo() result is empty")));
>>
>> It might be helpful to specify the module that’s attempting to create the socket.
>> For example, the above error message in wd_create_recv_socket() could be modified as follows:
>>
>> ereport(ERROR,
>> (errmsg("failed to create watchdog receive socket"),
>> errdetail("getaddrinfo() result is empty: no sockets can be created because no available local address with port:%d", port))));
>>
>> Other than that, the patch looks great.
>>
>> Thanks
>> Best regards
>> Muhammad Usama
>>
>> On Sat, Aug 24, 2024 at 9:43 AM Tatsuo Ishii <ishii at postgresql.org> wrote:
>>
>>> Hi Kwangwon,
>>>
>>> Thank you for the quick update. I think your patch is getting closer
>>> to commitable state.
>>>
>>> Usama,
>>>
>>> As you are the authority of watchdog, I would like to ask you to
>>> review the patch.
>>>
>>> Best reagards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS K.K.
>>> English: http://www.sraoss.co.jp/index_en/
>>> Japanese:http://www.sraoss.co.jp
>>>
>>> From: Kwangwon Seo <keiseo at protonmail.com>
>>> Subject: Re: [pgpool-hackers: 4502] Re: Watchdog and IPv6
>>> Date: Fri, 23 Aug 2024 02:27:16 +0000
>>> Message-ID: <NZs60H0XR7JUEi50CZG8W3ijdSb5KrTn3lryKQFC-5UASWAiCMW6v5_hYU99B9Op4SkDYRsuraulgOrGa8fIlPpZNaqREPr-iqxjcozx7YE=@protonmail.com>
>>>
>>>> ohhh
>>>>
>>>> What a good news! glad to hear that!!
>>>>
>>>> details below.
>>>>
>>>>
>>>> -- conventions
>>>>
>>>> Yes. following the coding convention is an expression of politeness.
>>>>
>>>> and.. postgresql put a comma at the end and pgpool also does the same.
>>>>
>>>>
>>>> -- about development guidance
>>>>
>>>> LINK: https://www.pgpool.net/mediawiki/index.php/Development
>>>>
>>>> according to this development guidance, It looks like.. SGML documentation is also needed for the committable patch.
>>>>
>>>>
>>>> -- patch attachment
>>>>
>>>> Just in case, I submit the 5th patch which only changes the comma location(v5-0001).
>>>>
>>>> And the other patch(v5-0002) is about documentation updates.
>>>>
>>>> but I'm not sure about whether this is accptable or not.
>>>>
>>>> If those things are not fit the situation, you can ignore this.
>>>>
>>>>
>>>>
>>>>
>>>> Regards
>>>> Kwangwon Seo
>>>>
>>>> Bitnine Global Inc.
>>>> www.bitnine.net
>>>>
>>>> On Thursday, August 22nd, 2024 at 5:13 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>
>>>>> Hi Kwangwon,
>>>>>
>>>>> Thank you for the updated patches.
>>>>>
>>>>> > Thanks for all your considerate reviewing and attention. (really appreciate it)
>>>>> >
>>>>> > I had started fixing it just after receiving the first mail.
>>>>> >
>>>>> > I also want to reply with actual patch files with some progress so that was the reason of big delay in responding to the mail. sorry about that.
>>>>> >
>>>>> > anyway, about some progress.
>>>>> >
>>>>> > -- following advices.
>>>>> > indentation, %m, compile warning, and so on..
>>>>> > Yes, I checked again so as try to not to make a mistake again...
>>>>>
>>>>> > -- about regression tests
>>>>> > 028.watchdog_enable_consensus_with_half_votes
>>>>> > yes. timeout.. your tip is exactly the point. so I could fix that.
>>>>>
>>>>>
>>>>> Great. I have succeeded all regression tests including 028 this time.
>>>>>
>>>>> > 036.trusted_servers...
>>>>> > yes. It's covered now.
>>>>>
>>>>>
>>>>> This suceeded too.
>>>>>
>>>>> > 004, 011, 012, 013, 014, 015 is covered both localhost and ::1.
>>>>> >
>>>>> > -- some stories (you can ignore this)
>>>>> > Recently, I reinstalled my devices and just installed ssh library without actual executable binary.
>>>>> > regress 28 executes pgpool_recovery() and inside the script, starting with the ssh command.
>>>>> > That was one of the reasons for spending some time.
>>>>> >
>>>>> > And the other reason was, I assigned the wrong port number. what a silly me.
>>>>> > It was just one line. but I spent too much time trying to find the reason for it. C APIs and Memorycontext mechanism. that's far away from the actual reason.
>>>>> >
>>>>> > -- about attachment (patch files)
>>>>> > whitespaces checked and created with -v option. This is the 4th submission.
>>>>> >
>>>>> > v4-0001-feature-ipv6-for-watchdog.patch
>>>>> > Actual patch file. some detailed messages for additional supplementary inside. It's just for an explanation so you can erase that message.
>>>>> >
>>>>> > v4-0002-test-localhost-to-1.patch
>>>>> > It's just for test purposes. replace address from 'localhost' to '::1' on the regression tests. You can ignore this patch file.
>>>>>
>>>>>
>>>>> I have tested the patch. It worked well here.
>>>>>
>>>>> > well, anyway, It's not finished until actually finished.
>>>>> >
>>>>> > So how about this patch? you can say anything at ease..
>>>>>
>>>>>
>>>>> I forgot to say in the last email. It's very subtle point...
>>>>>
>>>>> + ereport(ERROR,
>>>>> + (errmsg("failed to get socket data from heartbeat receive socket list")
>>>>> + ,errdetail("select() failed with reason %m")));
>>>>> + }
>>>>> + else
>>>>> + {
>>>>> + ereport(ERROR,
>>>>> + (errmsg("failed to get socket data from heartbeat receive socket list")
>>>>> + ,errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
>>>>>
>>>>> We usually place "," at the end of the line, not the beginning of the
>>>>> line. I mean following style is better.
>>>>>
>>>>> + (errmsg("failed to get socket data from heartbeat receive socket list"),
>>>>> + errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
>>>>>
>>>>> > ---
>>>>> >
>>>>> > Regards
>>>>> > Kwangwon Seo
>>>>> >
>>>>> > On Thursday, August 8th, 2024 at 10:35 AM, Tatsuo Ishii ishii at postgresql.org wrote:
>>>>> >
>>>>> > > > > I think you should give a version number to your patch using "-v"
>>>>> > > > > option of git format-patch command.
>>>>> > > > >
>>>>> > > > > I have quick looked through the patch.
>>>>> > > > >
>>>>> > > > > t-ishii$ git apply ~/0001-feature-ipv6-for-watchdog.patch
>>>>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:343: trailing whitespace.
>>>>> > > > >
>>>>> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:355: trailing whitespace.
>>>>> > > > >
>>>>> > > > > warning: 2 lines add whitespace errors.
>>>>> > > > >
>>>>> > > > > Please remove unnecessary white spaces.
>>>>> > > > >
>>>>> > > > > Also I got compiler warnings:
>>>>> > > > >
>>>>> > > > > In file included from watchdog.c:51:
>>>>> > > > > watchdog.c: In function 。watchdog_main「:
>>>>> > > > > ../../src/include/utils/elog.h:199:4: warning: 。port「 may be used uninitialized in this function [-Wmaybe-uninitialized]
>>>>> > > > > 199 | errfinish rest; \
>>>>> > > > > | ^~~~~~~~~
>>>>> > > > > watchdog.c:3439:8: note: 。port「 was declared here
>>>>> > > > > 3439 | int port;
>>>>> > > > > | ^~~~
>>>>> > > > >
>>>>> > > > > This is caused by following code snipet:
>>>>> > > > >
>>>>> > > > > + switch(ss.ss_family)
>>>>> > > > > + {
>>>>> > > > > + case AF_INET:
>>>>> > > > > + inet_ntop(AF_INET, &((struct sockaddr_in*) &ss)->sin_addr, conn->addr, addrlen);
>>>>> > > > > + port = ntohs(((struct sockaddr_in*)(&ss))->sin_port);
>>>>> > > > > + break;
>>>>> > > > > +
>>>>> > > > > + case AF_INET6:
>>>>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>>>>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>>>>> > > > > + break;
>>>>> > > > > + }
>>>>> > > > > +
>>>>> > > > > + ereport(LOG,
>>>>> > > > > + (errmsg("new watchdog node connection is received from \"%s:%d\"",conn->addr, port)));
>>>>> > > > > +
>>>>> > > > >
>>>>> > > > > Here the switch statement does not have "default" and if ss.ss_family
>>>>> > > > > is not either AF_INET or AF_INET6, the value of "port" will be
>>>>> > > > > undefined. That's why the compiler complains. I think you should have
>>>>> > > > > "default" something like:
>>>>> > > > >
>>>>> > > > > + case AF_INET6:
>>>>> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>>>>> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>>>>> > > > > + break;
>>>>> > > > > +
>>>>> > > > > default:
>>>>> > > > > ereport(ERROR,
>>>>> > > > > (errmsg("unknown ss_family")));
>>>>> > > > > break;
>>>>> > > > >
>>>>> > > > > BTW the indentations after CASE should be added. See other CASE
>>>>> > > > > statements in pgpool.
>>>>> > > > >
>>>>> > > > > + else if (sel_ret == -1)
>>>>> > > > > + {
>>>>> > > > > + ereport(ERROR,
>>>>> > > > > + (errmsg("failed to get socket data from socket list")
>>>>> > > > > + ,errdetail("select() failed with errno %d", errno)));
>>>>> > > > >
>>>>> > > > > You can use %m" here without using errno. See codes using %m in pgpool.
>>>>> > > > >
>>>>> > > > > + }
>>>>> > > > > + else
>>>>> > > > > + {
>>>>> > > > > + /* unreachable. */
>>>>> > > > > + Assert(FALSE);
>>>>> > > > > + }
>>>>> > > > >
>>>>> > > > > Assert cannot be used in pgpool for now.
>>>>> > > > >
>>>>> > > > > Regarding the regression test, I got a timeout error with
>>>>> > > > > 028.watchdog_enable_consensus_with_half_votes test. Have you
>>>>> > > > > succeeded?
>>>>> > > >
>>>>> > > > In addition, 036.trusted_servers also timed out here.
>>>>> > >
>>>>> > > I am looking into this a little bit. The time out was caused by below
>>>>> > > in the test script:
>>>>> > >
>>>>> > > function wait_for_watchdog_startup
>>>>> > > {
>>>>> > > while :
>>>>> > > do
>>>>> > > grep "lifecheck started" $log >/dev/null
>>>>> > >
>>>>> > > if [ $? = 0 ];then
>>>>> > > break;
>>>>> > > fi
>>>>> > > sleep 1
>>>>> > > done
>>>>> > > }
>>>>> > >
>>>>> > > It looks for "lifecheck started" in the pgpool log. The message is
>>>>> > > supposed to be issued by lifecheck_main().
>>>>> > >
>>>>> > > /* wait until ready to go */
>>>>> > > while (WD_OK != is_wd_lifecheck_ready())
>>>>> > > {
>>>>> > > sleep(pool_config->wd_interval * 10);
>>>>> > >
>>>>> > > }
>>>>> > >
>>>>> > > ereport(LOG,
>>>>> > > (errmsg("watchdog: lifecheck started")));
>>>>> > >
>>>>> > > Problem is, is_wd_lifecheck_ready() does not return WD_OK. In is_wd_lifecheck_ready(),
>>>>> > >
>>>>> > > ereport(DEBUG1,
>>>>> > > (errmsg("watchdog checking life check is ready"),
>>>>> > > errdetail("pgpool:%d at \"%s:%d\" has not send the heartbeat signal yet",
>>>>> > > i, node->hostName, node->pgpoolPort)));
>>>>> > >
>>>>> > > So I set log_min_messages to debug1 and got following in the log file.
>>>>> > >
>>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
>>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:1 at "localhost:50004" has not send the heartbeat signal yet
>>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
>>>>> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:2 at "localhost:50008" has not send the heartbeat signal yet
>>>>> > >
>>>>> > > It seems other watchdog's lifecheck process do not send the hearbeat
>>>>> > > signal, or hearbeat receiver could not received the signal?
>>>>> > >
>>>>> > > Best reagards,
>>>>> > > --
>>>>> > > Tatsuo Ishii
>>>>> > > SRA OSS K.K.
>>>>> > > English: http://www.sraoss.co.jp/index_en/
>>>>> > > Japanese:http://www.sraoss.co.jp
>>> _______________________________________________
>>> pgpool-hackers mailing list
>>> pgpool-hackers at pgpool.net
>>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers


More information about the pgpool-hackers mailing list