[pgpool-hackers: 4513] Re: Watchdog and IPv6

Muhammad Usama m.usama at gmail.com
Mon Aug 26 17:17:37 JST 2024


Hi Kwangwon,

Thanks for the patch. Overall, it looks good and accomplishes what it's
supposed to do.

I just have a couple of minor suggestions regarding the error messages in
the `wd_create_hb_recv_socket` and `wd_create_recv_socket` functions, where
you’re raising an error if there’s no local address available:

ereport(ERROR, (errmsg("no sockets can be created because no available
local address with port:%d", port),
                errdetail("getaddrinfo() result is empty")));

It might be helpful to specify the module that’s attempting to create the
socket.
For example, the above error message in wd_create_recv_socket() could be
modified as follows:

ereport(ERROR,
   (errmsg("failed to create watchdog receive socket"),
    errdetail("getaddrinfo() result is empty: no sockets can be created
because no available local address with port:%d", port))));

Other than that, the patch looks great.

Thanks
Best regards
Muhammad Usama


On Sat, Aug 24, 2024 at 9:43 AM Tatsuo Ishii <ishii at postgresql.org> wrote:

> Hi Kwangwon,
>
> Thank you for the quick update. I think your patch is getting closer
> to commitable state.
>
> Usama,
>
> As you are the authority of watchdog, I would like to ask you to
> review the patch.
>
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>
> From: Kwangwon Seo <keiseo at protonmail.com>
> Subject: Re: [pgpool-hackers: 4502] Re: Watchdog and IPv6
> Date: Fri, 23 Aug 2024 02:27:16 +0000
> Message-ID:
> <NZs60H0XR7JUEi50CZG8W3ijdSb5KrTn3lryKQFC-5UASWAiCMW6v5_hYU99B9Op4SkDYRsuraulgOrGa8fIlPpZNaqREPr-iqxjcozx7YE=@
> protonmail.com>
>
> > ohhh
> >
> > What a good news! glad to hear that!!
> >
> > details below.
> >
> >
> > -- conventions
> >
> > Yes. following the coding convention is an expression of politeness.
> >
> > and.. postgresql put a comma at the end and pgpool also does the same.
> >
> >
> > -- about development guidance
> >
> > LINK: https://www.pgpool.net/mediawiki/index.php/Development
> >
> > according to this development guidance, It looks like.. SGML
> documentation is also needed for the committable patch.
> >
> >
> > -- patch attachment
> >
> > Just in case, I submit the 5th patch which only changes the comma
> location(v5-0001).
> >
> > And the other patch(v5-0002) is about documentation updates.
> >
> > but I'm not sure about whether this is accptable or not.
> >
> > If those things are not fit the situation, you can ignore this.
> >
> >
> >
> >
> > Regards
> > Kwangwon Seo
> >
> > Bitnine Global Inc.
> > www.bitnine.net
> >
> > On Thursday, August 22nd, 2024 at 5:13 PM, Tatsuo Ishii <
> ishii at postgresql.org> wrote:
> >
> >> Hi Kwangwon,
> >>
> >> Thank you for the updated patches.
> >>
> >> > Thanks for all your considerate reviewing and attention. (really
> appreciate it)
> >> >
> >> > I had started fixing it just after receiving the first mail.
> >> >
> >> > I also want to reply with actual patch files with some progress so
> that was the reason of big delay in responding to the mail. sorry about
> that.
> >> >
> >> > anyway, about some progress.
> >> >
> >> > -- following advices.
> >> > indentation, %m, compile warning, and so on..
> >> > Yes, I checked again so as try to not to make a mistake again...
> >>
> >> > -- about regression tests
> >> > 028.watchdog_enable_consensus_with_half_votes
> >> > yes. timeout.. your tip is exactly the point. so I could fix that.
> >>
> >>
> >> Great. I have succeeded all regression tests including 028 this time.
> >>
> >> > 036.trusted_servers...
> >> > yes. It's covered now.
> >>
> >>
> >> This suceeded too.
> >>
> >> > 004, 011, 012, 013, 014, 015 is covered both localhost and ::1.
> >> >
> >> > -- some stories (you can ignore this)
> >> > Recently, I reinstalled my devices and just installed ssh library
> without actual executable binary.
> >> > regress 28 executes pgpool_recovery() and inside the script, starting
> with the ssh command.
> >> > That was one of the reasons for spending some time.
> >> >
> >> > And the other reason was, I assigned the wrong port number. what a
> silly me.
> >> > It was just one line. but I spent too much time trying to find the
> reason for it. C APIs and Memorycontext mechanism. that's far away from the
> actual reason.
> >> >
> >> > -- about attachment (patch files)
> >> > whitespaces checked and created with -v option. This is the 4th
> submission.
> >> >
> >> > v4-0001-feature-ipv6-for-watchdog.patch
> >> > Actual patch file. some detailed messages for additional
> supplementary inside. It's just for an explanation so you can erase that
> message.
> >> >
> >> > v4-0002-test-localhost-to-1.patch
> >> > It's just for test purposes. replace address from 'localhost' to
> '::1' on the regression tests. You can ignore this patch file.
> >>
> >>
> >> I have tested the patch. It worked well here.
> >>
> >> > well, anyway, It's not finished until actually finished.
> >> >
> >> > So how about this patch? you can say anything at ease..
> >>
> >>
> >> I forgot to say in the last email. It's very subtle point...
> >>
> >> + ereport(ERROR,
> >> + (errmsg("failed to get socket data from heartbeat receive socket
> list")
> >> + ,errdetail("select() failed with reason %m")));
> >> + }
> >> + else
> >> + {
> >> + ereport(ERROR,
> >> + (errmsg("failed to get socket data from heartbeat receive socket
> list")
> >> + ,errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
> >>
> >> We usually place "," at the end of the line, not the beginning of the
> >> line. I mean following style is better.
> >>
> >> + (errmsg("failed to get socket data from heartbeat receive socket
> list"),
> >> + errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
> >>
> >> > ---
> >> >
> >> > Regards
> >> > Kwangwon Seo
> >> >
> >> > On Thursday, August 8th, 2024 at 10:35 AM, Tatsuo Ishii
> ishii at postgresql.org wrote:
> >> >
> >> > > > > I think you should give a version number to your patch using
> "-v"
> >> > > > > option of git format-patch command.
> >> > > > >
> >> > > > > I have quick looked through the patch.
> >> > > > >
> >> > > > > t-ishii$ git apply ~/0001-feature-ipv6-for-watchdog.patch
> >> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:343:
> trailing whitespace.
> >> > > > >
> >> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:355:
> trailing whitespace.
> >> > > > >
> >> > > > > warning: 2 lines add whitespace errors.
> >> > > > >
> >> > > > > Please remove unnecessary white spaces.
> >> > > > >
> >> > > > > Also I got compiler warnings:
> >> > > > >
> >> > > > > In file included from watchdog.c:51:
> >> > > > > watchdog.c: In function 。watchdog_main「:
> >> > > > > ../../src/include/utils/elog.h:199:4: warning: 。port「 may be
> used uninitialized in this function [-Wmaybe-uninitialized]
> >> > > > > 199 | errfinish rest; \
> >> > > > > | ^~~~~~~~~
> >> > > > > watchdog.c:3439:8: note: 。port「 was declared here
> >> > > > > 3439 | int port;
> >> > > > > | ^~~~
> >> > > > >
> >> > > > > This is caused by following code snipet:
> >> > > > >
> >> > > > > + switch(ss.ss_family)
> >> > > > > + {
> >> > > > > + case AF_INET:
> >> > > > > + inet_ntop(AF_INET, &((struct sockaddr_in*) &ss)->sin_addr,
> conn->addr, addrlen);
> >> > > > > + port = ntohs(((struct sockaddr_in*)(&ss))->sin_port);
> >> > > > > + break;
> >> > > > > +
> >> > > > > + case AF_INET6:
> >> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr,
> conn->addr, addrlen);
> >> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
> >> > > > > + break;
> >> > > > > + }
> >> > > > > +
> >> > > > > + ereport(LOG,
> >> > > > > + (errmsg("new watchdog node connection is received from
> \"%s:%d\"",conn->addr, port)));
> >> > > > > +
> >> > > > >
> >> > > > > Here the switch statement does not have "default" and if
> ss.ss_family
> >> > > > > is not either AF_INET or AF_INET6, the value of "port" will be
> >> > > > > undefined. That's why the compiler complains. I think you
> should have
> >> > > > > "default" something like:
> >> > > > >
> >> > > > > + case AF_INET6:
> >> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr,
> conn->addr, addrlen);
> >> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
> >> > > > > + break;
> >> > > > > +
> >> > > > > default:
> >> > > > > ereport(ERROR,
> >> > > > > (errmsg("unknown ss_family")));
> >> > > > > break;
> >> > > > >
> >> > > > > BTW the indentations after CASE should be added. See other CASE
> >> > > > > statements in pgpool.
> >> > > > >
> >> > > > > + else if (sel_ret == -1)
> >> > > > > + {
> >> > > > > + ereport(ERROR,
> >> > > > > + (errmsg("failed to get socket data from socket list")
> >> > > > > + ,errdetail("select() failed with errno %d", errno)));
> >> > > > >
> >> > > > > You can use %m" here without using errno. See codes using %m in
> pgpool.
> >> > > > >
> >> > > > > + }
> >> > > > > + else
> >> > > > > + {
> >> > > > > + /* unreachable. */
> >> > > > > + Assert(FALSE);
> >> > > > > + }
> >> > > > >
> >> > > > > Assert cannot be used in pgpool for now.
> >> > > > >
> >> > > > > Regarding the regression test, I got a timeout error with
> >> > > > > 028.watchdog_enable_consensus_with_half_votes test. Have you
> >> > > > > succeeded?
> >> > > >
> >> > > > In addition, 036.trusted_servers also timed out here.
> >> > >
> >> > > I am looking into this a little bit. The time out was caused by
> below
> >> > > in the test script:
> >> > >
> >> > > function wait_for_watchdog_startup
> >> > > {
> >> > > while :
> >> > > do
> >> > > grep "lifecheck started" $log >/dev/null
> >> > >
> >> > > if [ $? = 0 ];then
> >> > > break;
> >> > > fi
> >> > > sleep 1
> >> > > done
> >> > > }
> >> > >
> >> > > It looks for "lifecheck started" in the pgpool log. The message is
> >> > > supposed to be issued by lifecheck_main().
> >> > >
> >> > > /* wait until ready to go */
> >> > > while (WD_OK != is_wd_lifecheck_ready())
> >> > > {
> >> > > sleep(pool_config->wd_interval * 10);
> >> > >
> >> > > }
> >> > >
> >> > > ereport(LOG,
> >> > > (errmsg("watchdog: lifecheck started")));
> >> > >
> >> > > Problem is, is_wd_lifecheck_ready() does not return WD_OK. In
> is_wd_lifecheck_ready(),
> >> > >
> >> > > ereport(DEBUG1,
> >> > > (errmsg("watchdog checking life check is ready"),
> >> > > errdetail("pgpool:%d at \"%s:%d\" has not send the heartbeat signal
> yet",
> >> > > i, node->hostName, node->pgpoolPort)));
> >> > >
> >> > > So I set log_min_messages to debug1 and got following in the log
> file.
> >> > >
> >> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog
> checking life check is ready
> >> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:1 at
> "localhost:50004" has not send the heartbeat signal yet
> >> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog
> checking life check is ready
> >> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:2 at
> "localhost:50008" has not send the heartbeat signal yet
> >> > >
> >> > > It seems other watchdog's lifecheck process do not send the hearbeat
> >> > > signal, or hearbeat receiver could not received the signal?
> >> > >
> >> > > Best reagards,
> >> > > --
> >> > > Tatsuo Ishii
> >> > > SRA OSS K.K.
> >> > > English: http://www.sraoss.co.jp/index_en/
> >> > > Japanese:http://www.sraoss.co.jp
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20240826/351d71f2/attachment-0001.htm>


More information about the pgpool-hackers mailing list