[pgpool-hackers: 4511] Re: Watchdog and IPv6

Kwangwon Seo keiseo at protonmail.com
Fri Aug 23 11:27:16 JST 2024


ohhh

What a good news! glad to hear that!!

details below.


-- conventions

Yes. following the coding convention is an expression of politeness.

and.. postgresql put a comma at the end and pgpool also does the same.


-- about development guidance

LINK: https://www.pgpool.net/mediawiki/index.php/Development 

according to this development guidance, It looks like.. SGML documentation is also needed for the committable patch.


-- patch attachment

Just in case, I submit the 5th patch which only changes the comma location(v5-0001).

And the other patch(v5-0002) is about documentation updates.

but I'm not sure about whether this is accptable or not.

If those things are not fit the situation, you can ignore this.




Regards
Kwangwon Seo

Bitnine Global Inc.
www.bitnine.net

On Thursday, August 22nd, 2024 at 5:13 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> Hi Kwangwon,
> 
> Thank you for the updated patches.
> 
> > Thanks for all your considerate reviewing and attention. (really appreciate it)
> > 
> > I had started fixing it just after receiving the first mail.
> > 
> > I also want to reply with actual patch files with some progress so that was the reason of big delay in responding to the mail. sorry about that.
> > 
> > anyway, about some progress.
> > 
> > -- following advices.
> > indentation, %m, compile warning, and so on..
> > Yes, I checked again so as try to not to make a mistake again...
> 
> > -- about regression tests
> > 028.watchdog_enable_consensus_with_half_votes
> > yes. timeout.. your tip is exactly the point. so I could fix that.
> 
> 
> Great. I have succeeded all regression tests including 028 this time.
> 
> > 036.trusted_servers...
> > yes. It's covered now.
> 
> 
> This suceeded too.
> 
> > 004, 011, 012, 013, 014, 015 is covered both localhost and ::1.
> > 
> > -- some stories (you can ignore this)
> > Recently, I reinstalled my devices and just installed ssh library without actual executable binary.
> > regress 28 executes pgpool_recovery() and inside the script, starting with the ssh command.
> > That was one of the reasons for spending some time.
> > 
> > And the other reason was, I assigned the wrong port number. what a silly me.
> > It was just one line. but I spent too much time trying to find the reason for it. C APIs and Memorycontext mechanism. that's far away from the actual reason.
> > 
> > -- about attachment (patch files)
> > whitespaces checked and created with -v option. This is the 4th submission.
> > 
> > v4-0001-feature-ipv6-for-watchdog.patch
> > Actual patch file. some detailed messages for additional supplementary inside. It's just for an explanation so you can erase that message.
> > 
> > v4-0002-test-localhost-to-1.patch
> > It's just for test purposes. replace address from 'localhost' to '::1' on the regression tests. You can ignore this patch file.
> 
> 
> I have tested the patch. It worked well here.
> 
> > well, anyway, It's not finished until actually finished.
> > 
> > So how about this patch? you can say anything at ease..
> 
> 
> I forgot to say in the last email. It's very subtle point...
> 
> + ereport(ERROR,
> + (errmsg("failed to get socket data from heartbeat receive socket list")
> + ,errdetail("select() failed with reason %m")));
> + }
> + else
> + {
> + ereport(ERROR,
> + (errmsg("failed to get socket data from heartbeat receive socket list")
> + ,errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
> 
> We usually place "," at the end of the line, not the beginning of the
> line. I mean following style is better.
> 
> + (errmsg("failed to get socket data from heartbeat receive socket list"),
> + errdetail("select() got timeout, exceed %d sec(s)", timeout_sec)));
> 
> > ---
> > 
> > Regards
> > Kwangwon Seo
> > 
> > On Thursday, August 8th, 2024 at 10:35 AM, Tatsuo Ishii ishii at postgresql.org wrote:
> > 
> > > > > I think you should give a version number to your patch using "-v"
> > > > > option of git format-patch command.
> > > > > 
> > > > > I have quick looked through the patch.
> > > > > 
> > > > > t-ishii$ git apply ~/0001-feature-ipv6-for-watchdog.patch
> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:343: trailing whitespace.
> > > > > 
> > > > > /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:355: trailing whitespace.
> > > > > 
> > > > > warning: 2 lines add whitespace errors.
> > > > > 
> > > > > Please remove unnecessary white spaces.
> > > > > 
> > > > > Also I got compiler warnings:
> > > > > 
> > > > > In file included from watchdog.c:51:
> > > > > watchdog.c: In function 。watchdog_main「:
> > > > > ../../src/include/utils/elog.h:199:4: warning: 。port「 may be used uninitialized in this function [-Wmaybe-uninitialized]
> > > > > 199 | errfinish rest; \
> > > > > | ^~~~~~~~~
> > > > > watchdog.c:3439:8: note: 。port「 was declared here
> > > > > 3439 | int port;
> > > > > | ^~~~
> > > > > 
> > > > > This is caused by following code snipet:
> > > > > 
> > > > > + switch(ss.ss_family)
> > > > > + {
> > > > > + case AF_INET:
> > > > > + inet_ntop(AF_INET, &((struct sockaddr_in*) &ss)->sin_addr, conn->addr, addrlen);
> > > > > + port = ntohs(((struct sockaddr_in*)(&ss))->sin_port);
> > > > > + break;
> > > > > +
> > > > > + case AF_INET6:
> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
> > > > > + break;
> > > > > + }
> > > > > +
> > > > > + ereport(LOG,
> > > > > + (errmsg("new watchdog node connection is received from \"%s:%d\"",conn->addr, port)));
> > > > > +
> > > > > 
> > > > > Here the switch statement does not have "default" and if ss.ss_family
> > > > > is not either AF_INET or AF_INET6, the value of "port" will be
> > > > > undefined. That's why the compiler complains. I think you should have
> > > > > "default" something like:
> > > > > 
> > > > > + case AF_INET6:
> > > > > + inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
> > > > > + port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
> > > > > + break;
> > > > > +
> > > > > default:
> > > > > ereport(ERROR,
> > > > > (errmsg("unknown ss_family")));
> > > > > break;
> > > > > 
> > > > > BTW the indentations after CASE should be added. See other CASE
> > > > > statements in pgpool.
> > > > > 
> > > > > + else if (sel_ret == -1)
> > > > > + {
> > > > > + ereport(ERROR,
> > > > > + (errmsg("failed to get socket data from socket list")
> > > > > + ,errdetail("select() failed with errno %d", errno)));
> > > > > 
> > > > > You can use %m" here without using errno. See codes using %m in pgpool.
> > > > > 
> > > > > + }
> > > > > + else
> > > > > + {
> > > > > + /* unreachable. */
> > > > > + Assert(FALSE);
> > > > > + }
> > > > > 
> > > > > Assert cannot be used in pgpool for now.
> > > > > 
> > > > > Regarding the regression test, I got a timeout error with
> > > > > 028.watchdog_enable_consensus_with_half_votes test. Have you
> > > > > succeeded?
> > > > 
> > > > In addition, 036.trusted_servers also timed out here.
> > > 
> > > I am looking into this a little bit. The time out was caused by below
> > > in the test script:
> > > 
> > > function wait_for_watchdog_startup
> > > {
> > > while :
> > > do
> > > grep "lifecheck started" $log >/dev/null
> > > 
> > > if [ $? = 0 ];then
> > > break;
> > > fi
> > > sleep 1
> > > done
> > > }
> > > 
> > > It looks for "lifecheck started" in the pgpool log. The message is
> > > supposed to be issued by lifecheck_main().
> > > 
> > > /* wait until ready to go */
> > > while (WD_OK != is_wd_lifecheck_ready())
> > > {
> > > sleep(pool_config->wd_interval * 10);
> > > 
> > > }
> > > 
> > > ereport(LOG,
> > > (errmsg("watchdog: lifecheck started")));
> > > 
> > > Problem is, is_wd_lifecheck_ready() does not return WD_OK. In is_wd_lifecheck_ready(),
> > > 
> > > ereport(DEBUG1,
> > > (errmsg("watchdog checking life check is ready"),
> > > errdetail("pgpool:%d at \"%s:%d\" has not send the heartbeat signal yet",
> > > i, node->hostName, node->pgpoolPort)));
> > > 
> > > So I set log_min_messages to debug1 and got following in the log file.
> > > 
> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:1 at "localhost:50004" has not send the heartbeat signal yet
> > > 2024-08-08 10:24:05.651: life_check pid 915179: DEBUG: watchdog checking life check is ready
> > > 2024-08-08 10:24:05.651: life_check pid 915179: DETAIL: pgpool:2 at "localhost:50008" has not send the heartbeat signal yet
> > > 
> > > It seems other watchdog's lifecheck process do not send the hearbeat
> > > signal, or hearbeat receiver could not received the signal?
> > > 
> > > Best reagards,
> > > --
> > > Tatsuo Ishii
> > > SRA OSS K.K.
> > > English: http://www.sraoss.co.jp/index_en/
> > > Japanese:http://www.sraoss.co.jp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v5-0001-feature-ipv6-for-watchdog.patch
Type: text/x-patch
Size: 30784 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20240823/03dd7c06/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v5-0002-Doc-mention-about-watchdog-ipv6-supporting.patch
Type: text/x-patch
Size: 3719 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-hackers/attachments/20240823/03dd7c06/attachment-0003.bin>


More information about the pgpool-hackers mailing list