[pgpool-hackers: 4503] Re: Watchdog and IPv6

Tatsuo Ishii ishii at postgresql.org
Thu Aug 8 10:35:13 JST 2024


>> I think you should give a version number to your patch using "-v"
>> option of git format-patch command.
>> 
>> I have quick looked through the patch.
>> 
>> 
>> t-ishii$ git apply ~/0001-feature-ipv6-for-watchdog.patch 
>> /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:343: trailing whitespace.
>> 	
>> /home/t-ishii/0001-feature-ipv6-for-watchdog.patch:355: trailing whitespace.
>> 	
>> warning: 2 lines add whitespace errors.
>> 
>> Please remove unnecessary white spaces.
>> 
>> Also I got compiler warnings:
>> 
>> In file included from watchdog.c:51:
>> watchdog.c: In function 。watchdog_main「:
>> ../../src/include/utils/elog.h:199:4: warning: 。port「 may be used uninitialized in this function [-Wmaybe-uninitialized]
>>   199 |    errfinish rest; \
>>       |    ^~~~~~~~~
>> watchdog.c:3439:8: note: 。port「 was declared here
>>  3439 |    int port;
>>       |        ^~~~
>> 
>> This is caused by following code snipet:
>> 
>> +				switch(ss.ss_family)
>> +				{
>> +					case AF_INET:
>> +					inet_ntop(AF_INET, &((struct sockaddr_in*) &ss)->sin_addr, conn->addr, addrlen);
>> +					port = ntohs(((struct sockaddr_in*)(&ss))->sin_port);
>> +					break;
>> +
>> +					case AF_INET6:
>> +					inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>> +					port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>> +					break;
>> +				}
>> +
>> +				ereport(LOG,
>> +						(errmsg("new watchdog node connection is received from \"%s:%d\"",conn->addr, port)));
>> +
>> 
>> Here the switch statement does not have "default" and if ss.ss_family
>> is not either AF_INET or AF_INET6, the value of "port" will be
>> undefined. That's why the compiler complains. I think you should have
>> "default" something like:
>> 
>> +					case AF_INET6:
>> +					inet_ntop(AF_INET6, &((struct sockaddr_in6*) &ss)->sin6_addr, conn->addr, addrlen);
>> +					port = ntohs(((struct sockaddr_in6*)(&ss))->sin6_port);
>> +					break;
>> +
>> 					default:
>> 						ereport(ERROR,
>> 							(errmsg("unknown ss_family")));
>> 						break;
>> 
>> BTW the indentations after CASE should be added. See other CASE
>> statements in pgpool.
>> 
>> +	else if (sel_ret == -1)
>> +	{
>> +		ereport(ERROR,
>> +				(errmsg("failed to get socket data from socket list")
>> +				 ,errdetail("select() failed with errno %d", errno)));
>> 
>> You can use %m" here without using errno. See codes using %m in pgpool.
>> 
>> +	}
>> +	else
>> +	{
>> +		/* unreachable. */
>> +		Assert(FALSE);
>> +	}
>> 
>> Assert cannot be used in pgpool for now.
>> 
>> Regarding the regression test, I got a timeout error with
>> 028.watchdog_enable_consensus_with_half_votes test. Have you
>> succeeded?
> 
> In addition, 036.trusted_servers also timed out here.

I am looking into this a little bit. The time out was caused by below
in the test script:

function wait_for_watchdog_startup
{
    while :
    do
	grep "lifecheck started" $log >/dev/null
	if [ $? = 0 ];then
	    break;
	fi
	sleep 1
    done
}

It looks for "lifecheck started" in the pgpool log. The message is
supposed to be issued by lifecheck_main().

	/* wait until ready to go */
	while (WD_OK != is_wd_lifecheck_ready())
	{
		sleep(pool_config->wd_interval * 10);
	}

	ereport(LOG,
			(errmsg("watchdog: lifecheck started")));

Problem is, is_wd_lifecheck_ready() does not return WD_OK. In is_wd_lifecheck_ready(),

				ereport(DEBUG1,
						(errmsg("watchdog checking life check is ready"),
						 errdetail("pgpool:%d at \"%s:%d\" has not send the heartbeat signal yet",
								   i, node->hostName, node->pgpoolPort)));

So I set log_min_messages to debug1 and got following in the log file.

2024-08-08 10:24:05.651: life_check pid 915179: DEBUG:  watchdog checking life check is ready
2024-08-08 10:24:05.651: life_check pid 915179: DETAIL:  pgpool:1 at "localhost:50004" has not send the heartbeat signal yet
2024-08-08 10:24:05.651: life_check pid 915179: DEBUG:  watchdog checking life check is ready
2024-08-08 10:24:05.651: life_check pid 915179: DETAIL:  pgpool:2 at "localhost:50008" has not send the heartbeat signal yet

It seems other watchdog's lifecheck process do not send the hearbeat
signal, or hearbeat receiver could not received the signal?

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


More information about the pgpool-hackers mailing list