[pgpool-general: 8110] Re: Possible race condition during startup causing node to enter network isolation

Mon May 2 08:56:35 JST 2022

Hello,

> Any thoughts on this issue? We are still experiencing intermittent test
> failures due to this issue.

Another developer, who introduced the watchdog mechanism to Pgpool-II, fixed this issue
in the commit below:

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commitdiff;h=3337aa8c07cd07cdbc238a5a154a8c4d8dbe0472

The fix will be included in the next minor release, scheduled for May 19th.

> On Fri, Apr 1, 2022 at 9:03 AM Emond Papegaaij <emond.papegaaij at gmail.com>
> wrote:
> 
> > Hi,
> >
> > Unfortunately, this issue still pops up every once in a while. We are now
> > running 4.3.1. In our latest failure, the issue occurred in a simple restart
> > of all services on node 1, with node 3 being the leader. Pgpool on node 1
> > tries to rejoin the cluster, but gets rejected over and over again. Node 3
> > reports that 'only life-check process can mark this node alive again'. I've
> > attached the full logs of both node 1 and 3. The configuration hasn't
> > changed since last time.
> >
> > Best regards,
> > Emond
> >
> > On Mon, Nov 29, 2021 at 4:12 PM Emond Papegaaij <emond.papegaaij at gmail.com>
> > wrote:
> >
> >> On Mon, Nov 29, 2021 at 3:55 PM Bo Peng <pengbo at sraoss.co.jp> wrote:
> >>
> >>> Thank you for your test.
> >>>
> >>> Because we have made several watchdog bug fixes since 4.2.4, it might be an
> >>> upgrade issue.
> >>> If you can reproduce this issue in 4.2.6, could you share the pgpool
> >>> logs of all nodes?
> >>>
> >>
> >> I'll continue to monitor the tests. If one fails again, I'll share the
> >> logs. As I said, this could take some time, because the failure only occurs
> >> about once a week. Thanks for your help so far.
> >>
> >> Best regards,
> >> Emond
> >>
> >

-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS, Inc. Japan
http://www.sraoss.co.jp/