[pgpool-general: 7506] Possible pgpool 4.1.4 failover auto_failback race condition

Wed Apr 14 14:54:13 JST 2021

Hi,

I believe I’ve found a race condition with auto_failback.

In my test environment I have 3 servers each running both pgpool and postgres.

I simulate a network failure with iptables rules on one node.

I start the test with the following state:
pgpool primary: node 2
postgres primary: node 0

When I fail node 0 to trigger failover with follow_master (we are still on 4.1.x), I find that most of the time node 2 is reattached before follow_master gets a chance to run for it. Node 2 is, of course, set to CON_DOWN when the failover is triggered, and I would expect it to stay in that state until follow_master reattaches it.

I believe, though I’m not 100% certain, that this early reattach sometimes comes from node 1.

Is this likely a configuration problem, or is it a bug of some kind? I had a quick look at the code and don’t see any changes since 4.1.4 that would affect this, but I am of course happy to be wrong about that!

We have the following settings:
auto_failback = on
auto_failback_interval = 60

--
Nathan Ward