<div dir="ltr"><div><div><div>Hi Tatsuo!<br><br></div>Thank you for testing.<br><br>In your example, what I mean is: what if localhost 11002 (the old primary PostgreSQL) now recovers, notices that the standby is down, and therefore starts serving as the primary with data0? Later, when the old standby recovers, it must follow the old primary as a standby, and so it loses all the updates written to data1 while the old primary was down.<br><br></div>Best Regards,<br></div>  Zhaoxun<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 6, 2023 at 1:55 PM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> Suppose we have two servers; under extreme circumstances both may fail.<br>

> We then face 4 possibilities:<br>

> <br>

> 1) Master fail -> Standby self-promote -> Standby fail -> old Master<br>

> recover?<br>

> 2) Master fail -> Standby self-promote -> Standby fail -> Standby and new<br>

> Master recover?<br>

> 3) Standby fail -> Master fail -> Standby Recover?<br>

> 4) Standby fail -> Master fail -> Master recover?<br>

> <br>

> 1 and 3 are especially hazardous because the only recovered server may view<br>

> itself as the current master, so the data written to the other server while<br>

> it was down is lost. I believe that when only one server wakes up, it should<br>

> wait for the other server to recover before negotiating which one becomes<br>

> the new master.<br>

> <br>

> Does pgpool have such a mechanism?<br>

<br>

For #1, yes.<br>

<br>

# initial state: primary and standby are up.<br>

$ pcp_node_info -w -p 11001<br>

localhost 11002 1 0.500000 waiting up primary primary 0 none none 2023-04-06 14:37:42<br>

localhost 11003 1 0.500000 waiting up standby standby 0 streaming async 2023-04-06 14:37:42<br>

<br>

# the master fails: stop the primary.<br>

$ pg_ctl -D data0 stop<br>

waiting for server to shut down.... done<br>

server stopped<br>

<br>

# the primary is down and the standby has promoted itself.<br>

$ pcp_node_info -w -p 11001<br>

localhost 11002 3 0.500000 down down standby unknown 0 none none 2023-04-06 14:38:27<br>

localhost 11003 1 0.500000 waiting up primary primary 0 none none 2023-04-06 14:38:27<br>

<br>

# the (old) standby fails.<br>

$ pg_ctl -D data1 stop<br>

waiting for server to shut down.... done<br>

server stopped<br>

$ pcp_node_info -w -p 11001<br>

localhost 11002 3 0.500000 down down standby unknown 0 none none 2023-04-06 14:38:27<br>

localhost 11003 3 0.500000 down down standby unknown 0 none none 2023-04-06 14:38:55<br>

<br>

# now pgpool does not accept any connection from clients.<br>

$ psql -p 11000 test<br>

psql: error: connection to server on socket "/tmp/.s.PGSQL.11000" failed: ERROR:  pgpool is not accepting any new connections<br>

DETAIL:  all backend nodes are down, pgpool requires at least one valid node<br>

HINT:  repair the backend nodes and restart pgpool<br>

<br>
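The status lines printed by pcp_node_info above can also be checked from a script. The following is a minimal Python sketch, not part of pgpool: the field names in NodeInfo are labels chosen here for the columns as they appear in the output above (an assumption; check the pcp_node_info documentation for your pgpool version), and parse_node_info and all_backends_down are hypothetical helpers.

```python
from typing import Iterable, NamedTuple


class NodeInfo(NamedTuple):
    # Field names are assumed labels for the columns shown in the transcript.
    host: str
    port: int
    status_code: int
    lb_weight: float
    status_name: str
    actual_status: str
    role: str
    actual_role: str
    replication_delay: int
    replication_state: str
    sync_state: str
    last_change: str


def parse_node_info(line: str) -> NodeInfo:
    """Split one whitespace-separated status line into named fields."""
    f = line.split()
    if len(f) != 13:  # 11 columns plus a date and a time token
        raise ValueError(f"unexpected number of fields: {len(f)}")
    return NodeInfo(
        f[0], int(f[1]), int(f[2]), float(f[3]),
        f[4], f[5], f[6], f[7],
        int(f[8]), f[9], f[10],
        f[11] + " " + f[12],
    )


def all_backends_down(lines: Iterable[str]) -> bool:
    """True when every node reports 'down' as its actual backend status."""
    return all(parse_node_info(l).actual_status == "down" for l in lines)
```

Fed the two lines from the last pcp_node_info call in the transcript, all_backends_down returns True, which matches the point at which pgpool stops accepting client connections.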

#2 is basically the same, because after both the primary and the standby go<br>

down, pgpool won't accept connections from clients.<br>

<br>

For #3 and #4, I am not sure what you mean. Maybe you mean the case<br>

when no failover command is configured (thus no self-promotion)? If so,<br>

the result is the same as for #1 and #2.<br>

<br>

Best regards,<br>

--<br>

Tatsuo Ishii<br>

SRA OSS LLC<br>

English: <a href="http://www.sraoss.co.jp/index_en/" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en/</a><br>

Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>

</blockquote></div>
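The "wait for the other server" idea raised in this thread can be sketched as a small decision rule. This is only an illustration of the proposal, not something pgpool is known to do: NodeState, choose_primary, and the use of the pair (timeline, lsn) as a measure of which node holds the newer data are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeState:
    name: str
    up: bool
    timeline: int  # assumed: grows each time a standby is promoted
    lsn: int       # assumed: last WAL position, simplified to an integer


def choose_primary(a: NodeState, b: NodeState) -> Optional[str]:
    """Return the node that should become primary, or None to keep waiting.

    A node that recovers alone cannot know whether its peer accepted
    writes in the meantime, so no decision is made until both are up.
    """
    if not (a.up and b.up):
        return None
    # With both nodes visible, prefer the one with the newer history.
    newest = max((a, b), key=lambda n: (n.timeline, n.lsn))
    return newest.name
```

In the scenario above, data0 (old primary, timeline 1) recovering while data1 (promoted standby, timeline 2) is still down gives None, so data0 would wait rather than serve stale data. Once data1 is also up, the rule picks data1.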