<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">I am just coming back to this work now after some time on other projects.</div><div class=""><br class=""></div><div class="">I think there are several proposals around improving auto_failback in this thread:</div><div class="">1) my patch</div><div class="">2) Ishii-san’s patch to check follow_primary_count == 0</div><div class="">3) Ishii-san’s proposal to implement a lock to avoid the window where follow_primary might run after checking follow_primary_count</div><div class=""><br class=""></div><div class="">My understanding is we think 1+2 are good, and we can look at 3 if there is still a problem - or perhaps we plan to look at 3 as a future improvement, to avoid a potential problem?</div><div class=""><br class=""></div><div class="">Would you like me to test 1+2?</div><div class=""><br class=""></div><div class=""><div class=""><div><blockquote type="cite" class=""><div class="">On 10/05/2021, at 7:16 PM, Takuma Hoshiai <<a href="mailto:hoshiai.takuma@nttcom.co.jp" class="">hoshiai.takuma@nttcom.co.jp</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta charset="UTF-8" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">Hi,</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: 
normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">On 2021/05/05 16:03, Tatsuo Ishii wrote:</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><blockquote type="cite" class=""><blockquote type="cite" class="">On 27/04/2021, at 10:18 AM, Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" class="">ishii@sraoss.co.jp</a>> wrote:<br class=""><br class="">Hi Nathan,<br class=""><br class=""><blockquote type="cite" class="">Hi,<br class=""><br class="">Sorry about that! 
I dragged them from the vscode file list directly to Mail - I suspect that that doesn’t work when using remote editing..!<br class=""><br class="">I have attached the files now - does that work?<br class=""></blockquote><br class="">Yes! I will look into the patches. Hoshiai-san, can you please look<br class="">into the patches as well because you are the original author of the<br class="">feature.<br class=""></blockquote><br class="">Hi!<br class=""><br class="">I was wondering if you had time to look at these patches yet? :-)<br class=""><br class="">No rush - just making sure it doesn’t get missed!<br class=""></blockquote>I have just started to look into your patch. Also I was able to<br class="">reproduce the problem.<br class="">1) Create a 3-node streaming replication cluster.<br class="">pgpool_setup -n 3<br class="">Enable auto_failback and set health_check_period to 1 so that<br class="">auto_failback runs more aggressively.<br class="">auto_failback = on<br class="">health_check_period0 = 1<br class="">health_check_period1 = 1<br class="">health_check_period2 = 1<br class="">Start the whole system.<br class="">2) Detach node 0 (which is the primary).<br class="">3) Node 2 goes down and PostgreSQL won't start.<br class="">psql -p 11000 -c "show pool_nodes" test<br class=""> node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change<br class="">---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------<br class=""> 0 | /tmp | 11002 | up | up | 0.333333 | standby | standby | 0 | true | 0 | streaming | async | 2021-05-05 14:10:38<br class=""> 1 | /tmp | 11003 | up | up | 0.333333 | primary | primary | 0 | false | 0 | | | 2021-05-05 14:10:25<br class=""> 2 | /tmp | 11004 | down | down | 
0.333333 | standby | unknown | 0 | false | 0 | | | 2021-05-05 14:10:38<br class="">(3 rows)<br class="">The cause of the problem is a race condition between auto failback<br class="">and follow primary, as you and Hoshiai-san suggested. Here are some<br class="">extracts from pgpool.log.<br class="">$ egrep "degeneration|failback" log/pgpool.log|grep -v child<br class="">2021-05-05 14:10:22: main pid 28630: LOG: starting degeneration. shutdown host /tmp(11002)<br class="">2021-05-05 14:10:25: main pid 28630: LOG: starting follow degeneration. shutdown host /tmp(11002)<br class="">2021-05-05 14:10:25: main pid 28630: LOG: starting follow degeneration. shutdown host /tmp(11004)<span class="Apple-tab-span" style="white-space: pre;"> </span>-- #1<br class="">2021-05-05 14:10:25: health_check2 pid 28673: LOG: request auto failback, node id:2<span class="Apple-tab-span" style="white-space: pre;"> </span>-- #2<br class="">2021-05-05 14:10:25: health_check2 pid 28673: LOG: received failback request for node_id: 2 from pid [28673]<br class="">2021-05-05 14:10:35: main pid 28630: LOG: failback done. reconnect host /tmp(11004)<br class="">2021-05-05 14:10:35: main pid 28630: LOG: failback done. reconnect host /tmp(11002)<span class="Apple-tab-span" style="white-space: pre;"> </span>-- #3<br class="">2021-05-05 14:10:36: pcp_child pid 29035: LOG: starting recovering node 2<br class="">2021-05-05 14:10:36: pcp_child pid 29035: ERROR: node recovery failed, node id: 2 is alive<span class="Apple-tab-span" style="white-space: pre;"> </span>-- #4<br class="">2021-05-05 14:10:38: child pid 29070: LOG: failed to connect to PostgreSQL server by unix domain socket<br class="">2021-05-05 14:10:38: child pid 29070: DETAIL: executing failover on backend<br class="">2021-05-05 14:10:38: main pid 28630: LOG: Pgpool-II parent process has received failover request<br class="">2021-05-05 14:10:38: main pid 28630: LOG: starting degeneration. 
shutdown host /tmp(11004)<span class="Apple-tab-span" style="white-space: pre;"> </span>-- #5<br class="">1) Follow primary started to shut down node 2. At this point the<br class=""> backend node 2 was still running.<br class="">2) Auto failback found that the backend was still alive and sent a failback<br class=""> request for node 2.<br class="">3) The pgpool main process reported that node 2 was back. But the actual<br class=""> failback had not been done, and processing continued with the follow primary command.<br class="">4) The follow primary command for node 2 failed because auto failback had set<br class=""> the status of node 2 to "up".<br class="">5) PostgreSQL on node 2 was down and the health check detected it. Node 2<br class=""> status became down.<br class=""> So if auto failback had not run at #2, the follow primary should have<br class="">succeeded.<br class="">BTW, a user and I accidentally found a similar situation: a conflicting<br class="">concurrent run of detach_false_primary and the follow primary command:<br class=""><a href="https://www.pgpool.net/pipermail/pgpool-general/2021-April/007583.html" class="">https://www.pgpool.net/pipermail/pgpool-general/2021-April/007583.html</a><br class="">In the discussion I proposed a patch to prevent the concurrent run of<br class="">detach_false_primary and the follow primary command. I think we can apply<br class="">the same method to auto_failback as well. 
Attached is the patch to<br class="">implement it on top of the patch I posted here for the master branch:<br class="">https://www.pgpool.net/pipermail/pgpool-general/2021-April/007594.html<br class="">This patch actually has a small window between here:<br class=""><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span>if (check_failback && !Req_info->switching && slot &&<br class=""><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span>Req_info->follow_primary_count == 0)<br class="">and here:<br class=""><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span>ereport(LOG,<br class=""><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span>(errmsg("request auto failback, node id:%d", node)));<br class=""><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span>/* get current time to use auto_faliback_interval */<br class=""><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span>now = time(NULL);<br class=""><span 
class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span>auto_failback_interval = now + pool_config->auto_failback_interval;<br class=""><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span><span class="Apple-tab-span" style="white-space: pre;"> </span>send_failback_request(node, true, REQ_DETAIL_CONFIRMED);<br class="">because after checking Req_info->follow_primary_count, follow primary<br class="">might start just after this. I think the window is small and probably<br class="">harmless in the wild. If you think it's not so small, we could take an<br class="">exclusive lock, as in detach_false_primary, to plug the window.<br class="">Also we have found that detach_false_primary should only run on the<br class="">leader watchdog node. Probably we should consider this for<br class="">auto_failback too.<br class=""></blockquote><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">I have started to look at this patch too. 
But the pgpool_setup</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">command failed on the latest master branch with auto_failback_fixes-master.patch.</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">I am investigating the cause now (my environment may be at fault).</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br style="caret-color: rgb(0, 0, 0); font-family: 
Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">As far as I can see, auto_failback_fixes-master.patch is good.</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">And I think that ishii-san's suggestion makes this patch better.</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; 
letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Best regards,<br class="">--<br class="">Tatsuo Ishii<br class="">SRA OSS, Inc. 
Japan<br class="">English:<span class="Apple-converted-space"> </span><a href="http://www.sraoss.co.jp/index_en.php" class="">http://www.sraoss.co.jp/index_en.php</a><br class="">Japanese:<a href="http://www.sraoss.co.jp/" class="">http://www.sraoss.co.jp</a><br class="">_______________________________________________<br class="">pgpool-hackers mailing list<br class=""><a href="mailto:pgpool-hackers@pgpool.net" class="">pgpool-hackers@pgpool.net</a><br class=""><a href="http://www.pgpool.net/mailman/listinfo/pgpool-hackers" class="">http://www.pgpool.net/mailman/listinfo/pgpool-hackers</a><br class=""></blockquote><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">Best Regards,</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; 
text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">--<span class="Apple-converted-space"> </span></span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">Takuma Hoshiai <</span><a href="mailto:hoshiai.takuma@nttcom.co.jp" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">hoshiai.takuma@nttcom.co.jp</a><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" 
class="">></span></div></blockquote></div><br class=""></div></div></body></html>