[pgpool-hackers: 1826] patch of wd_lifecheck.c for stabilization

Fri Sep 30 11:30:05 JST 2016

Hi all,
I created a patch for wd_lifecheck.c to stabilize the watchdog life check.
When I run pgpool-II 3.5.4 on a VM server, the life check often fails,
because the VM has fast CPUs but a slow network peripheral.
In this environment the watchdog process often fails to connect to the other
nodes.
Therefore, I replaced usleep(100) with sleep(1) in
check_pgpool_status_by_query() and added a retry loop, bounded by
wd_life_point and sleeping one second per attempt, around create_conn() in
wd_ping_pgpool().
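
The retry idea on its own can be sketched as below. This is only a minimal,
standalone illustration and not the pgpool-II code: connect_fn,
connect_with_retry() and flaky_connect() are hypothetical names made up for
this example, and the stand-in callback replaces create_conn(). The sketch
assumes a fixed upper bound on attempts and sleeps only between failed
attempts.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: returns a handle on success, NULL on failure. */
typedef void *(*connect_fn)(void *arg);

/*
 * Try to connect up to max_attempts times, sleeping delay_sec seconds
 * between failed attempts only. Returns the handle, or NULL if every
 * attempt failed.
 */
static void *
connect_with_retry(connect_fn try_connect, void *arg,
                   int max_attempts, unsigned int delay_sec)
{
    int attempt;

    assert(try_connect != NULL && max_attempts > 0);

    for (attempt = 1; attempt <= max_attempts; attempt++)
    {
        void *conn = try_connect(arg);

        if (conn != NULL)
            return conn;        /* success: return without sleeping */

        if (attempt < max_attempts && delay_sec > 0)
            sleep(delay_sec);   /* wait before the next attempt */
    }
    return NULL;
}

/* Stand-in for a flaky peer: fails until *remaining_failures reaches 0. */
static void *
flaky_connect(void *arg)
{
    int *remaining_failures = arg;
    static int dummy_handle;

    if (*remaining_failures > 0)
    {
        (*remaining_failures)--;
        return NULL;
    }
    return &dummy_handle;
}
```

With a counter of 2 and max_attempts of 3, the third attempt succeeds; with a
counter of 5 all three attempts fail and NULL is returned. Passing 0 as
delay_sec skips the sleep, which is convenient for testing.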

In my environment, this fix stabilized the watchdog process.
I would like this patch to be merged once you confirm that there are no
problems.

Regards,

--- src/watchdog/wd_lifecheck.c.org     2016-09-30 10:38:42.647168825 +0900
+++ src/watchdog/wd_lifecheck.c 2016-09-30 10:51:06.402749336 +0900
@@ -804,7 +804,7 @@ check_pgpool_status_by_query(void)
                rc = pthread_join(thread[i], (void **)&result);
                if ((rc != 0) && (errno == EINTR))
                {
-                       usleep(100);
+                       sleep(1);
                        continue;
                }

@@ -953,11 +953,19 @@ wd_check_heartbeat(LifeCheckNode* node)
 static int
 wd_ping_pgpool(LifeCheckNode* node)
 {
-       PGconn * conn;
+       PGconn * conn = NULL;
+       int cnt = 0;

-       conn = create_conn(node->hostName, node->pgpoolPort);
-       if (conn == NULL)
-               return WD_NG;
+       while (conn == NULL)
+       {
+               conn = create_conn(node->hostName, node->pgpoolPort);
+               if ((conn == NULL) && (cnt > pool_config->wd_life_point))
+               {
+                       return WD_NG;
+               }
+               sleep(1);
+               cnt ++;
+       }
        return ping_pgpool(conn);
 }

--
At.Mitani