[pgpool-hackers: 3457] Re: Pgpool suside if fails to connect to trusted servers

Thu Oct 10 15:12:00 JST 2019

Hi Siva,

Perhaps your network team has not noticed transient network errors?
Pinging trusted_servers is somewhat fragile, especially if you have
only one or two trusted servers, because Pgpool-II pings them
without retries. Thus a transient network problem could make Pgpool-II
believe that the trusted servers are down.

I think a better way to handle this situation is to stop relying on
trusted servers and instead run 3 or 5 Pgpool-II nodes and rely on the
watchdog's quorum.
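
To illustrate why an odd number of nodes such as 3 or 5 helps, here is
a minimal Python sketch of plain majority arithmetic. It is not
Pgpool-II code: the function names are hypothetical, and it assumes
quorum simply means "more than half of the configured nodes are
alive". How Pgpool-II treats an even number of nodes may differ
depending on version and configuration.

```python
def has_quorum(alive: int, total: int) -> bool:
    """Assumed rule: a strict majority of all configured nodes is alive."""
    return 2 * alive > total


def tolerated_failures(total: int) -> int:
    """Largest number of nodes that can be lost while a majority remains."""
    return (total - 1) // 2


if __name__ == "__main__":
    for total in (2, 3, 4, 5):
        print(f"{total} nodes: survives loss of {tolerated_failures(total)}")
```

Under this assumption, 2 nodes tolerate no failure, 3 and 4 nodes
tolerate one, and 5 nodes tolerate two, so going from 3 to 4 nodes
adds nothing while going from 3 to 5 does.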

> Hi Tatsuo,
> 
> We are also facing the same issue. But the network team claims there is
> no network issue and says Pgpool has to handle the situation.
> We are also in a dilemma.
> Please suggest what process to follow to overcome this issue.
> Which config parameters should we set to withstand this kind of issue?
> 
> Your suggestions are highly appreciated.
> Thanks a lot for your support.
> 
> Regards,
> Siva.
> 
> On Friday, September 13, 2019, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> In the manual:
>>
>> 5.14.3. Upstream server connection
>>
>> ------------------------------------------------------------------
>> trusted_servers (string)
>>
>>     Specifies the list of trusted servers to check the up stream
>>     connections. Each server in the list is required to respond to
>>     ping. Specify a comma separated list of servers such as
>>     "hostA,hostB,hostC". If none of the server are reachable, watchdog
>>     will regard it as failure of the Pgpool-II. Therefore, it is
>>     recommended to specify multiple servers.
>> ------------------------------------------------------------------
>>
>> It's not actually clear what will happen after "watchdog will regard
>> it as failure of the Pgpool-II." What actually happens is that Pgpool-II
>> shuts itself down (see the log below for an example).
>>
>> I think we should clearly state that Pgpool-II will go down if it
>> cannot reach any of the trusted servers, something like:
>>
>> ------------------------------------------------------------------
>> trusted_servers (string)
>>
>>     Specifies the list of trusted servers to check the up stream
>>     connections. Each server in the list is required to respond to
>>     ping. Specify a comma separated list of servers such as
>>     "hostA,hostB,hostC". If none of the server are reachable, watchdog
>>     will regard it as failure of the Pgpool-II and the Pgpool-II will
>>     shut down. Therefore, it is recommended to specify multiple
>>     servers.
>> ------------------------------------------------------------------
>>
>>
>> 2019-09-13 10:35:53: pid 30659: WARNING:  watchdog failed to ping host"192.192.192.192"
>> 2019-09-13 10:35:53: pid 30659: DETAIL:  ping process exits with code: 1
>> 2019-09-13 10:35:53: pid 30659: WARNING:  watchdog lifecheck, failed to connect to any trusted servers
>> 2019-09-13 10:35:53: pid 30659: LOG:  informing the node status change to watchdog
>> 2019-09-13 10:35:53: pid 30659: DETAIL:  node id :0 status = "NODE DEAD" message:"trusted server is unreachable"
>> 2019-09-13 10:35:53: pid 30656: LOG:  new IPC connection received
>> 2019-09-13 10:35:53: pid 30656: LOG:  received node status change ipc message
>> 2019-09-13 10:35:53: pid 30656: DETAIL:  trusted server is unreachable
>> 2019-09-13 10:35:53: pid 30656: WARNING:  watchdog lifecheck reported, we are disconnected from the network
>> 2019-09-13 10:35:53: pid 30656: DETAIL:  changing the state to LOST
>> 2019-09-13 10:35:53: pid 30656: LOG:  watchdog node state changed from [MASTER] to [LOST]
>> 2019-09-13 10:35:53: pid 30656: FATAL:  system has lost the network
>> 2019-09-13 10:35:53: pid 30656: LOG:  Watchdog is shutting down
>> 2019-09-13 10:35:53: pid 30813: LOG:  watchdog: de-escalation started
>> 2019-09-13 10:35:53: pid 30646: LOG:  watchdog child process with pid: 30656 exits with status 768
>> 2019-09-13 10:35:53: pid 30646: FATAL:  watchdog child process exit with fatal error. exiting pgpool-II
>> 2019-09-13 10:35:53: pid 30814: LOG:  setting the local watchdog node name to "localhost:50000 Linux tishii-CFSV7-1"
>> 2019-09-13 10:35:53: pid 30814: LOG:  watchdog cluster is configured with 2 remote nodes
>> 2019-09-13 10:35:53: pid 30814: LOG:  watchdog remote node:0 on localhost:50006
>> 2019-09-13 10:35:53: pid 30814: LOG:  watchdog remote node:1 on localhost:50010
>> 2019-09-13 10:35:53: pid 30814: LOG:  interface monitoring is disabled in watchdog
>> 2019-09-13 10:35:53: pid 30814: LOG:  watchdog node state changed from [DEAD] to [LOADING]
>> 2019-09-13 10:35:53: pid 30814: LOG:  new outbound connection to localhost:50006
>> 2019-09-13 10:35:53: pid 30814: LOG:  new outbound connection to localhost:50010
>> 2019-09-13 10:35:53: pid 30814: LOG:  watchdog node state changed from [LOADING] to [INITIALIZING]
>> 2019-09-13 10:35:53: pid 30814: LOG:  new watchdog node connection is received from "127.0.0.1:60611"
>> 2019-09-13 10:35:53: pid 30814: LOG:  new node joined the cluster hostname:"localhost" port:50006 pgpool_port:50004
>> 2019-09-13 10:35:53: pid 30814: LOG:  new watchdog node connection is received from "127.0.0.1:61123"
>> 2019-09-13 10:35:53: pid 30814: LOG:  new node joined the cluster hostname:"localhost" port:50010 pgpool_port:50008
>> 2019-09-13 10:35:53: pid 30814: LOG:  Watchdog is shutting down