[pgpool-hackers: 3457] Re: Pgpool suside if fails to connect to trusted servers
Tatsuo Ishii
ishii at sraoss.co.jp
Thu Oct 10 15:12:00 JST 2019
Hi Siva,
Perhaps your network team is not noticing transient network errors?
Pinging trusted_servers is a little fragile, especially if you have
only one or two trusted servers, because Pgpool-II pings them
without retries. Thus a transient network problem could make Pgpool-II
believe that the trusted servers are down.
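
To illustrate why a check without retries is fragile, here is a small
Python model. It is not Pgpool-II code, only a sketch under stated
assumptions: `probe` stands in for one ping, and the names
`any_trusted_server_reachable`, `attempts`, and `make_flaky_probe` are
invented for this example.

```python
from typing import Callable, Dict, Iterable


def any_trusted_server_reachable(
    hosts: Iterable[str],
    probe: Callable[[str], bool],
    attempts: int = 1,
) -> bool:
    """Return True if at least one host answers within `attempts` probes.

    attempts=1 models a check without retries; larger values model a
    hypothetical retrying check. `probe` is any callable that performs
    one ping-like test and returns True on success.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for host in hosts:
        for _ in range(attempts):
            if probe(host):
                return True
    return False


def make_flaky_probe(failures_before_success: int) -> Callable[[str], bool]:
    """Build a fake probe that fails a fixed number of times per host and
    then succeeds, simulating a transient network error."""
    remaining: Dict[str, int] = {}

    def probe(host: str) -> bool:
        left = remaining.setdefault(host, failures_before_success)
        if left > 0:
            remaining[host] = left - 1
            return False
        return True

    return probe
```

With one dropped ping per host, `attempts=1` reports every trusted
server as unreachable, while `attempts=3` absorbs the loss.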
I think a better way to handle this situation is to give up relying on
trusted servers and run 3 or 5 Pgpool-II nodes so that you can rely on
the watchdog's quorum.
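
The reason for suggesting 3 or 5 nodes can be shown with simple majority
arithmetic. This is a minimal sketch, assuming that quorum means a
strict majority of the configured watchdog nodes; the function names are
made up for illustration and are not part of Pgpool-II.

```python
def quorum_size(total_nodes: int) -> int:
    """Smallest number of live nodes that forms a strict majority."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """How many nodes may be lost while a strict majority still exists."""
    return total_nodes - quorum_size(total_nodes)


def has_quorum(alive_nodes: int, total_nodes: int) -> bool:
    """True if the live nodes still form a strict majority."""
    return alive_nodes >= quorum_size(total_nodes)


if __name__ == "__main__":
    # Print the majority size and failure tolerance for small clusters.
    for n in range(1, 6):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption a 2-node cluster tolerates no failures, and a
4-node cluster tolerates only one, the same as 3 nodes. That is why odd
sizes such as 3 or 5 are the economical choices.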
> Hi Tatsuo,
>
> We are also facing the same issue, but the network team claims there is no
> network issue and says Pgpool has to handle the situation.
> We are also in a dilemma.
> Could you suggest what process to follow to overcome this issue?
> Which config parameters should be set to withstand this kind of problem?
>
> Your suggestions are highly appreciated.
> Thanks a lot for your support.
>
> Regards,
> Siva.
>
> On Friday, September 13, 2019, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>
>> In the manual:
>>
>> 5.14.3. Upstream server connection
>>
>> ------------------------------------------------------------------
>> trusted_servers (string)
>>
>> Specifies the list of trusted servers to check the up stream
>> connections. Each server in the list is required to respond to
>> ping. Specify a comma separated list of servers such as
>> "hostA,hostB,hostC". If none of the server are reachable, watchdog
>> will regard it as failure of the Pgpool-II. Therefore, it is
>> recommended to specify multiple servers.
>> ------------------------------------------------------------------
>>
>> It's not actually clear what will happen after "watchdog will regard
>> it as failure of the Pgpool-II." What actually happens is that Pgpool-II
>> shuts itself down (see the log below for example).
>>
>> I think we should clearly state that Pgpool-II will go down if it
>> cannot reach any of the trusted servers, something like:
>>
>> ------------------------------------------------------------------
>> trusted_servers (string)
>>
>> Specifies the list of trusted servers to check the up stream
>> connections. Each server in the list is required to respond to
>> ping. Specify a comma separated list of servers such as
>> "hostA,hostB,hostC". If none of the server are reachable, watchdog
>> will regard it as failure of the Pgpool-II and the Pgpool-II will
>> shut down. Therefore, it is recommended to specify multiple
>> servers.
>> ------------------------------------------------------------------
>>
>>
>> 2019-09-13 10:35:53: pid 30659: WARNING: watchdog failed to ping host"192.192.192.192"
>> 2019-09-13 10:35:53: pid 30659: DETAIL: ping process exits with code: 1
>> 2019-09-13 10:35:53: pid 30659: WARNING: watchdog lifecheck, failed to connect to any trusted servers
>> 2019-09-13 10:35:53: pid 30659: LOG: informing the node status change to watchdog
>> 2019-09-13 10:35:53: pid 30659: DETAIL: node id :0 status = "NODE DEAD" message:"trusted server is unreachable"
>> 2019-09-13 10:35:53: pid 30656: LOG: new IPC connection received
>> 2019-09-13 10:35:53: pid 30656: LOG: received node status change ipc message
>> 2019-09-13 10:35:53: pid 30656: DETAIL: trusted server is unreachable
>> 2019-09-13 10:35:53: pid 30656: WARNING: watchdog lifecheck reported, we are disconnected from the network
>> 2019-09-13 10:35:53: pid 30656: DETAIL: changing the state to LOST
>> 2019-09-13 10:35:53: pid 30656: LOG: watchdog node state changed from [MASTER] to [LOST]
>> 2019-09-13 10:35:53: pid 30656: FATAL: system has lost the network
>> 2019-09-13 10:35:53: pid 30656: LOG: Watchdog is shutting down
>> 2019-09-13 10:35:53: pid 30813: LOG: watchdog: de-escalation started
>> 2019-09-13 10:35:53: pid 30646: LOG: watchdog child process with pid: 30656 exits with status 768
>> 2019-09-13 10:35:53: pid 30646: FATAL: watchdog child process exit with fatal error. exiting pgpool-II
>> 2019-09-13 10:35:53: pid 30814: LOG: setting the local watchdog node name to "localhost:50000 Linux tishii-CFSV7-1"
>> 2019-09-13 10:35:53: pid 30814: LOG: watchdog cluster is configured with 2 remote nodes
>> 2019-09-13 10:35:53: pid 30814: LOG: watchdog remote node:0 on localhost:50006
>> 2019-09-13 10:35:53: pid 30814: LOG: watchdog remote node:1 on localhost:50010
>> 2019-09-13 10:35:53: pid 30814: LOG: interface monitoring is disabled in watchdog
>> 2019-09-13 10:35:53: pid 30814: LOG: watchdog node state changed from [DEAD] to [LOADING]
>> 2019-09-13 10:35:53: pid 30814: LOG: new outbound connection to localhost:50006
>> 2019-09-13 10:35:53: pid 30814: LOG: new outbound connection to localhost:50010
>> 2019-09-13 10:35:53: pid 30814: LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
>> 2019-09-13 10:35:53: pid 30814: LOG: new watchdog node connection is received from "127.0.0.1:60611"
>> 2019-09-13 10:35:53: pid 30814: LOG: new node joined the cluster hostname:"localhost" port:50006 pgpool_port:50004
>> 2019-09-13 10:35:53: pid 30814: LOG: new watchdog node connection is received from "127.0.0.1:61123"
>> 2019-09-13 10:35:53: pid 30814: LOG: new node joined the cluster hostname:"localhost" port:50010 pgpool_port:50008
>> 2019-09-13 10:35:53: pid 30814: LOG: Watchdog is shutting down