[pgpool-general: 8372] Re: pgpool failing over on bad queries
Jon SCHEWE
jon.schewe at raytheon.com
Sat Sep 3 05:49:52 JST 2022
>> I'm finding that I have a client sending a bunch of bad queries to pgpool. After a large number of these errors pgpool appears to decide that the primary backend is in a bad state and fails over to another backend. Why is pgpool doing this and how do I prevent it?
>
>Syntax error itself does not trigger failover. There must be other
>reason which triggers the failover. To find the cause, I need to
>inspect the pgpool log file.
>
What should I look for in the pgpool log file?
I have logs of log messages "new IPC connection received", those appear normal.
I also see "child process with pid ... exits with status 256" and "fork a new child process..."
I see the following on psql-03 when the primary node changed:
2022-08-31 01:11:57.819: watchdog pid 971756: LOG: watchdog received the failover command from remote pgpool-II node "psql-01:9898 Linux psql-01"
2022-08-31 01:11:57.819: watchdog pid 971756: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from psql-01:9898 Linux psql-01
2022-08-31 01:11:57.819: watchdog pid 971756: LOG: failover requires the majority vote, waiting for consensus
2022-08-31 01:11:57.819: watchdog pid 971756: DETAIL: failover request noted
2022-08-31 01:11:57.819: watchdog pid 971756: LOG: failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II node "psql-01:9898 Linux psql-01" is queued, waiting for the confirmation from other nodes
2022-08-31 01:11:57.819: watchdog pid 971756: LOG: signal_user1_to_parent_with_reason(4)
2022-08-31 01:11:57.819: main pid 971732: LOG: Pgpool-II parent process received SIGUSR1
2022-08-31 01:11:57.819: main pid 971732: LOG: Pgpool-II parent process received inform quarantine nodes signal from watchdog
2022-08-31 01:12:04.923: watchdog pid 971756: LOG: new IPC connection received
2022-08-31 01:12:07.919: watchdog pid 971756: LOG: watchdog received the failover command from remote pgpool-II node "psql-02:9898 Linux psql-02"
2022-08-31 01:12:07.919: watchdog pid 971756: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from psql-02:9898 Linux psql-02
2022-08-31 01:12:07.919: watchdog pid 971756: LOG: we have got the consensus to perform the failover
2022-08-31 01:12:07.919: watchdog pid 971756: DETAIL: 2 node(s) voted in the favor
2022-08-31 01:12:07.919: watchdog pid 971756: LOG: received degenerate backend request for node_id: 0 from pid [971756]
2022-08-31 01:12:07.919: watchdog pid 971756: LOG: signal_user1_to_parent_with_reason(0)
2022-08-31 01:12:07.919: main pid 971732: LOG: Pgpool-II parent process received SIGUSR1
2022-08-31 01:12:07.919: main pid 971732: LOG: Pgpool-II parent process has received failover request
2022-08-31 01:12:07.919: watchdog pid 971756: LOG: new IPC connection received
2022-08-31 01:12:07.919: watchdog pid 971756: LOG: received the failover indication from Pgpool-II on IPC interface
2022-08-31 01:12:07.919: watchdog pid 971756: LOG: watchdog is informed of failover start by the main process
2022-08-31 01:12:07.920: main pid 971732: LOG: starting degeneration. shutdown host psql-01(5432)
2022-08-31 01:12:07.922: main pid 971732: LOG: Restart all children
2022-08-31 01:12:07.923: main pid 971732: LOG: execute command: /etc/pgpool-II/failover.sh 0 psql-01 5432 /var/lib/pgsql/14/data 1 psql-02 0 0 5432 /var/lib/pgsql/14/data psql-01 5432
On psql-01 I see
2022-08-31 01:11:57.818: sr_check_worker pid 2111409: LOG: pool_ssl: "SSL_read": "no SSL error reported"
2022-08-31 01:11:57.818: sr_check_worker pid 2111409: LOG: received degenerate backend request for node_id: 0 from pid [2111409]
2022-08-31 01:11:57.818: watchdog pid 2080471: LOG: new IPC connection received
2022-08-31 01:11:57.818: watchdog pid 2080471: LOG: failover request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node "psql-03:9898 Linux psql-03"
2022-08-31 01:11:57.818: watchdog pid 2080471: DETAIL: waiting for the reply...
2022-08-31 01:11:57.819: watchdog pid 2080471: LOG: remote node "psql-03:9898 Linux psql-03" is asking to inform about quarantined backend nodes
2022-08-31 01:11:57.819: watchdog pid 2080471: LOG: signal_user1_to_parent_with_reason(4)
2022-08-31 01:11:57.819: sr_check_worker pid 2111409: LOG: degenerate backend request for node_id: 0 from pid [2111409], will be handled by watchdog, which is building consensus for request
2022-08-31 01:11:57.819: main pid 2080446: LOG: Pgpool-II parent process received SIGUSR1
2022-08-31 01:11:57.819: main pid 2080446: LOG: Pgpool-II parent process received inform quarantine nodes signal from watchdog
2022-08-31 01:12:07.884: sr_check_worker pid 2111409: ERROR: Failed to check replication time lag
2022-08-31 01:12:07.884: sr_check_worker pid 2111409: DETAIL: No persistent db connection for the node 0
2022-08-31 01:12:07.884: sr_check_worker pid 2111409: HINT: check sr_check_user and sr_check_password
2022-08-31 01:12:07.884: sr_check_worker pid 2111409: CONTEXT: while checking replication time lag
2022-08-31 01:12:07.884: watchdog pid 2080471: LOG: new IPC connection received
2022-08-31 01:12:09.642: watchdog pid 2080471: LOG: processing follow primary looking[acquire] request from remote node "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.733: watchdog pid 2080471: LOG: signal_user1_to_parent_with_reason(2)
2022-08-31 01:12:09.733: main pid 2080446: LOG: Pgpool-II parent process received SIGUSR1
2022-08-31 01:12:09.733: main pid 2080446: LOG: Pgpool-II parent process received sync backend signal from watchdog
2022-08-31 01:12:09.734: watchdog pid 2080471: LOG: new IPC connection received
2022-08-31 01:12:09.734: main pid 2080446: LOG: leader watchdog has performed failover
2022-08-31 01:12:09.734: main pid 2080446: DETAIL: syncing the backend states from the LEADER watchdog node
2022-08-31 01:12:09.734: watchdog pid 2080471: LOG: new IPC connection received
2022-08-31 01:12:09.734: watchdog pid 2080471: LOG: received the get data request from local pgpool-II on IPC interface
2022-08-31 01:12:09.734: watchdog pid 2080471: LOG: get data request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.735: watchdog pid 2080471: DETAIL: waiting for the reply...
2022-08-31 01:12:09.735: main pid 2080446: LOG: leader watchdog node "psql-03:9898 Linux psql-03" returned status for 3 backend nodes
2022-08-31 01:12:09.735: main pid 2080446: LOG: backend:0 is set to down status
2022-08-31 01:12:09.735: main pid 2080446: DETAIL: backend:0 is DOWN on cluster leader "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.735: main pid 2080446: LOG: backend:2 is set to down status
2022-08-31 01:12:09.735: main pid 2080446: DETAIL: backend:2 is DOWN on cluster leader "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.735: main pid 2080446: LOG: primary node:1 on leader watchdog node "psql-03:9898 Linux psql-03" is different from local primary node:0
2022-08-31 01:12:09.735: main pid 2080446: LOG: primary node was changed after the sync from "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.735: main pid 2080446: DETAIL: all children needs to be restarted
2022-08-31 01:12:09.857: sr_check_worker pid 2111409: LOG: worker process received restart request
On psql-02 I see
2022-08-31 01:11:57.819: watchdog pid 1187187: LOG: remote node "psql-03:9898 Linux psql-03" is asking to inform about quarantined backend nodes
2022-08-31 01:11:57.819: watchdog pid 1187187: LOG: signal_user1_to_parent_with_reason(4)
2022-08-31 01:11:57.819: main pid 1187162: LOG: Pgpool-II parent process received SIGUSR1
2022-08-31 01:11:57.819: main pid 1187162: LOG: Pgpool-II parent process received inform quarantine nodes signal from watchdog
2022-08-31 01:12:07.885: watchdog pid 1187187: LOG: new IPC connection received
2022-08-31 01:12:07.918: sr_check_worker pid 2277999: LOG: pool_ssl: "SSL_read": "no SSL error reported"
2022-08-31 01:12:07.918: sr_check_worker pid 2277999: LOG: received degenerate backend request for node_id: 0 from pid [2277999]
2022-08-31 01:12:07.918: watchdog pid 1187187: LOG: new IPC connection received
2022-08-31 01:12:07.918: watchdog pid 1187187: LOG: failover request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node "psql-03:9898 Linux psql-03"
2022-08-31 01:12:07.918: watchdog pid 1187187: DETAIL: waiting for the reply...
2022-08-31 01:12:07.919: sr_check_worker pid 2277999: LOG: degenerate backend request for 1 node(s) from pid [2277999], will be handled by watchdog
2022-08-31 01:12:09.642: watchdog pid 1187187: LOG: processing follow primary looking[acquire] request from remote node "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.733: watchdog pid 1187187: LOG: signal_user1_to_parent_with_reason(2)
2022-08-31 01:12:09.733: main pid 1187162: LOG: Pgpool-II parent process received SIGUSR1
2022-08-31 01:12:09.734: main pid 1187162: LOG: Pgpool-II parent process received sync backend signal from watchdog
2022-08-31 01:12:09.734: watchdog pid 1187187: LOG: new IPC connection received
2022-08-31 01:12:09.734: main pid 1187162: LOG: leader watchdog has performed failover
2022-08-31 01:12:09.734: main pid 1187162: DETAIL: syncing the backend states from the LEADER watchdog node
2022-08-31 01:12:09.734: watchdog pid 1187187: LOG: new IPC connection received
2022-08-31 01:12:09.734: watchdog pid 1187187: LOG: received the get data request from local pgpool-II on IPC interface
2022-08-31 01:12:09.734: watchdog pid 1187187: LOG: get data request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.734: watchdog pid 1187187: DETAIL: waiting for the reply...
2022-08-31 01:12:09.735: main pid 1187162: LOG: leader watchdog node "psql-03:9898 Linux psql-03" returned status for 3 backend nodes
2022-08-31 01:12:09.735: main pid 1187162: LOG: backend:0 is set to down status
2022-08-31 01:12:09.735: main pid 1187162: DETAIL: backend:0 is DOWN on cluster leader "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.735: main pid 1187162: LOG: backend:2 is set to down status
2022-08-31 01:12:09.735: main pid 1187162: DETAIL: backend:2 is DOWN on cluster leader "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.735: main pid 1187162: LOG: primary node:1 on leader watchdog node "psql-03:9898 Linux psql-03" is different from local primary node:0
2022-08-31 01:12:09.735: main pid 1187162: LOG: primary node was changed after the sync from "psql-03:9898 Linux psql-03"
2022-08-31 01:12:09.735: main pid 1187162: DETAIL: all children needs to be restarted
2022-08-31 01:12:09.872: sr_check_worker pid 2277999: ERROR: Failed to check replication time lag
2022-08-31 01:12:09.872: sr_check_worker pid 2277999: DETAIL: No persistent db connection for the node 0
2022-08-31 01:12:09.873: sr_check_worker pid 2277999: HINT: check sr_check_user and sr_check_password
2022-08-31 01:12:09.873: sr_check_worker pid 2277999: CONTEXT: while checking replication time lag
2022-08-31 01:12:09.873: sr_check_worker pid 2277999: LOG: worker process received restart request
More information about the pgpool-general
mailing list