[pgpool-hackers: 3786] pg_terminate_backend() does not work in native replication mode
Tatsuo Ishii
ishii at sraoss.co.jp
Thu Aug 20 13:31:47 JST 2020
Hi Usma,
While looking into the 073.pg_terminate_backend test failure I found
an interesting issue.
Suppose we execute the following SQL in native replication mode:
session 1: select pg_sleep(60); /* at time 't1' */
session 2: select pg_terminate_backend('7615'); /* at time 't2' */
The pg_sleep() should be canceled at time t2, but actually it is
canceled at t2 + 60 seconds. Also after the cancel we get:
WARNING: packet kind of backend 1 ['D'] does not match with master/majority nodes packet kind ['E']
WARNING: write on backend 0 failed with error :"Success"
DETAIL: while trying to write data from offset: 0 wlen: 5
FATAL: failed to read kind from backend
DETAIL: kind mismatch among backends. Possible last query was: "select pg_sleep(60);" kind details are: 0[E: terminating connection due to administrator command] 1[D]
HINT: check data consistency among db nodes
What is actually happening here is:
2020-08-20 13:01:46: psql pid 7603: LOG: DB node id: 0 backend pid: 7615 statement: BEGIN
2020-08-20 13:01:46: psql pid 7603: LOG: DB node id: 1 backend pid: 7616 statement: BEGIN
2020-08-20 13:01:46: psql pid 7603: LOG: DB node id: 0 backend pid: 7615 statement: select pg_sleep(60); <-- pgpool 7603 waiting for response from backend 0.
2020-08-20 13:02:06: psql pid 7598: LOG: DB node id: 0 backend pid: 7632 statement: SELECT version()
2020-08-20 13:02:06: psql pid 7598: LOG: DB node id: 0 backend pid: 7632 statement: SELECT count(*) FROM pg_catalog.pg_proc AS p, pg_catalog.pg_namespace AS n WHERE p.proname = 'pg_terminate_backend' AND n.oid = p.pronamespace AND n.nspname ~ '.*' AND p.provolatile = 'v'
2020-08-20 13:02:06: psql pid 7598: LOG: found the pg_terminate_backend request for backend pid:7615 on backend node:0
2020-08-20 13:02:06: psql pid 7598: DETAIL: setting the connection flag
2020-08-20 13:02:06: psql pid 7598: LOG: DB node id: 0 backend pid: 7632 statement: select pg_terminate_backend(7615);
2020-08-20 13:02:06: psql pid 7603: LOG: DB node id: 1 backend pid: 7616 statement: select pg_sleep(60); <--- pgpool 7603 got a response from backend 0 because pg_terminate_backend was executed. pgpool 7603 then started to wait for a response from backend 1.
2020-08-20 13:03:06: psql pid 7603: WARNING: packet kind of backend 1 ['D'] does not match with master/majority nodes packet kind ['E'] <-- after 60 seconds passed, pgpool 7603 got responses from backend 0 and 1. <-- since backend 0 got an error while backend 1 successfully executed pg_sleep(60), there was a difference in packet kind.
2020-08-20 13:03:06: psql pid 7603: FATAL: failed to read kind from backend <-- and pgpool gets angry!
2020-08-20 13:03:06: psql pid 7603: DETAIL: kind mismatch among backends. Possible last query was: "select pg_sleep(60);" kind details are: 0[E: terminating connection due to administrator command] 1[D]
2020-08-20 13:03:06: psql pid 7603: HINT: check data consistency among db nodes
2020-08-20 13:03:06: psql pid 7603: WARNING: write on backend 0 failed with error :"Success"
2020-08-20 13:03:06: psql pid 7603: DETAIL: while trying to write data from offset: 0 wlen: 5
2020-08-20 13:03:06: main pid 7572: LOG: child process with pid: 7603 exits with status 512
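To make the timeline concrete, here is a toy Python model of the sequence in the log above. It is only a sketch, not pgpool code: the function and variable names are made up for illustration, and it assumes the query is sent to node 1 only after node 0 has responded, which is what the log suggests.

```python
def simulate(t1, t2, sleep_secs):
    """Toy timeline model (hypothetical; not pgpool-II code).

    Assumes the query goes to node 0 first and to node 1 only after
    node 0 has answered, as the log above suggests.
    Times are in seconds. Returns (time the client sees the outcome,
    packet kind per node).
    """
    # Node 0 runs pg_sleep() from t1; pg_terminate_backend() at t2 cuts it short.
    node0_done = min(t1 + sleep_secs, t2)
    node0_kind = "E" if t2 < t1 + sleep_secs else "D"

    # Node 1 starts only once node 0 has responded, and nobody terminates
    # its backend, so it sleeps for the full duration.
    node1_done = node0_done + sleep_secs
    node1_kind = "D"

    return node1_done, {0: node0_kind, 1: node1_kind}


# Timestamps taken from the log, as seconds after 13:00:00.
t1 = 1 * 60 + 46  # 13:01:46, pg_sleep(60) sent to node 0
t2 = 2 * 60 + 6   # 13:02:06, pg_terminate_backend(7615)
finished, kinds = simulate(t1, t2, 60)
mismatch = len(set(kinds.values())) > 1  # 'E' vs 'D' means kind mismatch
print(finished - t2, kinds, mismatch)
```

With the timestamps from the log, the model has the client waiting until t2 + 60 seconds and ending up with kind 'E' from node 0 and 'D' from node 1, which corresponds to the kind mismatch reported at 13:03:06.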
Any idea how to deal with this problem?
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp