[pgpool-hackers: 3510] Quarantine state in native replication mode is dangerous
Tatsuo Ishii
ishii at sraoss.co.jp
Thu Feb 13 08:15:34 JST 2020
Usama,
I think the quarantine state in native replication mode could cause data
inconsistency. Below are the steps to reproduce the problem.
# create a cluster of 3 Pgpool-II nodes + 2 PostgreSQL nodes in native
# replication mode. Note that Pgpool-II is compiled with
# HEALTHCHECK_DEBUG=1 and WATCHDOG_DEBUG=1 (make HEALTHCHECK_DEBUG=1 WATCHDOG_DEBUG=1)
$ watchdog_setup -wn 3 -n 2 -m r
# start the cluster
./startall
./pcp_watchdog_info -v -p 50001
Watchdog Cluster Information
Total Nodes : 3
Remote Nodes : 2
Quorum state : QUORUM EXIST
Alive Remote Nodes : 2
VIP up on local node : YES
Master Node Name : localhost:50000 Linux tishii-CFSV7-1
Master Host Name : localhost
Watchdog Node Information
Node Name : localhost:50000 Linux tishii-CFSV7-1
Host Name : localhost
Delegate IP : Not_Set
Pgpool port : 50000
Watchdog port : 50002
Node priority : 3
Status : 4
Status Name : MASTER
Node Name : localhost:50004 Linux tishii-CFSV7-1
Host Name : localhost
Delegate IP : Not_Set
Pgpool port : 50004
Watchdog port : 50006
Node priority : 2
Status : 7
Status Name : STANDBY
Node Name : localhost:50008 Linux tishii-CFSV7-1
Host Name : localhost
Delegate IP : Not_Set
Pgpool port : 50008
Watchdog port : 50010
Node priority : 1
Status : 7
Status Name : STANDBY
$ psql -p 50000 -c "show pool_nodes" test
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+--------+-----------+--------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | /tmp | 51000 | up | 0.500000 | master | 0 | true | 0 | | | 2020-02-13 07:58:54
1 | /tmp | 51001 | up | 0.500000 | slave | 0 | false | 0 | | | 2020-02-13 07:58:54
(2 rows)
# create an artificial failure of PostgreSQL node 1 as seen from pgpool0
echo "1 down" > pgpool0/log/backend_down_request
# make sure that node 1 goes into quarantine state on pgpool0
$ psql -p 50000 -c "show pool_nodes" test
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+-------+------------+-----------+--------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | /tmp | 51000 | up | 0.500000 | master | 0 | true | 0 | | | 2020-02-13 08:01:37
1 | /tmp | 51001 | quarantine | 0.500000 | slave | 0 | false | 0 | | | 2020-02-13 08:01:40
(2 rows)
# modify the database via pgpool0
$ psql -p 50000 test
psql (12.0)
Type "help" for help.
test=# create table t1(i int);
CREATE TABLE
test=# insert into t1 values(1);
INSERT 0 1
test=# \q
# check database consistency by connecting to each PostgreSQL node
# directly (node 0 first, then the quarantined node 1)
$ psql -p 51000 test
psql (12.0)
Type "help" for help.
test=# select * from t1;
i
---
1
(1 row)
test=# \q
$ psql -p 51001 test
psql (12.0)
Type "help" for help.
test=# select * from t1;
ERROR: relation "t1" does not exist
LINE 1: select * from t1;
^
Now node 0 and node 1 are in an inconsistent state: writes issued while
node 1 was quarantined were applied only to node 0, and node 1 is never
recovered automatically because no failover was performed.
Probably we should not allow failover_require_consensus to be set to on
in native replication mode, or at least add a strong warning in the
documentation not to do so. What do you think?
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp