[pgpool-hackers: 3580] Inconsistency in native replication mode
Tatsuo Ishii
ishii at sraoss.co.jp
Mon Apr 20 13:49:16 JST 2020
Hi,
I have studied $subject issue. Suppose we have two sessions S1, S2 and
two servers N1 (master), N2 (slave). S1 updates table t1 in an
explicit transaction. S2 read t1. Below is a demonstration how we find
an inconsistency with native replication mode. "S1:N1 SQL statement"
denotes an SQL statement is executed on node N1 in session S1. Table
t1 has an integer column i. There is only on row in t1 and the initial
value is 0. Time is increasing from top to bottom.
S1/N1: BEGIN;
S1/N2: BEGIN;
S1/N1: UPDATE t1 SET i = i + 1; -- i = 1
S1/N2: UPDATE t1 SET i = i + 1; -- i = 1
S1/N1: COMMIT;
S2/N1: SELECT i FROM t1; -- i = 1
S2/N2: SELECT i FROM t1; -- i = 0
S1/N2: COMMIT;
So inconsistent values of i of t1 are observed in session S2.
To avoid the inconsistency, we can control the order of concurrent
COMMIT and the first SQL in a transaction, which acquires the
snapshot, so that they are never executed concurrently. (This is a
proposed algorithm in [1]). Note that this requires transactions run
in REPEATABLE READ (or SERIALIZABLE), not default READ COMMITTED.
Suppose we have a mutex lock M. To execute that first SQL in a
transaction, Pgpool-II needs to acquire M before it. Also it needs to
acquire M before issuing COMMIT. After hiring this we have:
S1/N1: BEGIN;
S1/N2: BEGIN;
S1/N1: UPDATE t1 SET i = i + 1; -- i = 1
S1/N2: UPDATE t1 SET i = i + 1; -- i = 1
S1: try to acquire M and succeeded
S1/N1: COMMIT;
S2: try to acquire M and failed
S1/N2: COMMIT;
S1: release M
S2: wake up
S2/N1: SELECT i FROM t1; -- i = 1
S2/N2: SELECT i FROM t1; -- i = 1
or:
S1/N1: BEGIN;
S1/N2: BEGIN;
S1/N1: UPDATE t1 SET i = i + 1; -- i = 1 in session S1
S1/N2: UPDATE t1 SET i = i + 1;
S1: try to acquire M and failed
S2: try to acquire M and succeeded
S2/N1: SELECT i FROM t1; -- i = 0 in session S2
S2/N2: SELECT i FROM t1; -- i = 0 in session S2
S2: release M
S1: wake up
S1/N1: COMMIT;
S2/N1: COMMIT;
As you see, we can now avoid the inconsistency.
Also [1] claims that only read only transaction can distribute SELECT
(load balance). Currently Pgpool-II load balance read only SELECT
whenever possible. i.e. if SELECT is followed by write query, start
transaction (if no transaction is started).
[1] http://www.vldb.org/pvldb/vol2/vldb09-694.pdf
Comments and suggestions are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
More information about the pgpool-hackers
mailing list