[pgpool-hackers: 4334] Possible dead lock with shared relation cache.
Tatsuo Ishii
ishii at sraoss.co.jp
Fri May 12 20:47:47 JST 2023
We have found possible dead lock with shared relation cache and fixed
it in 4.3, 4.2 and 4.1 stable releases.
[From the commit message]
When a user defined function obtains table locking, call to the
function using extended query protocol could lead to dead lock. Here's
a scenario.
(1) In session A client sends parse, bind, execute request for the
function to pgpol.
(2) Pgpool in session A forwards the request to PostgreSQL.
(3) PostgreSQL in session A performs execute, resulting in a table lock.
(4) In session B client sends parse, bind, execute request for the
function to pgpool.
(5) Pgpool in session B forwards the request to PostgreSQL.
(6) PostgreSQL in session B performs bind but the table was already
locked by PostgreSQL in session A, and it waits for release of the
lock.
(7) After pgpool in session B forwards the execute to PostgreSQL, it
acquires semaphore to search shared relation cache to check the
volatility of the function. Then it calls do_query and sends flush
message to PostgreSQL to obtain the response from PostgreSQL up to
this point. But since PostgreSQL is waiting for table lock in #6,
pgpool has to wait for messages beyond bind complete.
(8) After pgpool in session A forwards the execute to PostgreSQL, it
tries to acquire semaphore to search shared relation cache to check
the volatility of the function but the semaphore was already
acquired by pgpool in session B, it waits for the release of the
semaphore.
(9) Session A and session B wait for each other, resulting in a dead
lock.
To fix this, modify pool_search_relcache() so that it releases
semaphore before calling do_query() (but acquires semaphore after
do_query()). By this, session A in #8 above can get semaphore, and go
forward. It receives sync message from client and forwards it to
PostgreSQL. Upon receiving sync, the user defined function will finish
the execution and releases the table lock. This makes PostgreSQL in
session B move forward because now it can obtain the table lock.
Back patch to 4.3, 4.2 and 4.1. On these versions semaphore based
shared release cache is used.
Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
More information about the pgpool-hackers
mailing list