[pgpool-hackers: 1463] Intermediate report: bug#167

Wed Mar 16 15:12:43 JST 2016

This is an intermediate report on bug#167. I would like to share what
I have found so far while investigating the issue.

http://www.pgpool.net/mantisbt/view.php?id=167

The pgpool-II 3.5 stable head still hangs when certain extended
queries are executed in streaming replication mode. Here is the debug
output from the Java application:

(1) 16:05:39.248 (1)  FE=> Parse(stmt=null,query="BEGIN",oids={})
16:05:39.248 (1)  FE=> Bind(stmt=null,portal=null)
16:05:39.248 (1)  FE=> Execute(portal=null,limit=0)
(2) 16:05:39.248 (1)  FE=> Parse(stmt=null,query="SELECT t.typname,t.oid FROM pg_catalog.pg_type t JOIN pg_catalog.pg_namespace n ON (t.typnamespace = n.oid)  WHERE n.nspname != 'pg_toast'",oids={})
16:05:39.248 (1)  FE=> Bind(stmt=null,portal=null)
16:05:39.249 (1)  FE=> Describe(portal=null)
16:05:39.249 (1)  FE=> Execute(portal=null,limit=0)
16:05:39.249 (1)  FE=> Sync
[hang]

Cause of the problem:

In (1), the "sync map" entries for nodes 0 and 1 are both set to on.
The sync map is an in-memory data structure that records which
database nodes pgpool-II has sent data to. Pgpool-II later uses the
sync map to determine which nodes it should expect a response from.

In (2), pgpool-II calls do_query() to get meta table info. The
function sends a query to node 0, followed by a flush message.
Pgpool-II then receives from node 0 the response to that query *and*
the response to the "BEGIN".

(3) Because there is pending data in the receive buffer of node 0,
pgpool-II calls ProcessBackendResponse(), which in turn calls
read_kind_from_backend(). Since the sync map entries for both nodes 0
and 1 are on, it hangs while trying to read data from node 1. Note
that this only happens when the load balance node is 1. If the load
balance node is 0, it does not happen. This is why the reporter sees
the issue only occasionally, not always.
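
To make steps (1) to (3) concrete, here is a toy model in Python. It
is only a sketch of the reasoning above, not pgpool-II code: the names
sync_map, recv_buffers and nodes_that_would_block are hypothetical,
the message strings are illustrative, and the load balance node
condition is not modeled.

```python
from collections import deque

NUM_NODES = 2


def nodes_that_would_block(sync_map, recv_buffers):
    """Return the nodes the sync map tells us to read from
    but whose receive buffer is empty (a blocking read would hang)."""
    return [n for n in range(NUM_NODES)
            if sync_map[n] and not recv_buffers[n]]


# (1) BEGIN goes to both nodes, so both sync map entries are on.
sync_map = [True, True]
recv_buffers = [deque(), deque()]

# (2) The internal query and a flush are sent to node 0 only.
# Node 0 now also emits its pending responses for BEGIN;
# node 1 was never flushed, so it has sent nothing.
recv_buffers[0].extend(["ParseComplete", "BindComplete", "CommandComplete"])

# (3) Reading according to the sync map would wait on node 1.
blocked = nodes_that_would_block(sync_map, recv_buffers)
print(blocked)  # [1]
```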

How to fix the problem:

- In (2), send a flush message to node 1 as well. Node 1 will send the
  response for BEGIN, and pgpool-II will not hang in (3). The problem
  is that this might hurt performance, since more messages will be
  exchanged.

- Even if we implement the above, pgpool-II will hang later on while
  trying to read data from node 1, since the SELECT was sent only to
  node 0.

It seems the only solution would be:

- Every time pgpool-II sends a message to a backend, it records the
  message in a FIFO queue, along with the nodes it was sent to
  (similar to the sync map). When trying to receive a message from a
  backend, pgpool-II should consult the queue to decide which node it
  should read from.
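
A minimal sketch of the queue idea, written in Python for brevity. It
is an illustration under assumptions, not a proposed implementation:
the class PendingMessageQueue and its methods are hypothetical names,
and it assumes exactly one response per recorded message, which the
real protocol does not guarantee.

```python
from collections import deque


class PendingMessageQueue:
    """FIFO of messages sent to backends, each with its target nodes."""

    def __init__(self):
        self._queue = deque()

    def record_sent(self, kind, nodes):
        # Remember what was sent and to which nodes, in send order.
        self._queue.append((kind, frozenset(nodes)))

    def nodes_to_read(self):
        # The oldest unanswered message decides which nodes to read from.
        if not self._queue:
            return frozenset()
        return self._queue[0][1]

    def response_received(self):
        # Drop the oldest entry once its response has been consumed.
        return self._queue.popleft()


q = PendingMessageQueue()
q.record_sent("Execute(BEGIN)", [0, 1])  # sent to both nodes
q.record_sent("Query(SELECT)", [0])      # sent to node 0 only

print(sorted(q.nodes_to_read()))  # [0, 1]
q.response_received()
print(sorted(q.nodes_to_read()))  # [0]
```

Unlike a single sync map, each queue entry carries its own node set,
so the reader never waits on node 1 for a message that only node 0
received.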

This is not trivial work. If you have a simpler solution, please let
me know.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp