[pgpool-general: 8730] Re: Clients disconnection when slave node is off

Tatsuo Ishii ishii at sraoss.co.jp
Tue Apr 11 15:49:40 JST 2023


Ok, I think I have found the cause of the issue. Unfortunately I am
not able to find any solution for this at the moment. Let me explain.

First of all, I was able to reproduce the issue using pgbench and
pgpool_setup (a test tool shipped with pgpool that creates a pgpool +
PostgreSQL cluster).

$ pgpool_setup
$ echo "load_balance_mode = off" >> etc/pgpool.conf
$ echo "failover_on_backend_error = off" >> etc/pgpool.conf
$ ./startall
$ pgbench -p 11000 -n -c 1 -T 300 test

At this point I expected that shutting down a standby node would not
affect pgbench: since load balance mode is off, pgpool should only
access the primary node.

So I entered:
$ pg_ctl -D data1 stop

Then pgbench aborted shortly afterwards:

pgbench: error: client 0 aborted in command 5 (SQL) of script 0; perhaps the backend died while processing
[snip]

After debugging pgpool using gdb, I got the following stack trace.

#6  0x000056538ef05fc4 in child_exit (code=code@entry=1) at protocol/child.c:1336
#7  0x000056538ef2ba6a in pool_virtual_main_db_node_id () at context/pool_query_context.c:341
#8  0x000056538ef0e42b in read_packets_and_process (frontend=frontend@entry=0x5653900cb4c8, backend=backend@entry=0x7f732f21fcd8, reset_request=reset_request@entry=0,
    state=state@entry=0x7ffcef518b6c, num_fields=num_fields@entry=0x7ffcef518b6a, cont=cont@entry=0x7ffcef518b74 "\001V") at protocol/pool_process_query.c:5098
#9  0x000056538ef0f1e3 in pool_process_query (frontend=0x5653900cb4c8, backend=0x7f732f21fcd8, reset_request=reset_request@entry=0)
    at protocol/pool_process_query.c:298
#10 0x000056538ef0734a in do_child (fds=fds@entry=0x565390107ba0) at protocol/child.c:455
#11 0x000056538eedb906 in fork_a_child (fds=0x565390107ba0, id=26) at main/pgpool_main.c:842
#12 0x000056538eee3e2a in PgpoolMain (discard_status=<optimized out>, clear_memcache_oidmaps=<optimized out>) at main/pgpool_main.c:541
#13 0x000056538eed9826 in main (argc=<optimized out>, argv=<optimized out>) at main/main.c:365

The child process exits and the socket connection is closed, so
pgbench complains. Why did pgpool exit? Pgpool maintains information
such as which node is the primary and which nodes are up or
down. pool_virtual_main_db_node_id() returns part of that
information. Unfortunately, the information changes while a failover
is in progress. If a pgpool process used such unstable information, it
could cause bad results, including segfaults (this actually happened
in the past; see Mantis bug tracker nos. 481 and 482). Thus, if
pool_virtual_main_db_node_id() finds that a failover is ongoing, it
exits the process and the session is disconnected. One solution would
be for the pgpool process to lock the information. However, this is
not only hard to implement but could also result in deadlock, so I
hesitate to employ it.
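
To make the guard described above concrete, here is a toy model in
Python. It is only a sketch of the idea, not pgpool's actual C code:
ClusterState, SessionTerminated, main_node_id and run_queries are
hypothetical names invented for this illustration, and the "switching"
flag is an assumed stand-in for whatever pgpool uses to mark a failover
in progress.

```python
# Toy model only; all names are hypothetical, not pgpool's C identifiers.
from dataclasses import dataclass


class SessionTerminated(Exception):
    """Raised when a worker gives up its client session."""


@dataclass
class ClusterState:
    """Shared view of the cluster, rewritten by the main process during failover."""
    main_node_id: int = 0
    switching: bool = False  # True while a failover is rewriting the fields above


def main_node_id(state: ClusterState) -> int:
    """Return the main node id, refusing to read it while it is in flux."""
    if state.switching:
        # The value may be half-updated, so drop the session instead of
        # acting on it (the analogue of child_exit in frame #6 above).
        raise SessionTerminated("failover in progress")
    return state.main_node_id


def run_queries(state: ClusterState, failover_at: int, n_queries: int) -> int:
    """Run n_queries; a failover begins just before query number failover_at.

    Returns the number of queries completed before the session ended.
    """
    done = 0
    for i in range(n_queries):
        if i == failover_at:
            state.switching = True  # standby shutdown detected
        try:
            main_node_id(state)
        except SessionTerminated:
            break
        done += 1
    return done
```

In this model a session that never asks for the node id while the flag
is set is unaffected, whereas one that asks at the wrong moment is
terminated even though it only ever talks to the primary.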

BTW, I think the reason why the issue happens with pgbench and not
with psql is that pgbench issues transactions at an extremely high
rate, which increases the chance of hitting the problem.
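
The rate argument above can be sketched with a back-of-the-envelope
model. This assumes queries arrive as a Poisson process and that the
failover window has a fixed length; both are simplifying assumptions,
hit_probability is a hypothetical helper, and the numbers in the usage
lines are made up for illustration, not measured.

```python
import math


def hit_probability(queries_per_second: float, window_seconds: float) -> float:
    """Chance that at least one query arrives during the failover window.

    Assumes Poisson arrivals at a constant rate (an assumption, not a
    measurement of pgpool or pgbench behaviour).
    """
    return 1.0 - math.exp(-queries_per_second * window_seconds)


# Illustrative rates only: an idle interactive session versus a busy benchmark.
idle_session = hit_probability(queries_per_second=0.01, window_seconds=0.1)
busy_session = hit_probability(queries_per_second=1000.0, window_seconds=0.1)
```

Under these assumptions the idle session almost never lands in the
window, while the busy one almost always does.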

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> Hi,
> 
> Thanks. I will look into this.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
> 
>> Hi,
>> 
>> Please find attached logs.
>> 
>> Thank you.
>> 
>> Best,
>> Jesús
>> 
>> On Mon, 3 Apr 2023 11:06, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> 
>>> Hi,
>>>
>>> Can you share pgpool log?
>>>
>>> > Hi Tatsuo,
>>> >
>>> > Should I test anything else to try to solve my problem?
>>> >
>>> > Thank you.
>>> >
>>> > Best,
>>> > Jesús
>>> >
>>> > On Fri, 31 Mar 2023 12:10, Jesús Campoy <jesuscampoy at gmail.com> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> I just performed the same tests with auto_failback set to off but the
>>> >> behaviour is the same (clients are disconnected).
>>> >>
>>> >> Thank you Tatsuo.
>>> >>
>>> >> Best,
>>> >> Jesús
>>> >>
>>> >> On Fri, 31 Mar 2023 2:31, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>> >>
>>> >>> Hi Jesús,
>>> >>>
>>> >>> Can you try again after setting "auto_failback = off"? I suspect
>>> >>> auto_failback confuses pgpool.
>>> >>>
>>> >>> Best regards,
>>> >>> --
>>> >>> Tatsuo Ishii
>>> >>> SRA OSS LLC
>>> >>> English: http://www.sraoss.co.jp/index_en/
>>> >>> Japanese:http://www.sraoss.co.jp
>>> >>>
>>> >>> > Hi Tatsuo,
>>> >>> >
>>> >>> > When I connect to pgpool with psql, the session is not disconnected.
>>> >>> > However, I ran a test with pgbench inserting data into the database
>>> >>> > with 30 clients, and when I shut down server2 some of the pgbench
>>> >>> > clients are disconnected.
>>> >>> > Please find attached a zip file with the pgpool logs, the pgbench log
>>> >>> > and the pgpool and postgres configurations.
>>> >>> >
>>> >>> > Thank you for your assistance in this matter.
>>> >>> >
>>> >>> > Best,
>>> >>> > Jesús
>>> >>> >
>>> >>> > On Wed, 22 Mar 2023 at 2:05, Tatsuo Ishii (<ishii at sraoss.co.jp>) wrote:
>>> >>> >
>>> >>> >> > Please find attached my pgpool config and a log file from when the
>>> >>> >> > standby (server2) is powered off.
>>> >>> >> > Thank you for your help.
>>> >>> >>
>>> >>> >> I assume server2 = host B.
>>> >>> >>
>>> >>> >> I have looked into the log file but failed to find log lines related
>>> >>> >> to user sessions which were disconnected. I was looking for such log
>>> >>> >> lines because you said:
>>> >>> >>
>>> >>> >> > Sometimes I have to power off the host B and then, the clients connected
>>> >>> >>
>>> >>> >> If such an event occurs, there should be such log lines.
>>> >>> >> Many log lines like:
>>> >>> >>
>>> >>> >> 2023-03-20 10:59:27.725: [unknown] pid 31499: LOG:  failover or failback event detected
>>> >>> >> 2023-03-20 10:59:27.725: [unknown] pid 31499: DETAIL:  restarting myself
>>> >>> >> 2023-03-20 10:59:27.726: main pid 30237: LOG:  child process with pid: 31499 exits with status 256
>>> >>> >> 2023-03-20 10:59:27.727: main pid 30237: LOG:  fork a new child process with pid: 29017
>>> >>> >>
>>> >>> >> just show that process 31499 is not related to any client session
>>> >>> >> ([unknown] indicates this), so even when the process exits, no
>>> >>> >> client is affected.
>>> >>> >>
>>> >>> >> Can you connect to pgpool using psql and shut down server2 so that
>>> >>> >> the log lines I am expecting are recorded?
>>> >>> >>
>>> >>> >> Best regards,
>>> >>> >> --
>>> >>> >> Tatsuo Ishii
>>> >>> >> SRA OSS LLC
>>> >>> >> English: http://www.sraoss.co.jp/index_en/
>>> >>> >> Japanese:http://www.sraoss.co.jp
>>> >>> >>
>>> >>> >>
>>> >>> >> > Best,
>>> >>> >> > Jesús
>>> >>> >> >
>>> >>> >> > On Fri, 17 Mar 2023 at 0:13, Tatsuo Ishii (<ishii at sraoss.co.jp>) wrote:
>>> >>> >> >
>>> >>> >> >> > Ok, I will send you the log ASAP.
>>> >>> >> >> > I forgot to mention that we are running two instances of pgpool
>>> >>> >> >> > using watchdog and a VIP.
>>> >>> >> >> >
>>> >>> >> >> > I mean, host A runs pgpool (active) and the primary database. Host
>>> >>> >> >> > B runs the other instance of pgpool and the standby database.
>>> >>> >> >> > Sometimes I have to power off host B, and then the clients
>>> >>> >> >> > connected to pgpool via the VIP are disconnected.
>>> >>> >> >> >
>>> >>> >> >> > I have the same pgpool.conf for both pgpool instances. Do you
>>> >>> >> >> > need it?
>>> >>> >> >>
>>> >>> >> >> No, one pgpool.conf is enough.
>>> >>> >> >>
>>> >>> >> >> > Thanks for your help!
>>> >>> >> >>
>>> >>> >> >> You are welcome.
>>> >>> >> >> --
>>> >>> >> >> Tatsuo Ishii
>>> >>> >> >> SRA OSS LLC
>>> >>> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >>> >> >> Japanese:http://www.sraoss.co.jp
>>> >>> >> >>
>>> >>> >>
>>> >>>
>>> >>
>>>