[pgpool-general: 8762] Re: Clients disconnection when slave node is off
Tatsuo Ishii
ishii at sraoss.co.jp
Sun May 14 11:06:38 JST 2023
Hi Jesús,
> Hi Tatsuo. Thank you very much.
> Should we add it as a bug in the Pgpool-II Bug Tracker?
No, you don't need to, as it's a known limitation that pgpool does not
accept new connections during failover.
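The behaviour discussed in this thread can be summarised as a small model. New client connections are refused while failover is in progress. A session that is already connected waits for failover to finish and is closed only if failover does not complete within 30 seconds. The Python sketch below is a hypothetical illustration of that logic only. It is not pgpool-II code (pgpool-II is written in C), and the names PoolState, accept_new_connection and wait_for_failover are invented for this example.

```python
import time
from typing import Callable

# Hypothetical model of the behaviour described in this thread.
# This is NOT pgpool-II source code; all names are invented for illustration.
FAILOVER_WAIT_TIMEOUT = 30.0  # seconds, the limit mentioned in the thread


class PoolState:
    """Shared state: whether a failover is currently running."""

    def __init__(self) -> None:
        self.failover_in_progress = False


def accept_new_connection(state: PoolState) -> bool:
    """New client connections are refused while failover is running."""
    return not state.failover_in_progress


def wait_for_failover(
    state: PoolState,
    timeout: float = FAILOVER_WAIT_TIMEOUT,
    poll_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block an existing session until failover finishes.

    Returns True if the session may continue, or False if failover did not
    complete within `timeout` seconds and the session should be closed.
    """
    deadline = clock() + timeout
    while state.failover_in_progress:
        if clock() >= deadline:
            return False  # give up: the caller closes the session
        sleep(poll_interval)
    return True
```

Because the check in wait_for_failover only runs when a session calls it, a session that passed the check just before failover began would still go on to send its query to the backend, which is the race Tatsuo describes. The clock and sleep parameters are injectable so the timeout can be exercised without actually waiting.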
> On Sat, 13 May 2023 9:24, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>
>> Ok. The errors were generated while clients tried to connect to
>> pgpool. My patch covers the case where failover happens while
>> connections from clients to pgpool are *kept*. However, the patch does
>> not cover the case where clients try to establish a connection to
>> pgpool during failover.
>>
>> I tested my patch using pgbench. If pgbench is given "-C" (create a new
>> connection for each transaction), I get the same errors you mentioned.
>>
>> I have to admit my patch does not cover all the cases. I need more
>> time to deal with these problems.
>>
>> > Hi,
>> >
>> > I think we cannot connect to pgpool. I will show you the output of my
>> > dbCheck script.
>> >
>> >
>> > - *Pgpool without patch and backend1 as slave:*
>> >
>> > [root@pg_client1 services]# ./dbcheck.sh $VIP_PGPOOL
>> > psql: ERROR: do command failed
>> > DETAIL: backend error: "SFATAL"
>> > psql: ERROR: unable to read data from DB node 1
>> > DETAIL: socket read failed with error "Connection reset by peer"
>> > psql: server closed the connection unexpectedly
>> > This probably means the server terminated abnormally
>> > before or while processing the request.
>> > psql: server closed the connection unexpectedly
>> > This probably means the server terminated abnormally
>> > before or while processing the request.
>> > This probably means the server terminated abnormally
>> > before or while processing the request.
>> >
>> > - *Pgpool with patch and backend1 as slave:*
>> >
>> > psql: ERROR: unable to read message kind
>> > DETAIL: kind does not match between main(52)
>> >
>> > - *Pgpool with patch and backend1 as master:*
>> >
>> > psql: ERROR: unable to read data from DB node
>> > DETAIL: socket read failed with error "Connection reset by peer"
>> > server closed the connection unexpectedly
>> > This probably means the server terminated abnormally
>> > before or while processing the request.
>> > connection to server was lost
>> > server closed the connection unexpectedly
>> > This probably means the server terminated abnormally
>> > before or while processing the request.
>> > connection to server was lost
>> >
>> >
>> > Anyway, with a client that uses ODBC, if it tries to access the database
>> > during failover (from the slave node), the following error is displayed:
>> > "Driver Unable to Establish Connection with Data Source".
>> >
>> > On Fri, 12 May 2023 at 9:40, Tatsuo Ishii (<ishii at sraoss.co.jp>) wrote:
>> >
>> >> What do you mean by "database is not available"?
>> >>
>> >> 1. You can connect to pgpool but pgpool does not reply back.
>> >>
>> >> 2. You can connect to pgpool but pgpool immediately disconnects.
>> >>
>> >> > Hi Tatsuo,
>> >> >
>> >> > I'm working with your patch, but I still face a problem: the
>> >> > database is unavailable for approximately 1 second (I have a script
>> >> > that runs a SELECT query every 0.1 seconds to measure how long the
>> >> > database is unavailable).
>> >> >
>> >> > I will explain two different cases:
>> >> >
>> >> > 1. The slave node (backend1 in pgpool.conf) is turned off. With your
>> >> > patch the database is always available. Without your patch the
>> >> > database is unavailable for about 1 second.
>> >> > 2. The master node (backend0) is turned off. Failover is performed and
>> >> > backend1 is promoted. After that, I turn backend0 on again, and it is
>> >> > now a slave node. If I turn off this slave node (backend0), the
>> >> > database is unavailable for about 1 second (with or without your
>> >> > patch).
>> >> >
>> >> > Do you have any idea what causes this behaviour?
>> >> >
>> >> > Thanks in advance.
>> >> >
>> >> > Best,
>> >> > Jesús
>> >> >
>> >> > On Fri, 14 Apr 2023 3:41, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >> >
>> >> >> Hi Jesús,
>> >> >>
>> >> >> > Hi Tatsuo,
>> >> >> >
>> >> >> > First of all, thank you so much for taking the time to investigate
>> >> >> > this issue.
>> >> >>
>> >> >> No problem.
>> >> >>
>> >> >> > I have compiled pgpool 4.3.2 with your patch, and the problem with
>> >> >> > pgbench is solved.
>> >> >> > I still need to test it in my environment.
>> >> >> >
>> >> >> > Anyway, I had a look at your code and saw that the session is
>> >> >> > closed only if failover does not complete within 30 seconds.
>> >> >> > I have a question about this change. Is the session usable during
>> >> >> > failover? I mean, if failover takes 20 seconds, is the session
>> >> >> > blocked during that time, or can it still accept transactions?
>> >> >>
>> >> >> It is likely that the session is blocked. I say "likely" because the
>> >> >> function containing this logic is called frequently during a
>> >> >> session, but not at every point. It is possible that a pgpool process
>> >> >> already called the function just before failover started, and then
>> >> >> proceeds to send a query to the backend.
>> >> >>
>> >> >> > Let me ask another question: should we report this issue as a bug?
>> >> >>
>> >> >> No, you don't need to. The developers already recognize this as a
>> >> >> bug report.
>> >> >>
>> >> >> > Thanks in advance.
>> >> >> >
>> >> >> > Best,
>> >> >> > Jesús
>> >> >> >
>> >> >> >
>> >> >> > On Wed, 12 Apr 2023 3:33, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >> >> >
>> >> >> >> > However, a downside of this is that during failover, clients
>> >> >> >> > cannot process queries, or at least process them more slowly.
>> >> >> >> > Below is the log from pgbench using the "-P 1" option to show
>> >> >> >> > progress. As you can see, pgbench starts to slow down at 170 s
>> >> >> >> > and recovers at 194 s. That is, the slowdown continued for 24
>> >> >> >> > seconds.
>> >> >> >> >
>> >> >> >>
>> >> >> >> After more research, I suspect the slowdown is due to the effect
>> >> >> >> of checkpointing. If I add the "-S" option to change the
>> >> >> >> transaction type to select-only, I don't see the slowdown anymore.
>> >> >> >>
>> >> >> >> Best regards,
>> >> >> >> --
>> >> >> >> Tatsuo Ishii
>> >> >> >> SRA OSS LLC
>> >> >> >> English: http://www.sraoss.co.jp/index_en/
>> >> >> >> Japanese: http://www.sraoss.co.jp
>> >> >> >>
>> >> >>
>> >>
>>
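Jesús measures the unavailable window with a script that issues a SELECT every 0.1 seconds. That script (dbcheck.sh) is not shown in the thread, so the following is only a minimal sketch, assuming each probe result is recorded as a (timestamp in seconds, succeeded) pair. The function name longest_outage and the input format are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def longest_outage(probes: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest span (in seconds) during which probes failed.

    `probes` is a time-ordered sequence of (timestamp, succeeded) pairs.
    An outage starts at the first failed probe and ends at the next
    successful one; an outage still open at the end is measured up to the
    last probe.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in probes:
        if not ok and outage_start is None:
            outage_start = ts  # first failure of a new outage
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None  # service is back
        last_ts = ts
    if outage_start is not None and last_ts is not None:
        longest = max(longest, last_ts - outage_start)
    return longest
```

With probes every 0.1 seconds, the measured duration is only accurate to within about one probe interval, so a result of "about 1 second" covers roughly 0.9 to 1.1 seconds of real unavailability.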