[pgpool-general: 8413] Re: Extended query and cache

Sat Sep 17 18:26:28 JST 2022

> Hi,
> 
> So I check that, after pgpool read the result from cache I found the next
> step to be was message type ‘1’ parse complete from backend and not sync
> message from frontend as I expected.
> 
> Notice that after sync message send to the backend the connection becomes
> idle. In case the sync message didn’t read from frontend to backend, then
> the connection becomes idle only after command complete.

Probably you misunderstand here. Command complete message never makes
PostgreSQL in idle state. PostgreSQL becomes idle (including idle in
transaction) state only after issuing Ready for Query message.

> As I mentioned earlier Sync message not send from frontend to backend in
> case pgpool read the result from cache.
> 
> Could you check that please ? Because what I saw it’s not what you
> described.

That's not what I saw here. Here's log emitted by pgproto (testing tool
provided by pgpool distribution).

t-ishii$ cat ../extended-query-cache.data
'P'	""	"SELECT 1"	0
'B'	""	""	0	0	0
'D'	'P'	""
'E'	""	0
'S'
'Y'
'X'

t-ishii$ pgproto -d test -p 11000 -f ../extended-query-cache.data
FE=> Parse(stmt="", query="SELECT 1")
FE=> Bind(stmt="", portal="")
FE=> Describe(portal="")
FE=> Execute(portal="")
FE=> Sync
<= BE ParseComplete
<= BE BindComplete
<= BE RowDescription
<= BE DataRow
<= BE CommandComplete(SELECT 1)
<= BE ReadyForQuery(I)
FE=> Terminate

"FE=> " means that a message sent from frontend to pgpool.
"<= BE" means that a message sent from pgpool to frontend.

Note that this output is exactly same as the case when query cache is
disabled or pgproto directly connects to PostgreSQL.

Also here's output from pg_stat_activity.

test=# select * from pg_stat_activity where application_name = 'pgproto';
-[ RECORD 1 ]----+------------------------------
datid            | 16392
datname          | test
pid              | 65254
leader_pid       | 
usesysid         | 10
usename          | t-ishii
application_name | pgproto
client_addr      | 127.0.0.1
client_hostname  | 
client_port      | 56758
backend_start    | 2022-09-17 18:16:30.854156+09
xact_start       | 
query_start      | 
state_change     | 2022-09-17 18:16:30.856935+09
wait_event_type  | Client
wait_event       | ClientRead
state            | idle
backend_xid      | 
backend_xmin     | 
query_id         | 
query            | 
backend_type     | client backend

> Thanks,
> 
> Avi.
> 
> 
> On Sat, 17 Sep 2022 at 4:05 Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> > The issue with the aws performance insights
>> > Is a result of the way pgpool read frontend packets when cache enabled
>> and
>> > disabled.
>> > If we querying pgpool in extended query mode and cache disabled I can see
>> > the following -
>> > 1. Read parse from frontend and send to backend.
>> > 2. Read bind message and send to the backend
>> > 3. Read describe message and send to the backend
>> > 4. Read execute message and send to the backend —>
>> >
>> > Until here the connection is in active state under pg_stats_activity
>> >
>> > 5. Read sync message and send to backend. —>
>> > Now the connection moved to idle state.
>> >
>> > When cache enabled and the query was found in cache :
>> >
>> > 1. Read parse from frontend and send to backend.
>> > 2. Read bind message and send to the backend
>> > 3. Read describe message and send to the backend
>> > 4. Read execute message and send to the backend —>
>> >
>> > Now pgpool found the query in cache so execute message not send to the
>> > backend.
>> > In that case instead of read sync from frontend and update the backend in
>> > order to move the connection state from active to idle . Pgpool start
>> read
>> > packet from backend which will be ‘1’, ‘2’ and so on untill
>> > commandComplete. This behaviour cause the connection to be in active
>> state
>> > until command complete.
>>
>> Really? pgpool does not try to read command complete from backend in
>> this case because the command complete message is part of the cache
>> data.
>>
>> After 4 (actually pgpool does not send the execute message to backend
>> because the cache is there)
>>
>> 1) pgpool read Sync message from frontend and forward to backend.
>>
>> 2) pgpool read from backend and returns '1' (command complete,
>>    response to parse message), '2' (bind complete, response to bind
>>    message), Row description (response to describe message) to
>>    frontend.
>>
>> 3) pgpool returns (without reading from backend) Command Complete and row
>> data (if any).
>>
>> 4) pgpool read ready for query message from backend and returns it to
>> frontend.
>>    at this point, pg_stat_activity should show 'idle' state.
>>
>> > I am trying to understand if it’s a bug or it should be like that.
>> >
>> > Thanks,
>> >
>> > Avi.
>> >
>> > On Thu, 15 Sep 2022 at 7:34 Avi Raboah <avi.raboah at gmail.com> wrote:
>> >
>> >> Thank you for the clarification.
>> >>
>> >> On Thu, 15 Sep 2022 at 3:47 Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >>
>> >>> > Hi,
>> >>> > I am not sure I understood what you mean when you mentioned “close
>> >>> message”
>> >>> >
>> >>> > But i’ll try to ask you in a different way.
>> >>> >
>> >>> > Let look on the following example:
>> >>> >
>> >>> > 1. Select * from users where id = $1; using extended query. —> miss
>> >>> >
>> >>> > 2. Now in case i’ll run the same query i’ll get it back from cache.
>> >>> >
>> >>> > So I am understanding why the parse message should be pass to the
>> >>> backend
>> >>> > and not read the result from cache due to that Parse message didn’t
>> >>> > contains the query params. That’s make sense.
>> >>> >
>> >>> > But in Bind message the packet already contains the query params. So
>> why
>> >>> > you can’t read the result from cache here and we need to wait to the
>> >>> > execute message?
>> >>>
>> >>> Bind message does not return the "result". It just returns it succeeded
>> >>> (bind complete message) or failed (error response). The actual result
>> >>> (which is called "portal") is only in PostgreSQL's memory. So "caching
>> >>> result of bind message" is almost meaningless.
>> >>>
>> >>> > One more question is why we need to send the bind message anyway to
>> the
>> >>> > backend in case we have already the result of the execute message in
>> >>> cache?
>> >>>
>> >>> Think about this scenario:
>> >>>
>> >>> 1. pgpool receives a bind message but finds that the corresponding
>> >>> query cache exists. So pgpool does not send the bind message to
>> >>> backend.
>> >>>
>> >>> 2. In other session the table used in the query has been modified and
>> >>> the query is gone.
>> >>>
>> >>> 3. pgpool receives an execute message and tries to extract the query
>> >>> cache, but it has already gone. However pgpool cannot send the execute
>> >>> message to backend because the result of bind message (portal) does
>> >>> not exist in the backend.
>> >>>
>> >>> > I am asking the above questions because I am looking in the rds
>> >>> performance
>> >>> > insights.
>> >>> > And I can see that in case I am sending a lot of queries concurrently
>> >>> when
>> >>> > I have cache enabled I can see a very big wait because of a lot of
>> >>> > connections in idle ClientRead wait event.
>> >>>
>> >>> I am not familiar with performance insights and cannot comment on it.
>> >>>
>> >>> > And I am asking myself why the db needs to know about queries we
>> already
>> >>> > have in cache?
>> >>> >
>> >>> > Hope it was clear.
>> >>> >
>> >>> >
>> >>> > Thanks a lot,
>> >>> >
>> >>> > Avi
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Wed, 14 Sep 2022 at 2:42 Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >>> >
>> >>> >> > Hi,
>> >>> >> >
>> >>> >> > In case query already cached, why in extended query mode the db
>> knows
>> >>> >> about
>> >>> >> > that query?
>> >>> >> > Because in that case pgpool should return the result from the
>> cache
>> >>> in
>> >>> >> > order to save db resources.
>> >>> >> > But I found that although we have the query in cache pgpool still
>> >>> send
>> >>> >> > parse and bind request to the db.
>> >>> >> >
>> >>> >> > Please share your thoughts 🙏
>> >>> >>
>> >>> >> Yes, Pgpool-II only caches the result of execute message. The main
>> >>> >> reason is, to not return stale cache.  After receiving a close
>> message
>> >>> >> for the statement or the portal that is bound to the execute
>> message,
>> >>> >> the cache for the execute message should not be returned. When
>> >>> >> Pgpool-II receives close messages, they remove the internal record
>> of
>> >>> >> previpusly received statement or portal. When an execute message
>> >>> >> arrives, Pgpool-II checks whether the record for the execute message
>> >>> >> exists. If does not, the request fails. This strategy is simple but
>> >>> >> works well.
>> >>> >>
>> >>> >> I think the saving by caching parse message is not small for
>> >>> >> especially complex queries. But in this case users already reuse the
>> >>> >> prepared statement anyway.
>> >>> >>
>> >>> >> I think the saving of bind message is usually small because it does
>> >>> >> not involve planning in most cases.
>> >>> >>
>> >>> >> Best reagards,
>> >>> >> --
>> >>> >> Tatsuo Ishii
>> >>> >> SRA OSS LLC
>> >>> >> English: http://www.sraoss.co.jp/index_en/
>> >>> >> Japanese:http://www.sraoss.co.jp
>> >>> >>
>> >>>
>> >>
>>