[pgpool-general: 8526] Re: "delay_threshold_by_time" not detecting replication lag
zam bak
zam6ak at gmail.com
Wed Dec 28 00:39:38 JST 2022
On Mon, Dec 26, 2022 at 9:23 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>
> > Hi
> >
> > I just upgraded to 4.4.1 and am noticing that "delay_threshold_by_time"
> > does not seem to be logging or even showing replication lag.
> >
> > I use pgbench with scale_factor=1000 to cause replication delay during the
> > initialization phase.
> > During this phase (which lasts about a minute) I am able to
> > SELECT * FROM pg_stat_replication
> > and see replication lag being 15-20 seconds for duration of initialization
> >
> > However, pgpool log does not show and thresholds reached in the log
> > (despite having log_standby_delay = 'always' ) and "pcp_node_info" and
> > "show pool_nodes" show no lag either
> >
> > if I switch back to "delay_threshold" param only, everything is working as
> > expected and I can see log entries such as:
> >
> > sr_check_worker CONTEXT: while checking replication time lag
> > sr_check_worker LOG: Replication of node: 1 is behind 820903040 bytes from
> > the primary server (node: 0)
> >
> > But none of this works when I put "delay_threshold_by_time=1"......
> >
> > 1. Is this a valid bug or could it be my settings?
> > 2. If in fact this is a bug, is it just affecting logging/displaying or the
> > lag and the switching of SELECT queries to primary would still occur if I
> > use this setting and the lag threshold is reached?
>
> delay_threshold_by_time just uses the query result of "int4" column of the following query:
>
> SELECT application_name, state, sync_state,(EXTRACT(EPOCH FROM replay_lag)*1000000)::integer FROM pg_stat_replication;
> application_name | state | sync_state | int4
> ------------------+-----------+------------+---------
> server1 | streaming | async | 1578050
>
> (the result above was taken in my local environment).
>
> If you manually issue the query above against the PostgreSQL primary
> node while executing pgbench -i, what do you see as a result?
I wrote a "capture" script that runs several commands/queries to
demonstrate this issue
Script uses the query you requested (except I added the entire
replay_lag columns also)
The script followed by the output is below and it shows:
* when replay lag was 56+ seconds (according to pg_stat_statements),
the output from SHOW POOL_NODES and from pcp_node_info showed no delay
* there was no entry in the log showing replication log threshold being exceeded
SCRIPT:
[postgres at localhost]$ cat check_lag.sh
#!/usr/bin/env bash
echo -e "\n-"
echo "$(date --iso-8601=seconds) | checking lag on primary directly
via pg_stat_replication..."
psql -w -U postgres -h 10.11.32.15 -p 5432 -c "SELECT
application_name, state, sync_state,(EXTRACT(EPOCH FROM
replay_lag)*1000000)::integer, replay_lag FROM pg_stat_replication;"
echo -e "\n"
echo "$(date --iso-8601=seconds) | checking lag via SHOW POOL_NODES"
psql -U postgres -h localhost -p 9999 -c "SHOW POOL_NODES"
echo -e "\n"
echo "$(date --iso-8601=seconds) | checking lag via pcp_node_info
(against replica)"
pcp_node_info 1 -w -v -h localhost -p 9898 -U postgres
echo -e "\n"
echo "$(date --iso-8601=seconds) | current contents of
/var/log/pgpool-II/pgpool-01-T
ue.log"
tail -n 20 /var/log/pgpool-II/pgpool-01-Tue.log
echo -e "\n\n\n\n\n"
OUTPUT:
[postgres at localhost]$ ./check_lag.sh
2022-12-27T15:18:55+00:00 | checking lag on primary directly via
pg_stat_replication...
application_name | state | sync_state | int4 | replay_lag
------------------+-----------+------------+----------+-----------------
walreceiver | streaming | async | 56663201 | 00:00:56.663201
(1 row)
2022-12-27T15:18:55+00:00 | checking lag via SHOW POOL_NODES
node_id | hostname | port | status | pg_status | lb_weight | role
| pg_role | select_cnt | load_balance_node | replication_delay |
replication_state | replication_sync_state | last_status_change
---------+------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 10.11.32.15 | 5432 | up | up | 0.500000 |
primary | primary | 0 | true | 0
| | | 2022-12-27 14:44:16
1 | 10.11.96.15 | 5432 | up | up | 0.500000 |
standby | standby | 0 | false | 0.000000 second
| | | 2022-12-27 14:44:16
(2 rows)
2022-12-27T15:18:55+00:00 | checking lag via pcp_node_info (against replica)
Hostname : 10.11.96.15
Port : 5432
Status : 2
Weight : 0.500000
Status Name : up
Backend Status Name : up
Role : standby
Backend Role : standby
Replication Delay : 0.000000 second
Replication State : none
Replication Sync State : none
Last Status Change : 2022-12-27 14:44:16
2022-12-27T15:18:55+00:00 | current contents of
/var/log/pgpool-II/pgpool-01-Tue.log
2022-12-27 15:18:45.517 [117552]: [111-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: PCP process with pid: 118550 exit with
SUCCESS.
2022-12-27 15:18:45.517 [117552]: [112-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: PCP process with pid: 118550 exits with
status 0
2022-12-27 15:18:48.578 [117347]: [1-1] user=[No Connection],db=[No
Connection],app=child LOG: new connection received
2022-12-27 15:18:48.578 [117347]: [2-1] user=[No Connection],db=[No
Connection],app=child DETAIL: connecting host=::1 port=33866
2022-12-27 15:18:48.631 [117347]: [3-1] user=[No Connection],db=[No
Connection],app=psql LOG: frontend disconnection: session time:
0:00:00.053 user=postgres database=postgres host=::1 port=33866
2022-12-27 15:18:48.636 [117552]: [113-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: forked new pcp worker, pid=118562
socket=7
2022-12-27 15:18:48.651 [117552]: [114-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: PCP process with pid: 118562 exit with
SUCCESS.
2022-12-27 15:18:48.651 [117552]: [115-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: PCP process with pid: 118562 exits with
status 0
2022-12-27 15:18:52.711 [117348]: [1-1] user=[No Connection],db=[No
Connection],app=child LOG: new connection received
2022-12-27 15:18:52.711 [117348]: [2-1] user=[No Connection],db=[No
Connection],app=child DETAIL: connecting host=::1 port=33876
2022-12-27 15:18:52.764 [117348]: [3-1] user=[No Connection],db=[No
Connection],app=psql LOG: frontend disconnection: session time:
0:00:00.053 user=postgres database=postgres host=::1 port=33876
2022-12-27 15:18:52.770 [117552]: [116-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: forked new pcp worker, pid=118572
socket=7
2022-12-27 15:18:52.784 [117552]: [117-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: PCP process with pid: 118572 exit with
SUCCESS.
2022-12-27 15:18:52.784 [117552]: [118-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: PCP process with pid: 118572 exits with
status 0
2022-12-27 15:18:55.927 [117350]: [1-1] user=[No Connection],db=[No
Connection],app=child LOG: new connection received
2022-12-27 15:18:55.927 [117350]: [2-1] user=[No Connection],db=[No
Connection],app=child DETAIL: connecting host=::1 port=33884
2022-12-27 15:18:55.979 [117350]: [3-1] user=[No Connection],db=[No
Connection],app=psql LOG: frontend disconnection: session time:
0:00:00.052 user=postgres database=postgres host=::1 port=33884
2022-12-27 15:18:55.984 [117552]: [119-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: forked new pcp worker, pid=118582
socket=7
2022-12-27 15:18:55.998 [117552]: [120-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: PCP process with pid: 118582 exit with
SUCCESS.
2022-12-27 15:18:55.999 [117552]: [121-1] user=[No Connection],db=[No
Connection],app=pcp_main LOG: PCP process with pid: 118582 exits with
status 0
2022-12-27 15:19:01.452 [117351]: [1-1] user=[No Connection],db=[No
Connection],app=child LOG: new connection received
2022-12-27 15:19:01.452 [117351]: [2-1] user=[No Connection],db=[No
Connection],app=child DETAIL: connecting host=::1 port=36606
2022-12-27 15:19:01.505 [117351]: [3-1] user=[No Connection],db=[No
Connection],app=psql LOG: frontend disconnection: session time:
0:00:00.053 user=postgres database=postgres host=::1 port=36606
>
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
More information about the pgpool-general
mailing list