[pgpool-general: 8772] Re: [pgpool 4.0.4 ] PCP + psql commands get stuck after systemd-networkd restart

Tue May 16 15:50:05 JST 2023

Hi,

Sorry for the late reply.

> > Could you try to disable wd_monitoring_interfaces_list and see if this
> issue is improved?
> No. Actually I added these interfaces to try if it actually improves the
> situation :)
> In the absence of this setting, the situation is still the same!
> 
> What I notice is that to make the 'stuck/frozen' commands (watchdog + psql
> show pool_nodes) respond/work again,
> I need to reboot the entire VM.
> Is there a way to make the commands respond without a restart?
> So that the nodes will be in a 'down' state atleast from the watchdog
> perspective (instead of SHUTDOWN)?

Sorry, currently we don't have a workaround to avoid this situation.

Actually, pgpool should be shutdown and pcp command should return an error.
I don't know why the pgpool process remained and it got stuck/frozen, you may have encountered a bug.

> My end goal is to achieve auto-remediation through pgpool 4.1 as your post
> describes here:
> https://b-peng.blogspot.com/2022/02/auto-failback.html#:~:text=To%20use%20this%20automatic%20failback,9.1%20or%20later%20is%20required

No. 
Automatic failback feature is used for re-attaching a healthy PostgreSQL standby node,
not Pgpool-II node.

On Tue, 2 May 2023 10:55:57 +0530
Gopikrishnan <nksgopikrishnan at gmail.com> wrote:

> Hi Bo,
> Thank you for your response!
> 
> > Could you try to disable wd_monitoring_interfaces_list and see if this
> issue is improved?
> No. Actually I added these interfaces to try if it actually improves the
> situation :)
> In the absence of this setting, the situation is still the same!
> 
> What I notice is that to make the 'stuck/frozen' commands (watchdog + psql
> show pool_nodes) respond/work again,
> I need to reboot the entire VM.
> Is there a way to make the commands respond without a restart?
> So that the nodes will be in a 'down' state atleast from the watchdog
> perspective (instead of SHUTDOWN)?
> 
> My end goal is to achieve auto-remediation through pgpool 4.1 as your post
> describes here:
> https://b-peng.blogspot.com/2022/02/auto-failback.html#:~:text=To%20use%20this%20automatic%20failback,9.1%20or%20later%20is%20required
> .
> 
> I am also considering a cronjob or similar to effect the 're-attaching'
> node back into the pool.
> 
> *Thanks*
> *Gopi*
> 
> 
> On Mon, May 1, 2023 at 1:45 PM Bo Peng <pengbo at sraoss.co.jp> wrote:
> 
> > Hello,
> >
> > It is the feature Pgpool will immediately detect the failure and shutdown
> > when network failure occurs.
> > This is a solution to avoid split brain.
> >
> > And wd_interval and wd_heartbeat_deadtime don't affect this behavior.
> >
> > Could you try to disable wd_monitoring_interfaces_list and see if this
> > issue is improved?
> >
> >   wd_monitoring_interfaces_list = ''
> >
> > On Mon, 1 May 2023 10:36:15 +0530
> > Gopikrishnan <nksgopikrishnan at gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Any pointers would be helpful.
> > > Currently, the pgpool is highly unstable due to momentary network
> > > disruptions.
> > >
> > > While it looks like pgpool was already built with that in mind,
> > > It is not working correctly for me due to a bug or misconfiguration.
> > >
> > > *Thanks*
> > > *Gopi*
> > >
> > >
> > > On Wed, Apr 19, 2023 at 2:08 PM Gopikrishnan <nksgopikrishnan at gmail.com>
> > > wrote:
> > >
> > > > Thank you for your response.
> > > >
> > > > Answers to your questions:
> > > > 1. I am using pgpool 4.0.4
> > > > 2. DEBUG was specifically enabled to debug this issue. (PCP commands
> > > > frozen)
> > > > 3. Yes, all the mentioned properties are enabled. (All the pgpool
> > > > configurations are below for the reference)
> > > >
> > > > >If the network goes down, watchdog will detect the network failure and
> > > > shutdown itself.
> > > > >To avoid such problems, it is recommended to shutdown pgpool before
> > > > restarting network.
> > > >
> > > > 1. The network was not down; It got restarted. i.e. it came back up in
> > no
> > > > time (Within seconds)
> > > > In my current understanding, the watchdog settings: wd_interval and
> > wd_life_point
> > > > should have covered/tolerated this network downtime?
> > > > 2. Most of the time in my prod environment, the restart or a glitch in
> > > > network is not in the application control, to pre-emptively stop
> > pgPool.
> > > >
> > > > My follow-up questions:
> > > > -------
> > > > 1. Am I hitting a bug in pgPool?
> > > > 2. Is this scenario (Network glitch) handled better in a newer PgPool
> > > > version? (So that I can upgrade, if possible with minimal changes to
> > the
> > > > confs.)
> > > > -------------------------
> > > > allow_clear_text_frontend_auth = off
> > > > allow_multiple_failover_requests_from_node = off
> > > > allow_sql_comments = off
> > > > app_name_redirect_preference_list = ''
> > > > arping_cmd = 'arping -U $_IP_$ -w 1'
> > > > arping_path = '/sbin'
> > > > authentication_timeout = 60
> > > > backend_data_directory0 = '/db/data'
> > > > backend_data_directory1 = '/db/data'
> > > > backend_data_directory2 = '/db/data'
> > > > backend_flag0 = 'ALLOW_TO_FAILOVER'
> > > > backend_flag1 = 'ALLOW_TO_FAILOVER'
> > > > backend_flag2 = 'ALLOW_TO_FAILOVER'
> > > > backend_hostname0 = '10.108.104.31'
> > > > backend_hostname1 = '10.108.104.32'
> > > > backend_hostname2 = '10.108.104.33'
> > > > backend_port0 = 5432
> > > > backend_port1 = 5432
> > > > backend_port2 = 5432
> > > > backend_weight0 = 1
> > > > backend_weight1 = 1
> > > > backend_weight2 = 1
> > > > black_function_list = 'currval,lastval,nextval,setval'
> > > > black_memqcache_table_list = ''
> > > > black_query_pattern_list = ''
> > > > check_temp_table = on
> > > > check_unlogged_table = on
> > > > child_life_time = 300
> > > > child_max_connections = 0
> > > > clear_memqcache_on_escalation = on
> > > > client_idle_limit = 0
> > > > client_idle_limit_in_recovery = 0
> > > > connect_timeout = 10000
> > > > connection_cache = on
> > > > connection_life_time = 0
> > > > database_redirect_preference_list = ''
> > > > delay_threshold = 10000000
> > > > delegate_IP = ''
> > > > detach_false_primary = off
> > > > disable_load_balance_on_write = 'transaction'
> > > > enable_pool_hba = off
> > > > failback_command = ''
> > > > failover_command = '/usr/local/etc/failover.sh %d %h %p %D %m %H %M %P
> > %r
> > > > %R'
> > > > failover_if_affected_tuples_mismatch = off
> > > > failover_on_backend_error = on
> > > > failover_require_consensus = on
> > > > failover_when_quorum_exists = on
> > > > follow_master_command = '/usr/local/etc/follow_master.sh %d %h %p %D
> > %m %M
> > > > %H %P %r %R'
> > > > health_check_database = ''
> > > > health_check_max_retries = 3
> > > > health_check_password = 'e2f2da4a027a41bf8517406dd9ca970e'
> > > > health_check_period = 5
> > > > health_check_retry_delay = 1
> > > > health_check_timeout = 30
> > > > health_check_user = 'pgpool'
> > > > heartbeat_destination0 = '10.108.104.32'
> > > > heartbeat_destination1 = '10.108.104.33'
> > > > heartbeat_destination_port0 = 9694
> > > > heartbeat_destination_port1 = 9694
> > > > heartbeat_device0 = ''
> > > > heartbeat_device1 = ''
> > > > if_down_cmd = ''
> > > > if_up_cmd = ''
> > > > ifconfig_path = '/sbin'
> > > > ignore_leading_white_space = on
> > > > insert_lock = off
> > > > listen_addresses = '*'
> > > > listen_backlog_multiplier = 2
> > > > load_balance_mode = on
> > > > lobj_lock_table = ''
> > > > log_client_messages = off
> > > > log_connections = off
> > > > log_destination = 'syslog'
> > > > log_hostname = off
> > > > log_line_prefix = '%t: pid %p: '
> > > > log_per_node_statement = off
> > > > log_standby_delay = 'if_over_threshold'
> > > > log_statement = off
> > > > logdir = '/tmp'
> > > > master_slave_mode = on
> > > > master_slave_sub_mode = 'stream'
> > > > max_pool = 4
> > > > memory_cache_enabled = off
> > > > memqcache_auto_cache_invalidation = on
> > > > memqcache_cache_block_size = 1048576
> > > > memqcache_expire = 0
> > > > memqcache_max_num_cache = 1000000
> > > > memqcache_maxcache = 409600
> > > > memqcache_memcached_host = 'localhost'
> > > > memqcache_memcached_port = 11211
> > > > memqcache_method = 'shmem'
> > > > memqcache_oiddir = '/var/log/pgpool/oiddir'
> > > > memqcache_total_size = 67108864
> > > > num_init_children = 32
> > > > other_pgpool_hostname0 = '10.108.104.32'
> > > > other_pgpool_hostname1 = '10.108.104.33'
> > > > other_pgpool_port0 = 9999
> > > > other_pgpool_port1 = 9999
> > > > other_wd_port0 = 9000
> > > > other_wd_port1 = 9000
> > > > pcp_listen_addresses = '*'
> > > > pcp_port = 9898
> > > > pcp_socket_dir = '/tmp'
> > > > pid_file_name = '/var/run/pgpool/pgpool.pid'
> > > > ping_path = '/bin'
> > > > pool_passwd = 'pool_passwd'
> > > > port = 9999
> > > > recovery_1st_stage_command = 'recovery_1st_stage'
> > > > recovery_2nd_stage_command = ''
> > > > recovery_password = 'ZPH3Xnuh8ISKMZjSqLvIBQe_WTOzXbPF'
> > > > recovery_timeout = 90
> > > > recovery_user = 'postgres'
> > > > relcache_expire = 0
> > > > relcache_size = 256
> > > > replicate_select = off
> > > > replication_mode = off
> > > > replication_stop_on_mismatch = off
> > > > reset_query_list = 'ABORT; DISCARD ALL'
> > > > search_primary_node_timeout = 300
> > > > serialize_accept = off
> > > > socket_dir = '/var/run/pgpool/socket'
> > > > sr_check_database = 'postgres'
> > > > sr_check_password = 'e2f2da4a027a41bf8517406dd9ca970e'
> > > > sr_check_period = 10
> > > > sr_check_user = 'pgpool'
> > > > ssl = off
> > > > ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
> > > > ssl_prefer_server_ciphers = off
> > > > syslog_facility = 'LOCAL1'
> > > > syslog_ident = 'pgpool'
> > > > trusted_servers = ''
> > > > use_watchdog = on
> > > > wd_authkey = ''
> > > > wd_de_escalation_command = '/usr/local/etc/desc.sh'
> > > > wd_escalation_command = '/usr/local/etc/esc.sh'
> > > > wd_heartbeat_deadtime = 30
> > > > wd_heartbeat_keepalive = 2
> > > > wd_heartbeat_port = 9694
> > > > wd_hostname = '10.108.104.31'
> > > > wd_interval = 10
> > > > wd_ipc_socket_dir = '/tmp'
> > > > wd_life_point = 3
> > > > wd_lifecheck_dbname = 'template1'
> > > > wd_lifecheck_method = 'heartbeat'
> > > > wd_lifecheck_password = ''
> > > > wd_lifecheck_query = 'SELECT 1'
> > > > wd_lifecheck_user = 'nobody'
> > > > wd_monitoring_interfaces_list = 'any'
> > > > wd_port = 9000
> > > > wd_priority = 1
> > > > white_function_list = ''
> > > > white_memqcache_table_list = ''
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > *Thanks*
> > > > *Gopi*
> > > >
> > > >
> > > > On Wed, Apr 19, 2023 at 1:13 PM Bo Peng <pengbo at sraoss.co.jp> wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> > When I do:
> > > >> > systemctl restart systemd-networkd
> > > >> >
> > > >> > After that, I am not able to execute any PCP commands like:
> > > >> > pcp_watchdog_info
> > > >> > It is frozen.
> > > >>
> > > >> If the network goes down, watchdog will detect the network failure and
> > > >> shutdown itself.
> > > >> To avoid such problems, it is recommended to shutdown pgpool before
> > > >> restarting network.
> > > >>
> > > >> BTW, which version of Pgpool-II are you using?
> > > >>
> > > >> > I tried restarting pgpool and postgres to no avail.
> > > >> > However, rebooting the system gets it back to a workable state. (PCP
> > > >> > commands are running again and I can attach the nodes back to the
> > pool)
> > > >> >
> > > >> > The pgPool logs shows that the pg-pool was shutdown due to the
> > network
> > > >> > event:
> > > >> > -------------------------------
> > > >> >
> > > >> > 2023-04-17T15:27:25.190949+00:00 vmvrlcm-104-32 g[3042]: [268-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  network event received
> > > >> > 2023-04-17T15:27:25.191041+00:00 vmvrlcm-104-32 g[3042]: [268-2]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DETAIL:  deleted = YES Link change event = NO
> > > >> > 2023-04-17T15:27:25.191186+00:00 vmvrlcm-104-32 g[3042]: [269-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  STATE MACHINE INVOKED WITH EVENT =
> > NETWORK
> > > >> IP
> > > >> > IS REMOVED Current State = STANDBY
> > > >> > 2023-04-17T15:27:25.191243+00:00 vmvrlcm-104-32 g[3042]: [270-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  network interface lo having flags 65609
> > > >> > 2023-04-17T15:27:25.191296+00:00 vmvrlcm-104-32 g[3042]: [271-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  network interface eth0 having flags
> > 69699
> > > >> > 2023-04-17T15:27:25.191352+00:00 vmvrlcm-104-32 g[3042]: [272-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  network interface "eth0" link is active
> > > >> > 2023-04-17T15:27:25.191401+00:00 vmvrlcm-104-32 g[3042]: [273-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  network interface "eth0" link is up
> > > >> > 2023-04-17T15:27:25.191449+00:00 vmvrlcm-104-32 g[3042]: [274-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  network interface lo having flags 65609
> > > >> > 2023-04-17T15:27:25.191497+00:00 vmvrlcm-104-32 g[3042]: [275-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  network interface "eth0" is up and we
> > can
> > > >> > continue
> > > >> > 2023-04-17T15:27:25.191551+00:00 vmvrlcm-104-32 g[3042]: [276-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: WARNING:  network IP is removed and system has
> > no
> > > >> IP is
> > > >> > assigned
> > > >> > 2023-04-17T15:27:25.191614+00:00 vmvrlcm-104-32 g[3042]: [276-2]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DETAIL:  changing the state to in network
> > trouble
> > > >> > 2023-04-17T15:27:25.191667+00:00 vmvrlcm-104-32 g[3042]: [277-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: LOG:  watchdog node state changed from
> > [STANDBY] to
> > > >> [IN
> > > >> > NETWORK TROUBLE]
> > > >> > 2023-04-17T15:27:25.191713+00:00 vmvrlcm-104-32 g[3042]: [278-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  STATE MACHINE INVOKED WITH EVENT = STATE
> > > >> > CHANGED Current State = IN NETWORK TROUBLE
> > > >> > 2023-04-17T15:27:25.191759+00:00 vmvrlcm-104-32 g[3042]: [279-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: FATAL:  system has lost the network
> > > >> > 2023-04-17T15:27:25.191807+00:00 vmvrlcm-104-32 g[3042]: [280-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: LOG:  Watchdog is shutting down
> > > >> > 2023-04-17T15:27:25.191849+00:00 vmvrlcm-104-32 g[3042]: [281-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  sending packet, watchdog node:[
> > > >> > vmvrlcm-104-31.eng.vmware.com:9999 Linux
> > vmvrlcm-104-31.eng.vmware.com]
> > > >> > command id:[10] type:[INFORM I AM GOING DOWN] state:[IN NETWORK
> > TROUBLE]
> > > >> > 2023-04-17T15:27:25.191894+00:00 vmvrlcm-104-32 g[3042]: [282-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  sending watchdog packet to socket:8,
> > > >> type:[X],
> > > >> > command ID:10, data Length:0
> > > >> > 2023-04-17T15:27:25.191952+00:00 vmvrlcm-104-32 g[3042]: [283-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  sending packet, watchdog node:[
> > > >> > vmvrlcm-104-33.eng.vmware.com:9999 Linux
> > vmvrlcm-104-33.eng.vmware.com]
> > > >> > command id:[10] type:[INFORM I AM GOING DOWN] state:[IN NETWORK
> > TROUBLE]
> > > >> > 2023-04-17T15:27:25.192001+00:00 vmvrlcm-104-32 g[3042]: [284-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3042: DEBUG:  sending watchdog packet to socket:9,
> > > >> type:[X],
> > > >> > command ID:10, data Length:0
> > > >> > 2023-04-17T15:27:25.192671+00:00 vmvrlcm-104-32 pgpool[3040]: [24-1]
> > > >> > 2023-04-17 15:27:25: pid 3040: DEBUG:  reaper handler
> > > >> > 2023-04-17T15:27:25.192753+00:00 vmvrlcm-104-32 pgpool[3040]: [25-1]
> > > >> > 2023-04-17 15:27:25: pid 3040: DEBUG:  watchdog child process with
> > pid:
> > > >> > 3042 exit with FATAL ERROR. pgpool-II will be shutdown
> > > >> > 2023-04-17T15:27:25.192803+00:00 vmvrlcm-104-32 pgpool[3040]: [26-1]
> > > >> > 2023-04-17 15:27:25: pid 3040: LOG:  watchdog child process with
> > pid:
> > > >> 3042
> > > >> > exits with status 768
> > > >> > 2023-04-17T15:27:25.192864+00:00 vmvrlcm-104-32 pgpool[3040]: [27-1]
> > > >> > 2023-04-17 15:27:25: pid 3040: FATAL:  watchdog child process exit
> > with
> > > >> > fatal error. exiting pgpool-II
> > > >> > 2023-04-17T15:27:25.197530+00:00 vmvrlcm-104-32 ck[3157]: [23-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3157: DEBUG:  lifecheck child receives shutdown
> > request
> > > >> > signal 2, forwarding to all children
> > > >> > 2023-04-17T15:27:25.197611+00:00 vmvrlcm-104-32 ck[3157]: [24-1]
> > > >> 2023-04-17
> > > >> > 15:27:25: pid 3157: DEBUG:  lifecheck child receives fast shutdown
> > > >> request
> > > >> > 2023-04-17T15:27:25.197658+00:00 vmvrlcm-104-32 at sender[3159]:
> > [148-1]
> > > >> > 2023-04-17 15:27:25: pid 3159: DEBUG:  watchdog heartbeat sender
> > child
> > > >> > receives shutdown request signal 2
> > > >> > 2023-04-17T15:27:25.197994+00:00 vmvrlcm-104-32 at sender[3163]:
> > [148-1]
> > > >> > 2023-04-17 15:27:25: pid 3163: DEBUG:  watchdog heartbeat sender
> > child
> > > >> > receives shutdown request signal 2
> > > >> > 2023-04-17T15:27:25.199168+00:00 vmvrlcm-104-32 at receiver[3161]:
> > > >> [18-1]
> > > >> > 2023-04-17 15:27:25: pid 3161: DEBUG:  watchdog heartbeat receiver
> > child
> > > >> > receives shutdown request signal 2
> > > >> > 2023-04-17T15:27:25.199567+00:00 vmvrlcm-104-32 at receiver[3158]:
> > > >> [18-1]
> > > >> > 2023-04-17 15:27:25: pid 3158: DEBUG:  watchdog heartbeat receiver
> > child
> > > >> > receives shutdown request signal 2
> > > >> > 2023-04-17T15:27:25.448554+00:00 vmvrlcm-104-32 check
> > process(2)[3197]:
> > > >> > [386-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  health check:
> > clearing
> > > >> alarm
> > > >> > 2023-04-17T15:27:25.448689+00:00 vmvrlcm-104-32 check
> > process(2)[3197]:
> > > >> > [387-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  SSL is requested but
> > SSL
> > > >> > support is not available
> > > >> > 2023-04-17T15:27:25.450621+00:00 vmvrlcm-104-32 check
> > process(2)[3197]:
> > > >> > [388-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  authenticate kind = 5
> > > >> > 2023-04-17T15:27:25.451892+00:00 vmvrlcm-104-32 check
> > process(2)[3197]:
> > > >> > [389-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  authenticate
> > backend: key
> > > >> > data received
> > > >> > 2023-04-17T15:27:25.451987+00:00 vmvrlcm-104-32 check
> > process(2)[3197]:
> > > >> > [390-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  authenticate backend:
> > > >> > transaction state: I
> > > >> > 2023-04-17T15:27:25.452043+00:00 vmvrlcm-104-32 check
> > process(2)[3197]:
> > > >> > [391-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  health check:
> > clearing
> > > >> alarm
> > > >> > 2023-04-17T15:27:25.452096+00:00 vmvrlcm-104-32 check
> > process(2)[3197]:
> > > >> > [392-1] 2023-04-17 15:27:25: pid 3197: DEBUG:  health check:
> > clearing
> > > >> alarm
> > > >> > 2023-04-17T15:27:25.455020+00:00 vmvrlcm-104-32 check
> > process(0)[3196]:
> > > >> > [386-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  health check:
> > clearing
> > > >> alarm
> > > >> > 2023-04-17T15:27:25.455096+00:00 vmvrlcm-104-32 check
> > process(0)[3196]:
> > > >> > [387-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  SSL is requested but
> > SSL
> > > >> > support is not available
> > > >> > 2023-04-17T15:27:25.457196+00:00 vmvrlcm-104-32 check
> > process(0)[3196]:
> > > >> > [388-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  authenticate kind = 5
> > > >> > 2023-04-17T15:27:25.458437+00:00 vmvrlcm-104-32 check
> > process(0)[3196]:
> > > >> > [389-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  authenticate
> > backend: key
> > > >> > data received
> > > >> > 2023-04-17T15:27:25.458556+00:00 vmvrlcm-104-32 check
> > process(0)[3196]:
> > > >> > [390-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  authenticate backend:
> > > >> > transaction state: I
> > > >> > 2023-04-17T15:27:25.458674+00:00 vmvrlcm-104-32 check
> > process(0)[3196]:
> > > >> > [391-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  health check:
> > clearing
> > > >> alarm
> > > >> > 2023-04-17T15:27:25.458742+00:00 vmvrlcm-104-32 check
> > process(0)[3196]:
> > > >> > [392-1] 2023-04-17 15:27:25: pid 3196: DEBUG:  health check:
> > clearing
> > > >> alarm
> > > >> > 2023-04-17T15:27:30.452427+00:00 vmvrlcm-104-32 check
> > process(2)[3197]:
> > > >> > [393-1] 2023-04-17 15:27:30: pid 3197: DEBUG:  health check:
> > clearing
> > > >> alarm
> > > >> > 2023-04-17T15:27:30.454041+00:00 vmvrlcm-104-32 check
> > process(2)[3197]:
> > > >> > [394-1] 2023-04-17 15:27:30: pid 3197: DEBUG:  SSL is requested but
> > SSL
> > > >> > support is not available
> > > >> >
> > > >> > ------------------
> > > >> >
> > > >> > After this, it is in kind of a loop of the 'clearing alarm' + 'SSL
> > > >> support
> > > >> > is not available'
> > > >> >
> > > >> > The relevant (In my current understanding) watchdog settings are:
> > > >> >
> > > >>
> > ----------------------------------------------------------------------------------------
> > > >> > wd_hostname = '10.108.104.31'
> > > >> > wd_lifecheck_method = 'heartbeat'
> > > >> > wd_interval = 10
> > > >> > wd_heartbeat_keepalive = 2
> > > >> > wd_heartbeat_deadtime = 30
> > > >> > heartbeat_destination0 = '10.108.104.32'
> > > >> > heartbeat_device0 = ''
> > > >> > heartbeat_destination1 = '10.108.104.33'
> > > >> > heartbeat_device1 = ''
> > > >> > wd_monitoring_interfaces_list = 'any'
> > > >>
> > > >> Above logs are DEBUG messages and I don't think they caused this
> > issue.
> > > >> Do these DEBUG messages only appear when you restart the network?
> > > >>
> > > >> If you are using watchdog, you also need to configure the following
> > > >> parameters:
> > > >>
> > > >> heartbeat_destination_port0
> > > >> heartbeat_destination_port1
> > > >> other_pgpool_hostname0
> > > >> other_pgpool_port0
> > > >> other_pgpool_hostname1
> > > >> other_pgpool_port1
> > > >>
> > > >> Regards,
> > > >>
> > > >> --
> > > >> Bo Peng <pengbo at sraoss.co.jp>
> > > >> SRA OSS LLC
> > > >> https://www.sraoss.co.jp/
> > > >>
> > > >
> >
> >
> > --
> > Bo Peng <pengbo at sraoss.co.jp>
> > SRA OSS LLC
> > https://www.sraoss.co.jp/
> >

-- 
Bo Peng <pengbo at sraoss.co.jp>
SRA OSS LLC
https://www.sraoss.co.jp/