[pgpool-general: 7375] Re: Watchdog New Primary & Standby shutdown when Node 0 Fails
Tatsuo Ishii
ishii at sraoss.co.jp
Sat Dec 19 09:48:34 JST 2020
Ok, let me clarify. You think that 4.2.0's watchdog code has a problem here:
https://github.com/pgpool/pgpool2/blob/V4_2_0_RPM/src/watchdog/watchdog.c#L2244
and you think it causes your issue (node 0 dead).
So you wonder if commit 70e0b2b93715094823102c9b1879e83fa75c7913
would solve the issue.
I am not sure, because according to the commit message it was intended
to fix a problem with the wd_cli command (which is new in 4.2). It would
probably be better to ask the commit author (Muhammad Usama); I have
added his email address in the Cc: field.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
> Hi Tatsuo,
>
> I am using the RPM version, which I think (according to GitHub) still has the node 0 issue, for example:
>
> https://github.com/pgpool/pgpool2/blob/V4_2_0_RPM/src/watchdog/watchdog.c#L2244
>
> Do you think this could cause the issue?
>
> If I configure with node 0 always being dead:
>
> hostname0 = '192.168.40.71'
> wd_port0 = 9000
> pgpool_port0 = 9999
>
> hostname1 = '192.168.40.66'
> # Host name or IP address of pgpool node
> # for watchdog connection
> # (change requires restart)
> wd_port1 = 9000
> # Port number for watchdog service
> # (change requires restart)
> pgpool_port1 = 9999
> # Port number for pgpool
> # (change requires restart)
>
>
> hostname2 = '192.168.40.67'
> wd_port2 = 9000
> pgpool_port2 = 9999
>
> hostname3 = '192.168.40.64'
> wd_port3 = 9000
> pgpool_port3 = 9999
>
> I need to set: enable_consensus_with_half_votes = on
>
> since this is effectively 4 nodes with 1 always dead.
>
> With that set, it works okay and as expected.
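>
> Rough arithmetic behind why that flag is needed (my own sketch of the
> majority rule as described in the enable_consensus_with_half_votes
> comment in my config below, not pgpool code; votes_needed is just an
> illustrative name):
>
> def votes_needed(total_nodes, half_votes_ok):
>     # 'on' lets an even-sized cluster settle quorum/consensus with
>     # exactly half of the votes; otherwise a strict majority is needed.
>     if half_votes_ok and total_nodes % 2 == 0:
>         return total_nodes // 2
>     return total_nodes // 2 + 1
>
> # 4 configured watchdog nodes, node 0 permanently dead.
> for live in (3, 2):   # 3 = all real nodes up, 2 = one real node also down
>     for flag in (False, True):
>         need = votes_needed(4, flag)
>         print("half_votes=%s live=%d: need %d of 4 -> %s"
>               % ("on" if flag else "off", live, need,
>                  "quorum" if live >= need else "NO quorum"))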
>
> Is there an RPM testing build I could try that includes the commit from 4 days ago?
>
>
> Joe Madden
> Senior Systems Engineer
> D 01412224666
> joe.madden at mottmac.com
>
>
> -----Original Message-----
> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> Sent: 18 December 2020 13:19
> To: Joe Madden <Joe.Madden at mottmac.com>
> Cc: pgpool-general at pgpool.net
> Subject: Re: [pgpool-general: 7372] Re: Watchdog New Primary & Standby shutdown when Node 0 Fails
>
>> Does anyone know if this commit would cause the issue:
>>
>> https://github.com/pgpool/pgpool2/commit/70e0b2b93715094823102c9b1879e83fa75c7913
>
> Assuming you are using 4.2.0 (judging from the log file), this commit surely does
> not affect your issue, because it was committed after 4.2.0 was released.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>> Joe Madden
>> Senior Systems Engineer
>> D 01412224666
>> joe.madden at mottmac.com
>>
>> From: Joe Madden
>> Sent: 18 December 2020 09:45
>> To: pgpool-general at pgpool.net
>> Subject: RE: Watchdog New Primary & Standby shutdown when Node 0 Fails
>>
>> Hi All,
>>
>> I moved node 0 and node 2 around (switched the node ids and updated the pgpool config) and I found the same issue on node 2 (now node 0).
>>
>> It's got something to do with the node id and the relevant configs; I still don't know if it's a bug or not.
>>
>> Joe.
>>
>> Joe Madden
>> Senior Systems Engineer
>> D 01412224666
>> joe.madden at mottmac.com
>>
>> From: Joe Madden
>> Sent: 17 December 2020 18:57
>> To: pgpool-general at pgpool.net
>> Subject: Watchdog New Primary & Standby shutdown when Node 0 Fails
>>
>> Hi List,
>>
>> I've got a PGpool instance with three nodes:
>>
>> |Pg Pool Node 0 (192.168.40.66)| Pg Pool Node 1 (192.168.40.67)| Pg Pool Node 2 (192.168.40.64)|
>>
>> These communicate over a switched back end with:
>> |PostgreSQL 12 Primary | PostgreSQL 12 Secondary|
>>
>>
>> This works fine: standby nodes 1 & 2 can be shut down, restarted etc. without an issue. When node 0 is shut down, however, one of the child processes fails and causes nodes 1 and 2 to shut down about 60 seconds after failover.
>>
>> I feel like this could be a bug. Our configurations on all three nodes are identical, bar the weight parameter (which differs) and the node id of course.
>>
>> Config:
>>
>> # ----------------------------
>> # pgPool-II configuration file
>> # ----------------------------
>> #
>> # This file consists of lines of the form:
>> #
>> # name = value
>> #
>> # Whitespace may be used. Comments are introduced with "#" anywhere on a line.
>> # The complete list of parameter names and allowed values can be found in the
>> # pgPool-II documentation.
>> #
>> # This file is read on server startup and when the server receives a SIGHUP
>> # signal. If you edit the file on a running system, you have to SIGHUP the
>> # server for the changes to take effect, or use "pgpool reload". Some
>> # parameters, which are marked below, require a server shutdown and restart to
>> # take effect.
>> #
>>
>> #------------------------------------------------------------------------------
>> # BACKEND CLUSTERING MODE
>> # Choose one of: 'streaming_replication', 'native_replication',
>> # 'logical_replication', 'slony', 'raw' or 'snapshot_isolation'
>> # (change requires restart)
>> #------------------------------------------------------------------------------
>>
>> backend_clustering_mode = 'streaming_replication'
>>
>> #------------------------------------------------------------------------------
>> # CONNECTIONS
>> #------------------------------------------------------------------------------
>>
>> # - pgpool Connection Settings -
>>
>> listen_addresses = '*'
>> # Host name or IP address to listen on:
>> # '*' for all, '' for no TCP/IP connections
>> # (change requires restart)
>> port = 9999
>> # Port number
>> # (change requires restart)
>> socket_dir = '/tmp'
>> # Unix domain socket path
>> # The Debian package defaults to
>> # /var/run/postgresql
>> # (change requires restart)
>> reserved_connections = 0
>> # Number of reserved connections.
>> # Pgpool-II does not accept connections if over
>> # num_init_children - reserved_connections.
>>
>>
>> # - pgpool Communication Manager Connection Settings -
>>
>> pcp_listen_addresses = '*'
>> # Host name or IP address for pcp process to listen on:
>> # '*' for all, '' for no TCP/IP connections
>> # (change requires restart)
>> pcp_port = 9898
>> # Port number for pcp
>> # (change requires restart)
>> pcp_socket_dir = '/tmp'
>> # Unix domain socket path for pcp
>> # The Debian package defaults to
>> # /var/run/postgresql
>> # (change requires restart)
>> listen_backlog_multiplier = 2
>> # Set the backlog parameter of listen(2) to
>> # num_init_children * listen_backlog_multiplier.
>> # (change requires restart)
>> serialize_accept = off
>> # whether to serialize accept() call to avoid thundering herd problem
>> # (change requires restart)
>>
>> # - Backend Connection Settings -
>>
>> backend_hostname0 = '192.168.40.61'
>> # Host name or IP address to connect to for backend 0
>> backend_port0 = 5432
>> # Port number for backend 0
>> backend_weight0 = 1
>> # Weight for backend 0 (only in load balancing mode)
>> backend_data_directory0 = '/var/lib/pgsql/12/data/'
>> # Data directory for backend 0
>> backend_flag0 = 'ALLOW_TO_FAILOVER'
>> # Controls various backend behavior
>> # ALLOW_TO_FAILOVER, DISALLOW_TO_FAILOVER
>> # or ALWAYS_PRIMARY
>> backend_application_name0 = '192.168.40.61'
>> # walsender's application_name, used for "show pool_nodes" command
>>
>> # - Backend Connection Settings -
>>
>> backend_hostname1 = '192.168.40.60'
>> # Host name or IP address to connect to for backend 1
>> backend_port1 = 5432
>> # Port number for backend 1
>> backend_weight1 = 1
>> # Weight for backend 1 (only in load balancing mode)
>> backend_data_directory1 = '/var/lib/pgsql/12/data/'
>> # Data directory for backend 1
>> backend_flag1 = 'ALLOW_TO_FAILOVER'
>> # Controls various backend behavior
>> # ALLOW_TO_FAILOVER, DISALLOW_TO_FAILOVER
>> # or ALWAYS_PRIMARY
>> backend_application_name1 = '192.168.40.60'
>> # walsender's application_name, used for "show pool_nodes" command
>>
>>
>> # - Authentication -
>>
>> enable_pool_hba = on
>> # Use pool_hba.conf for client authentication
>> pool_passwd = 'pool_passwd'
>> # File name of pool_passwd for md5 authentication.
>> # "" disables pool_passwd.
>> # (change requires restart)
>> authentication_timeout = 1min
>> # Delay in seconds to complete client authentication
>> # 0 means no timeout.
>>
>> allow_clear_text_frontend_auth = off
>> # Allow Pgpool-II to use clear text password authentication
>> # with clients, when pool_passwd does not
>> # contain the user password
>>
>> # - SSL Connections -
>>
>> ssl = off
>> # Enable SSL support
>> # (change requires restart)
>> #ssl_key = 'server.key'
>> # SSL private key file
>> # (change requires restart)
>> #ssl_cert = 'server.crt'
>> # SSL public certificate file
>> # (change requires restart)
>> #ssl_ca_cert = ''
>> # Single PEM format file containing
>> # CA root certificate(s)
>> # (change requires restart)
>> #ssl_ca_cert_dir = ''
>> # Directory containing CA root certificate(s)
>> # (change requires restart)
>> #ssl_crl_file = ''
>> # SSL certificate revocation list file
>> # (change requires restart)
>>
>> ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
>> # Allowed SSL ciphers
>> # (change requires restart)
>> ssl_prefer_server_ciphers = off
>> # Use server's SSL cipher preferences,
>> # rather than the client's
>> # (change requires restart)
>> ssl_ecdh_curve = 'prime256v1'
>> # Name of the curve to use in ECDH key exchange
>> ssl_dh_params_file = ''
>> # Name of the file containing Diffie-Hellman parameters used
>> # for so-called ephemeral DH family of SSL cipher.
>> #ssl_passphrase_command=''
>> # Sets an external command to be invoked when a passphrase
>> # for decrypting an SSL file needs to be obtained
>> # (change requires restart)
>>
>> #------------------------------------------------------------------------------
>> # POOLS
>> #------------------------------------------------------------------------------
>>
>> # - Concurrent session and pool size -
>>
>> num_init_children = 32
>> # Number of concurrent sessions allowed
>> # (change requires restart)
>> max_pool = 4
>> # Number of connection pool caches per connection
>> # (change requires restart)
>>
>> # - Life time -
>>
>> child_life_time = 5min
>> # Pool exits after being idle for this many seconds
>> child_max_connections = 0
>> # Pool exits after receiving that many connections
>> # 0 means no exit
>> connection_life_time = 0
>> # Connection to backend closes after being idle for this many seconds
>> # 0 means no close
>> client_idle_limit = 0
>> # Client is disconnected after being idle for that many seconds
>> # (even inside an explicit transactions!)
>> # 0 means no disconnection
>>
>>
>> #------------------------------------------------------------------------------
>> # LOGS
>> #------------------------------------------------------------------------------
>>
>> # - Where to log -
>>
>> log_destination = 'stderr'
>> # Where to log
>> # Valid values are combinations of stderr,
>> # and syslog. Default to stderr.
>>
>> # - What to log -
>>
>> log_line_prefix = '%t: pid %p: ' # printf-style string to output at beginning of each log line.
>>
>> log_connections = off
>> # Log connections
>> log_disconnections = off
>> # Log disconnections
>> log_hostname = off
>> # Hostname will be shown in ps status
>> # and in logs if connections are logged
>> log_statement = off
>> # Log all statements
>> log_per_node_statement = off
>> # Log all statements
>> # with node and backend informations
>> log_client_messages = off
>> # Log any client messages
>> log_standby_delay = 'if_over_threshold'
>> # Log standby delay
>> # Valid values are combinations of always,
>> # if_over_threshold, none
>>
>> # - Syslog specific -
>>
>> syslog_facility = 'LOCAL0'
>> # Syslog local facility. Default to LOCAL0
>> syslog_ident = 'pgpool'
>> # Syslog program identification string
>> # Default to 'pgpool'
>>
>> # - Debug -
>>
>> #log_error_verbosity = default # terse, default, or verbose messages
>>
>> #client_min_messages = notice # values in order of decreasing detail:
>> # debug5
>> # debug4
>> # debug3
>> # debug2
>> # debug1
>> # log
>> # notice
>> # warning
>> # error
>>
>> log_min_messages = debug5 # values in order of decreasing detail:
>> # debug5
>> # debug4
>> # debug3
>> # debug2
>> # debug1
>> # info
>> # notice
>> # warning
>> # error
>> # log
>> # fatal
>> # panic
>>
>> # This is used when logging to stderr:
>> #logging_collector = off # Enable capturing of stderr
>> # into log files.
>> # (change requires restart)
>>
>> # -- Only used if logging_collector is on ---
>>
>> #log_directory = '/tmp/pgpool_log' # directory where log files are written,
>> # can be absolute
>> #log_filename = 'pgpool-%Y-%m-%d_%H%M%S.log'
>> # log file name pattern,
>> # can include strftime() escapes
>>
>> #log_file_mode = 0600 # creation mode for log files,
>> # begin with 0 to use octal notation
>>
>> #log_truncate_on_rotation = off # If on, an existing log file with the
>> # same name as the new log file will be
>> # truncated rather than appended to.
>> # But such truncation only occurs on
>> # time-driven rotation, not on restarts
>> # or size-driven rotation. Default is
>> # off, meaning append to existing files
>> # in all cases.
>>
>> #log_rotation_age = 1d # Automatic rotation of logfiles will
>> # happen after that (minutes) time.
>> # 0 disables time based rotation.
>> #log_rotation_size = 10MB # Automatic rotation of logfiles will
>> # happen after that much (KB) log output.
>> # 0 disables size based rotation.
>> #------------------------------------------------------------------------------
>> # FILE LOCATIONS
>> #------------------------------------------------------------------------------
>>
>> pid_file_name = '/var/run/pgpool/pgpool.pid'
>> # PID file name
>> # Can be specified as relative to the
>> # location of pgpool.conf file or
>> # as an absolute path
>> # (change requires restart)
>> logdir = '/tmp'
>> # Directory of pgPool status file
>> # (change requires restart)
>>
>>
>> #------------------------------------------------------------------------------
>> # CONNECTION POOLING
>> #------------------------------------------------------------------------------
>>
>> connection_cache = on
>> # Activate connection pools
>> # (change requires restart)
>>
>> # Semicolon separated list of queries
>> # to be issued at the end of a session
>> # The default is for 8.3 and later
>> reset_query_list = 'ABORT; DISCARD ALL'
>> # The following one is for 8.2 and before
>> #reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
>>
>>
>> #------------------------------------------------------------------------------
>> # REPLICATION MODE
>> #------------------------------------------------------------------------------
>>
>> replicate_select = off
>> # Replicate SELECT statements
>> # when in replication mode
>> # replicate_select is higher priority than
>> # load_balance_mode.
>>
>> insert_lock = off
>> # Automatically locks a dummy row or a table
>> # with INSERT statements to keep SERIAL data
>> # consistency
>> # Without SERIAL, no lock will be issued
>> lobj_lock_table = ''
>> # When rewriting lo_creat command in
>> # replication mode, specify table name to
>> # lock
>>
>> # - Degenerate handling -
>>
>> replication_stop_on_mismatch = off
>> # On disagreement with the packet kind
>> # sent from backend, degenerate the node
>> # which is most likely "minority"
>> # If off, just force to exit this session
>>
>> failover_if_affected_tuples_mismatch = off
>> # On disagreement with the number of affected
>> # tuples in UPDATE/DELETE queries, then
>> # degenerate the node which is most likely
>> # "minority".
>> # If off, just abort the transaction to
>> # keep the consistency
>>
>>
>> #------------------------------------------------------------------------------
>> # LOAD BALANCING MODE
>> #------------------------------------------------------------------------------
>>
>> load_balance_mode = on
>> # Activate load balancing mode
>> # (change requires restart)
>> ignore_leading_white_space = on
>> # Ignore leading white spaces of each query
>> read_only_function_list = ''
>> # Comma separated list of function names
>> # that don't write to database
>> # Regexp are accepted
>> write_function_list = ''
>> # Comma separated list of function names
>> # that write to database
>> # Regexp are accepted
>> # If both read_only_function_list and write_function_list
>> # is empty, function's volatile property is checked.
>> # If it's volatile, the function is regarded as a
>> # writing function.
>>
>> primary_routing_query_pattern_list = ''
>> # Semicolon separated list of query patterns
>> # that should be sent to primary node
>> # Regexp are accepted
>> # valid for streaming replication mode only.
>>
>> database_redirect_preference_list = ''
>> # comma separated list of pairs of database and node id.
>> # example: postgres:primary,mydb[0-4]:1,mydb[5-9]:2'
>> # valid for streaming replication mode only.
>>
>> app_name_redirect_preference_list = ''
>> # comma separated list of pairs of app name and node id.
>> # example: 'psql:primary,myapp[0-4]:1,myapp[5-9]:standby'
>> # valid for streaming replication mode only.
>> allow_sql_comments = off
>> # if on, ignore SQL comments when judging if load balance or
>> # query cache is possible.
>> # If off, SQL comments effectively prevent the judgment
>> # (pre 3.4 behavior).
>>
>> disable_load_balance_on_write = 'transaction'
>> # Load balance behavior when write query is issued
>> # in an explicit transaction.
>> #
>> # Valid values:
>> #
>> # 'transaction' (default):
>> # if a write query is issued, subsequent
>> # read queries will not be load balanced
>> # until the transaction ends.
>> #
>> # 'trans_transaction':
>> # if a write query is issued, subsequent
>> # read queries in an explicit transaction
>> # will not be load balanced until the session ends.
>> #
>> # 'dml_adaptive':
>> # Queries on the tables that have already been
>> # modified within the current explicit transaction will
>> # not be load balanced until the end of the transaction.
>> #
>> # 'always':
>> # if a write query is issued, read queries will
>> # not be load balanced until the session ends.
>> #
>> # Note that any query not in an explicit transaction
>> # is not affected by the parameter.
>>
>> dml_adaptive_object_relationship_list= ''
>> # comma separated list of object pairs
>> # [object]:[dependent-object], to disable load balancing
>> # of dependent objects within the explicit transaction
>> # after WRITE statement is issued on (depending-on) object.
>> #
>> # example: 'tb_t1:tb_t2,insert_tb_f_func():tb_f,tb_v:my_view'
>> # Note: function name in this list must also be present in
>> # the write_function_list
>> # only valid for disable_load_balance_on_write = 'dml_adaptive'.
>>
>> statement_level_load_balance = off
>> # Enables statement level load balancing
>>
>> #------------------------------------------------------------------------------
>> # NATIVE REPLICATION MODE
>> #------------------------------------------------------------------------------
>>
>> # - Streaming -
>>
>> sr_check_period = 10
>> # Streaming replication check period
>> # Disabled (0) by default
>> sr_check_user = 'repmgr'
>> # Streaming replication check user
>> # This is necessary even if you disable streaming
>> # replication delay check by sr_check_period = 0
>> sr_check_password = '###################'
>> # Password for streaming replication check user
>> # Leaving it empty will make Pgpool-II first look for the
>> # password in the pool_passwd file before using the empty password
>>
>> sr_check_database = 'repmgr'
>> # Database name for streaming replication check
>> delay_threshold = 10000000
>> # Threshold before not dispatching query to standby node
>> # Unit is in bytes
>> # Disabled (0) by default
>>
>> # - Special commands -
>>
>> follow_primary_command = ''
>> # Executes this command after main node failover
>> # Special values:
>> # %d = failed node id
>> # %h = failed node host name
>> # %p = failed node port number
>> # %D = failed node database cluster path
>> # %m = new main node id
>> # %H = new main node hostname
>> # %M = old main node id
>> # %P = old primary node id
>> # %r = new main port number
>> # %R = new main database cluster path
>> # %N = old primary node hostname
>> # %S = old primary node port number
>> # %% = '%' character
>>
>> #------------------------------------------------------------------------------
>> # HEALTH CHECK GLOBAL PARAMETERS
>> #------------------------------------------------------------------------------
>>
>> health_check_period = 5
>> # Health check period
>> # Disabled (0) by default
>> health_check_timeout = 20
>> # Health check timeout
>> # 0 means no timeout
>> health_check_user = 'pgpool'
>> # Health check user
>> health_check_password = '#############################'
>> # Password for health check user
>> # Leaving it empty will make Pgpool-II first look for the
>> # password in the pool_passwd file before using the empty password
>>
>> health_check_database = 'postgres'
>> # Database name for health check. If '', tries 'postgres' first,
>> health_check_max_retries = 3
>> # Maximum number of times to retry a failed health check before giving up.
>> health_check_retry_delay = 1
>> # Amount of time to wait (in seconds) between retries.
>> connect_timeout = 10000
>> # Timeout value in milliseconds before giving up to connect to backend.
>> # Default is 10000 ms (10 second). Flaky network user may want to increase
>> # the value. 0 means no timeout.
>> # Note that this value is not only used for health check,
>> # but also for ordinary connection to backend.
>>
>> #------------------------------------------------------------------------------
>> # HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)
>> #------------------------------------------------------------------------------
>> #health_check_period0 = 0
>> #health_check_timeout0 = 20
>> #health_check_user0 = 'nobody'
>> #health_check_password0 = ''
>> #health_check_database0 = ''
>> #health_check_max_retries0 = 0
>> #health_check_retry_delay0 = 1
>> #connect_timeout0 = 10000
>>
>> #------------------------------------------------------------------------------
>> # FAILOVER AND FAILBACK
>> #------------------------------------------------------------------------------
>>
>> #failover_command = '/opt/pgpool/scripts/failover.sh %d %h %p %D %m %H %M %P %r %R'
>> failover_command = '/etc/pgpool-II/failover.sh %d %H %h %p %D %m %M %P %r %R %N %S'
>> # Executes this command at failover
>> # Special values:
>> # %d = failed node id
>> # %h = failed node host name
>> # %p = failed node port number
>> # %D = failed node database cluster path
>> # %m = new main node id
>> # %H = new main node hostname
>> # %M = old main node id
>> # %P = old primary node id
>> # %r = new main port number
>> # %R = new main database cluster path
>> # %N = old primary node hostname
>> # %S = old primary node port number
>> # %% = '%' character
>> failback_command = ''
>> # Executes this command at failback.
>> # Special values:
>> # %d = failed node id
>> # %h = failed node host name
>> # %p = failed node port number
>> # %D = failed node database cluster path
>> # %m = new main node id
>> # %H = new main node hostname
>> # %M = old main node id
>> # %P = old primary node id
>> # %r = new main port number
>> # %R = new main database cluster path
>> # %N = old primary node hostname
>> # %S = old primary node port number
>> # %% = '%' character
>>
>> failover_on_backend_error = on
>> # Initiates failover when reading/writing to the
>> # backend communication socket fails
>> # If set to off, pgpool will report an
>> # error and disconnect the session.
>>
>> detach_false_primary = on
>> # Detach false primary if on. Only
>> # valid in streaming replication
>> # mode and with PostgreSQL 9.6 or
>> # after.
>>
>> search_primary_node_timeout = 5min
>> # Timeout in seconds to search for the
>> # primary node when a failover occurs.
>> # 0 means no timeout, keep searching
>> # for a primary node forever.
>>
>> #------------------------------------------------------------------------------
>> # ONLINE RECOVERY
>> #------------------------------------------------------------------------------
>>
>> recovery_user = 'nobody'
>> # Online recovery user
>> recovery_password = ''
>> # Online recovery password
>> # Leaving it empty will make Pgpool-II first look for the
>> # password in the pool_passwd file before using the empty password
>>
>> recovery_1st_stage_command = ''
>> # Executes a command in first stage
>> recovery_2nd_stage_command = ''
>> # Executes a command in second stage
>> recovery_timeout = 90
>> # Timeout in seconds to wait for the
>> # recovering node's postmaster to start up
>> # 0 means no wait
>> client_idle_limit_in_recovery = 0
>> # Client is disconnected after being idle
>> # for that many seconds in the second stage
>> # of online recovery
>> # 0 means no disconnection
>> # -1 means immediate disconnection
>>
>> auto_failback = on
>> # Detached backend node reattaches automatically
>> # if replication_state is 'streaming'.
>> auto_failback_interval = 1min
>> # Min interval of executing auto_failback in
>> # seconds.
>>
>> #------------------------------------------------------------------------------
>> # WATCHDOG
>> #------------------------------------------------------------------------------
>>
>> # - Enabling -
>>
>> use_watchdog = on
>> # Activates watchdog
>> # (change requires restart)
>>
>> # -Connection to up stream servers -
>>
>> trusted_servers = ''
>> # trusted server list which are used
>> # to confirm network connection
>> # (hostA,hostB,hostC,...)
>> # (change requires restart)
>> ping_path = '/bin'
>> # ping command path
>> # (change requires restart)
>>
>> # - Watchdog communication Settings -
>>
>> hostname0 = '192.168.40.66'
>> # Host name or IP address of pgpool node
>> # for watchdog connection
>> # (change requires restart)
>> wd_port0 = 9000
>> # Port number for watchdog service
>> # (change requires restart)
>> pgpool_port0 = 9999
>> # Port number for pgpool
>> # (change requires restart)
>>
>>
>> hostname1 = '192.168.40.67'
>> wd_port1 = 9000
>> pgpool_port1 = 9999
>>
>> hostname2 = '192.168.40.64'
>> wd_port2 = 9000
>> pgpool_port2 = 9999
>>
>>
>> wd_priority = 90
>> # priority of this watchdog in leader election
>> # (change requires restart)
>>
>> wd_authkey = '###################################'
>> # Authentication key for watchdog communication
>> # (change requires restart)
>>
>> wd_ipc_socket_dir = '/tmp'
>> # Unix domain socket path for watchdog IPC socket
>> # The Debian package defaults to
>> # /var/run/postgresql
>> # (change requires restart)
>>
>>
>> # - Virtual IP control Setting -
>>
>> delegate_IP = '192.168.40.70'
>> # delegate IP address
>> # If this is empty, the virtual IP is never brought up.
>> # (change requires restart)
>> if_cmd_path = '/sbin'
>> # path to the directory where if_up/down_cmd exists
>> # If if_up/down_cmd starts with "/", if_cmd_path will be ignored.
>> # (change requires restart)
>> if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
>> # startup delegate IP command
>> # (change requires restart)
>> if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
>> # shutdown delegate IP command
>> # (change requires restart)
>> arping_path = '/usr/sbin'
>> # arping command path
>> # If arping_cmd starts with "/", if_cmd_path will be ignored.
>> # (change requires restart)
>> arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
>> # arping command
>> # (change requires restart)
>>
>> # - Behavior on escalation Setting -
>>
>> clear_memqcache_on_escalation = on
>> # Clear all the query cache on shared memory
>> # when standby pgpool escalate to active pgpool
>> # (= virtual IP holder).
>> # This should be off if client connects to pgpool
>> # not using virtual IP.
>> # (change requires restart)
>> wd_escalation_command = ''
>> # Executes this command at escalation on new active pgpool.
>> # (change requires restart)
>> wd_de_escalation_command = ''
>> # Executes this command when leader pgpool resigns from being leader.
>> # (change requires restart)
>>
>> # - Watchdog consensus settings for failover -
>>
>> failover_when_quorum_exists = on
>> # Only perform backend node failover
>> # when the watchdog cluster holds the quorum
>> # (change requires restart)
>>
>> failover_require_consensus = on
>> # Perform failover when majority of Pgpool-II nodes
>> # agrees on the backend node status change
>> # (change requires restart)
>>
>> allow_multiple_failover_requests_from_node = off
>> # A Pgpool-II node can cast multiple votes
>> # for building the consensus on failover
>> # (change requires restart)
>>
>>
>> enable_consensus_with_half_votes = off
>> # apply majority rule for consensus and quorum computation
>> # at 50% of votes in a cluster with even number of nodes.
>> # when enabled the existence of quorum and consensus
>> # on failover is resolved after receiving half of the
>> # total votes in the cluster, otherwise both these
>> # decisions require at least one more vote than
>> # half of the total votes.
>> # (change requires restart)
>>
>> # - Lifecheck Setting -
>>
>> # -- common --
>>
>> wd_monitoring_interfaces_list = 'any' # Comma separated list of interface names to monitor.
>> # if any interface from the list is active the watchdog will
>> # consider the network is fine
>> # 'any' to enable monitoring on all interfaces except loopback
>> # '' to disable monitoring
>> # (change requires restart)
>>
>> wd_lifecheck_method = 'heartbeat'
>> # Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
>> # (change requires restart)
>> wd_interval = 10
>> # lifecheck interval (sec) > 0
>> # (change requires restart)
>>
>> # -- heartbeat mode --
>>
>> heartbeat_hostname0 = '192.168.40.66'
>> # Host name or IP address used
>> # for sending heartbeat signal.
>> # (change requires restart)
>> heartbeat_port0 = 9694
>> # Port number used for receiving/sending heartbeat signal
>> # Usually this is the same as heartbeat_portX.
>> # (change requires restart)
>> heartbeat_device0 = 'eth0'
>> # Name of NIC device (such as 'eth0')
>> # used for sending/receiving heartbeat
>> # signal to/from destination 0.
>> # This works only when this is not empty
>> # and pgpool has root privilege.
>> # (change requires restart)
>>
>> heartbeat_hostname1 = '192.168.40.67'
>> heartbeat_port1 = 9694
>> heartbeat_device1 = 'eth0'
>>
>> heartbeat_hostname2 = '192.168.40.64'
>> heartbeat_port2 = 9694
>> heartbeat_device2 = 'eth0'
>>
>> wd_heartbeat_keepalive = 2
>> # Interval time of sending heartbeat signal (sec)
>> # (change requires restart)
>> wd_heartbeat_deadtime = 30
>> # Deadtime interval for heartbeat signal (sec)
>> # (change requires restart)
>>
>> # -- query mode --
>>
>> wd_life_point = 3
>> # lifecheck retry times
>> # (change requires restart)
>> wd_lifecheck_query = 'SELECT 1'
>> # lifecheck query to pgpool from watchdog
>> # (change requires restart)
>> wd_lifecheck_dbname = 'template1'
>> # Database name connected for lifecheck
>> # (change requires restart)
>> wd_lifecheck_user = 'nobody'
>> # watchdog user monitoring pgpools in lifecheck
>> # (change requires restart)
>> wd_lifecheck_password = ''
>> # Password for watchdog user in lifecheck
>> # Leaving it empty will make Pgpool-II first look for the
>> # password in the pool_passwd file before using the empty password
>> # (change requires restart)
>>
>> #------------------------------------------------------------------------------
>> # OTHERS
>> #------------------------------------------------------------------------------
>> relcache_expire = 0
>> # Life time of relation cache in seconds.
>> # 0 means no cache expiration(the default).
>> # The relation cache is used to cache the
>> # query result against PostgreSQL system
>> # catalog to obtain various information
>> # including table structures or if it's a
>> # temporary table or not. The cache is
>> # maintained in a pgpool child local memory
>> # and being kept as long as it survives.
>> # If someone modifies the table by using
>> # ALTER TABLE or some such, the relcache is
>> # not consistent anymore.
>> # For this purpose, cache_expiration
>> # controls the life time of the cache.
>> relcache_size = 256
>> # Number of relation cache
>> # entry. If you see frequently:
>> # "pool_search_relcache: cache replacement happend"
>> # in the pgpool log, you might want to increase this number.
>>
>> check_temp_table = catalog
>> # Temporary table check method. catalog, trace or none.
>> # Default is catalog.
>>
>> check_unlogged_table = on
>> # If on, enable unlogged table check in SELECT statements.
>> # This initiates queries against system catalog of primary/main
>> # thus increases load of primary.
>> # If you are absolutely sure that your system never uses unlogged tables
>> # and you want to save access to primary/main, you could turn this off.
>> # Default is on.
>> enable_shared_relcache = on
>> # If on, relation cache stored in memory cache,
>> # the cache is shared among child process.
>> # Default is on.
>> # (change requires restart)
>>
>> relcache_query_target = primary # Target node to send relcache queries. Default is primary node.
>> # If load_balance_node is specified, queries will be sent to load balance node.
>> #------------------------------------------------------------------------------
>> # IN MEMORY QUERY MEMORY CACHE
>> #------------------------------------------------------------------------------
>> memory_cache_enabled = off
>> # If on, use the memory cache functionality, off by default
>> # (change requires restart)
>> memqcache_method = 'shmem'
>> # Cache storage method. either 'shmem'(shared memory) or
>> # 'memcached'. 'shmem' by default
>> # (change requires restart)
>> memqcache_memcached_host = 'localhost'
>> # Memcached host name or IP address. Mandatory if
>> # memqcache_method = 'memcached'.
>> # Defaults to localhost.
>> # (change requires restart)
>> memqcache_memcached_port = 11211
>> # Memcached port number. Mandatory if memqcache_method = 'memcached'.
>> # Defaults to 11211.
>> # (change requires restart)
>> memqcache_total_size = 64MB
>> # Total memory size in bytes for storing memory cache.
>> # Mandatory if memqcache_method = 'shmem'.
>> # Defaults to 64MB.
>> # (change requires restart)
>> memqcache_max_num_cache = 1000000
>> # Total number of cache entries. Mandatory
>> # if memqcache_method = 'shmem'.
>> # Each cache entry consumes 48 bytes on shared memory.
>> # Defaults to 1,000,000(45.8MB).
>> # (change requires restart)
>> memqcache_expire = 0
>> # Memory cache entry life time specified in seconds.
>> # 0 means infinite life time. 0 by default.
>> # (change requires restart)
>> memqcache_auto_cache_invalidation = on
>> # If on, invalidation of query cache is triggered by corresponding
>> # DDL/DML/DCL(and memqcache_expire). If off, it is only triggered
>> # by memqcache_expire. on by default.
>> # (change requires restart)
>> memqcache_maxcache = 400kB
>> # Maximum SELECT result size in bytes.
>> # Must be smaller than memqcache_cache_block_size. Defaults to 400KB.
>> # (change requires restart)
>> memqcache_cache_block_size = 1MB
>> # Cache block size in bytes. Mandatory if memqcache_method = 'shmem'.
>> # Defaults to 1MB.
>> # (change requires restart)
>> memqcache_oiddir = '/var/log/pgpool/oiddir'
>> # Temporary work directory to record table oids
>> # (change requires restart)
>> cache_safe_memqcache_table_list = ''
>> # Comma separated list of table names to memcache
>> # that don't write to database
>> # Regexp are accepted
>> cache_unsafe_memqcache_table_list = ''
>> # Comma separated list of table names not to memcache
>> # that don't write to database
>> # Regexp are accepted
>>
>> Error output:
>>
>> Dec 17 18:39:49 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:49: pid 1332675: LOG: setting the local watchdog node name to "192.168.40.67:9999 Linux SVD-SLB02"
>> Dec 17 18:39:49 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:49: pid 1332675: LOG: watchdog cluster is configured with 2 remote nodes
>> Dec 17 18:39:49 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:49: pid 1332675: LOG: watchdog remote node:0 on 192.168.40.66:9000
>> Dec 17 18:39:49 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:49: pid 1332675: LOG: watchdog remote node:1 on 192.168.40.64:9000
>> Dec 17 18:39:49 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:49: pid 1332675: LOG: ensure availibility on any interface
>> Dec 17 18:39:49 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:49: pid 1332675: LOG: watchdog node state changed from [DEAD] to [LOADING]
>> Dec 17 18:39:49 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:49: pid 1332675: LOG: new outbound connection to 192.168.40.64:9000
>> Dec 17 18:39:50 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:50: pid 1332675: LOG: new watchdog node connection is received from "192.168.40.66:62151"
>> Dec 17 18:39:50 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:50: pid 1332675: LOG: new node joined the cluster hostname:"192.168.40.66" port:9000 pgpool_port:9999
>> Dec 17 18:39:50 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:50: pid 1332675: DETAIL: Pgpool-II version:"4.2.0" watchdog messaging version: 1.2
>> Dec 17 18:39:53 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:53: pid 1332675: LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
>> Dec 17 18:39:54 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:54: pid 1332675: LOG: watchdog node state changed from [INITIALIZING] to [STANDING FOR LEADER]
>> Dec 17 18:39:54 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:54: pid 1332675: LOG: watchdog node state changed from [STANDING FOR LEADER] to [PARTICIPATING IN ELECTION]
>> Dec 17 18:39:54 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:54: pid 1332675: LOG: watchdog node state changed from [PARTICIPATING IN ELECTION] to [INITIALIZING]
>> Dec 17 18:39:54 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:54: pid 1332675: LOG: setting the remote node "192.168.40.66:9999 Linux SVD-SLB01" as watchdog cluster leader
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: LOG: successfully joined the watchdog cluster as standby node
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: DETAIL: our join coordinator request is accepted by cluster leader node "192.168.40.66:9999 Linux SVD-SLB01"
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: LOG: new IPC connection received
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: LOG: new IPC connection received
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: LOG: new IPC connection received
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: LOG: new IPC connection received
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: LOG: received the get data request from local pgpool-II on IPC interface
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: LOG: get data request from local pgpool-II node received on IPC interface is forwarded to leader watchdog node "192.168.40.66:9999 Linux SVD-SLB01"
>> Dec 17 18:39:55 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:55: pid 1332675: DETAIL: waiting for the reply...
>> Dec 17 18:39:59 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:59: pid 1332675: LOG: new watchdog node connection is received from "192.168.40.64:15019"
>> Dec 17 18:39:59 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:59: pid 1332675: LOG: new node joined the cluster hostname:"192.168.40.64" port:9000 pgpool_port:9999
>> Dec 17 18:39:59 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:39:59: pid 1332675: DETAIL: Pgpool-II version:"4.2.0" watchdog messaging version: 1.2
>> Dec 17 18:40:00 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:00: pid 1332675: LOG: new outbound connection to 192.168.40.66:9000
>> Dec 17 18:40:44 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:44: pid 1332675: LOG: remote node "192.168.40.66:9999 Linux SVD-SLB01" is shutting down
>> Dec 17 18:40:44 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:44: pid 1332675: LOG: watchdog cluster has lost the coordinator node
>> Dec 17 18:40:44 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:44: pid 1332675: LOG: removing the remote node "192.168.40.66:9999 Linux SVD-SLB01" from watchdog cluster leader
>> Dec 17 18:40:44 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:44: pid 1332675: LOG: We have lost the cluster leader node "192.168.40.66:9999 Linux SVD-SLB01"
>> Dec 17 18:40:44 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:44: pid 1332675: LOG: watchdog node state changed from [STANDBY] to [JOINING]
>> Dec 17 18:40:44 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:44: pid 1332675: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
>> Dec 17 18:40:45 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:45: pid 1332675: LOG: watchdog node state changed from [INITIALIZING] to [STANDING FOR LEADER]
>> Dec 17 18:40:45 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:45: pid 1332675: LOG: watchdog node state changed from [STANDING FOR LEADER] to [LEADER]
>> Dec 17 18:40:45 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:45: pid 1332675: LOG: I am announcing my self as leader/coordinator watchdog node
>> Dec 17 18:40:45 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:45: pid 1332675: LOG: I am the cluster leader node
>> Dec 17 18:40:45 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:45: pid 1332675: DETAIL: our declare coordinator message is accepted by all nodes
>> Dec 17 18:40:45 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:45: pid 1332675: LOG: setting the local node "192.168.40.67:9999 Linux SVD-SLB02" as watchdog cluster leader
>> Dec 17 18:40:45 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:45: pid 1332675: LOG: I am the cluster leader node but we do not have enough nodes in cluster
>> Dec 17 18:40:45 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:45: pid 1332675: DETAIL: waiting for the quorum to start escalation process
>> Dec 17 18:40:45 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:45: pid 1332675: LOG: new IPC connection received
>> Dec 17 18:40:46 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:46: pid 1332675: LOG: adding watchdog node "192.168.40.64:9999 Linux SVD-WEB01" to the standby list
>> Dec 17 18:40:46 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:46: pid 1332675: LOG: quorum found
>> Dec 17 18:40:46 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:46: pid 1332675: DETAIL: starting escalation process
>> Dec 17 18:40:46 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:46: pid 1332675: LOG: escalation process started with PID:1332782
>> Dec 17 18:40:46 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:46: pid 1332675: LOG: new IPC connection received
>> Dec 17 18:40:46 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:46: pid 1332675: LOG: new IPC connection received
>> Dec 17 18:40:50 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:40:50: pid 1332675: LOG: watchdog escalation process with pid: 1332782 exit with SUCCESS.
>> Dec 17 18:41:03 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:41:03: pid 1332675: LOG: new watchdog node connection is received from "192.168.40.66:55496"
>> Dec 17 18:41:03 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:41:03: pid 1332675: LOG: new node joined the cluster hostname:"192.168.40.66" port:9000 pgpool_port:9999
>> Dec 17 18:41:03 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:41:03: pid 1332675: DETAIL: Pgpool-II version:"4.2.0" watchdog messaging version: 1.2
>> Dec 17 18:41:03 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:41:03: pid 1332675: LOG: The newly joined node:"192.168.40.66:9999 Linux SVD-SLB01" had left the cluster because it was shutdown
>> Dec 17 18:41:03 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:41:03: pid 1332675: LOG: new outbound connection to 192.168.40.66:9000
>> Dec 17 18:41:04 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:41:04: pid 1332675: LOG: adding watchdog node "192.168.40.66:9999 Linux SVD-SLB01" to the standby list
>> Dec 17 18:42:51 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:42:51: pid 1332675: LOG: read from socket failed, remote end closed the connection
>> Dec 17 18:42:51 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:42:51: pid 1332675: LOG: client socket of 192.168.40.66:9999 Linux SVD-SLB01 is closed
>> Dec 17 18:42:51 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:42:51: pid 1332675: LOG: read from socket failed, remote end closed the connection
>> Dec 17 18:42:51 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:42:51: pid 1332675: LOG: outbound socket of 192.168.40.66:9999 Linux SVD-SLB01 is closed
>> Dec 17 18:42:51 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:42:51: pid 1332675: LOG: remote node "192.168.40.66:9999 Linux SVD-SLB01" is not reachable
>> Dec 17 18:42:51 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:42:51: pid 1332675: DETAIL: marking the node as lost
>> Dec 17 18:42:51 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:42:51: pid 1332675: LOG: remote node "192.168.40.66:9999 Linux SVD-SLB01" is lost
>> Dec 17 18:42:51 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:42:51: pid 1332675: LOG: removing watchdog node "192.168.40.66:9999 Linux SVD-SLB01" from the standby list
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: LOG: new IPC connection received
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: LOG: read from socket failed, remote end closed the connection
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: LOG: client socket of 192.168.40.64:9999 Linux SVD-WEB01 is closed
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: LOG: remote node "192.168.40.64:9999 Linux SVD-WEB01" is shutting down
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: LOG: removing watchdog node "192.168.40.64:9999 Linux SVD-WEB01" from the standby list
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: LOG: We have lost the quorum
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: LOG: received node status change ipc message
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: DETAIL: No heartbeat signal from node
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: WARNING: watchdog life-check reported, we are disconnected from the network
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: DETAIL: changing the state to LOST
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: LOG: watchdog node state changed from [LEADER] to [LOST]
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: FATAL: system has lost the network
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332675: LOG: Watchdog is shutting down
>> Dec 17 18:43:25 SVD-SLB02 pgpool[1332673]: 2020-12-17 18:43:25: pid 1332673: LOG: watchdog child process with pid: 1332675 exits with status 768
>>
>> Does anyone have any suggestions as to what this could be?
>>
>> Note: if I play around with the weights I can get another node to hold the VIP, but it still shuts down when node 0 is shut down.
>>
>> It does not shut down when any of the other nodes are shut down, only when node 0 is.
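>>
>> For anyone trying to reproduce this, here is a rough way to watch the
>> watchdog state on all three nodes while node 0 is shut down (just a
>> sketch: the PCP port comes from the config above, and the PCP user
>> name is a placeholder):
>>
>> import subprocess
>>
>> # pgpool hosts from the watchdog section of pgpool.conf above
>> HOSTS = ["192.168.40.66", "192.168.40.67", "192.168.40.64"]
>>
>> for host in HOSTS:
>>     try:
>>         # pcp_watchdog_info ships with pgpool-II; -w skips the password
>>         # prompt (expects a .pcppass entry), "pcpuser" is a placeholder.
>>         result = subprocess.run(
>>             ["pcp_watchdog_info", "-h", host, "-p", "9898",
>>              "-U", "pcpuser", "-w", "-v"],
>>             capture_output=True, text=True, timeout=5)
>>         print("--- %s ---\n%s" % (host, result.stdout or result.stderr))
>>     except Exception as exc:
>>         print("--- %s --- unreachable: %s" % (host, exc))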
>>
>> Thanks,
>>
>> Joe
>>