<div dir="ltr"><div>Hi Tatsuo</div><div><br></div><div>Yes, that was the reason. Many thanks!</div><div>Removing /var/log/pgpool_status solved the problem. </div><div><br></div><div>One remark: as I understand it (at least on pgpool 4.2), the pcp </div><div>port/socket becomes available only after successful backend </div><div>initialization, so in my case, without an initialized primary backend, </div><div>none of the pcp commands worked. <br></div><div><br></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>BR</div><div>Igor Yurchenko<br></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 15 Jul 2024 at 06:51, Tatsuo Ishii <<a href="mailto:ishii@postgresql.org">ishii@postgresql.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
My guess is that the pgpool_status file (it should be under logdir) is<br>
set to:<br>
<br>
down<br>
up<br>
<br>
i.e. pgpool thinks that the PostgreSQL primary node is detached from<br>
pgpool. If you are sure that the PostgreSQL primary is healthy, you<br>
can attach the primary node using pcp_attach_node. An alternative<br>
method is to stop pgpool, remove the pgpool_status file, and restart<br>
pgpool. pgpool will then automatically recreate the pgpool_status file<br>
and mark the primary up if it is up and running.<br>
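For reference, the two approaches might look like this in your particular setup (host, port, and node number taken from your mail; the systemd unit name is an assumption, so substitute however you normally stop and start pgpool):<br>

```shell
# Option 1: re-attach the detached primary (node 0) via pcp.
# pcp_port=9898 comes from the posted pgpool.conf; the pcp user must
# exist in pcp.conf.
pcp_attach_node -h 10.65.188.59 -p 9898 -U postgres -n 0

# Option 2: let pgpool rebuild its status file from scratch.
systemctl stop pgpool
rm /var/log/pgpool_status      # logdir='/var/log' in the posted config
systemctl start pgpool
```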
<br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS LLC<br>
English: <a href="http://www.sraoss.co.jp/index_en/" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en/</a><br>
Japanese: <a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
<br>
<br>
> Hi guys<br>
> I have given up and it looks like I need your help.<br>
> <br>
> A setup:<br>
> I have a 2-host PostgreSQL 15 streaming-replication cluster (IPs:<br>
> 10.65.188.55 and 10.65.188.56 respectively), with two Pgpool 4.2.15 instances<br>
> installed on the same hosts.<br>
> The pgpool instances manage/delegate the VIP 10.65.188.59.<br>
> <br>
> Replication between the pg15 instances works well, and each instance's recovery<br>
> status is detected reliably:<br>
> <br>
> [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
> fabrix -h 10.65.188.55 -p 5432 template1 -c 'select pg_is_in_recovery()'<br>
>  pg_is_in_recovery<br>
> -------------------<br>
>  f<br>
> (1 row)<br>
> <br>
> [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
> fabrix -h 10.65.188.56 -p 5432 template1 -c 'select pg_is_in_recovery()'<br>
>  pg_is_in_recovery<br>
> -------------------<br>
>  t<br>
> (1 row)<br>
> <br>
> [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
> fabrix -h 10.65.188.55 -p 5432 template1 -c 'select * from<br>
> pg_stat_replication'<br>
>   pid  | usesysid | usename | application_name | client_addr  |<br>
> client_hostname | client_port |         backend_start         |<br>
> backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn |<br>
> write_lag | f<br>
> lush_lag | replay_lag | sync_priority | sync_state |          reply_time<br>
> -------+----------+---------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+--<br>
> ---------+------------+---------------+------------+-------------------------------<br>
>  13875 |    16384 | fabrix  | walreceiver      | 10.65.188.56 |<br>
>     |       47636 | 2024-07-14 14:30:54.344682+03 |          743 |<br>
> streaming | 0/B0001C0 | 0/B0001C0 | 0/B0001C0 | 0/B0001C0  |           |<br>
>          |            |             0 | async      | 2024-07-14<br>
> 14:48:41.550424+03<br>
> (1 row)<br>
> <br>
> [root@pg-mgrdb1 pgpool-II]#<br>
> <br>
> The databases of the pg15 cluster are reachable/available via VIP:<br>
> <br>
> [root@pg-mgrdb1 pgpool-II]#<br>
> [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
> fabrix -h 10.65.188.59 -p 9999 manager -c '\dt'<br>
>                  List of relations<br>
>  Schema |         Name          | Type  |  Owner<br>
> --------+-----------------------+-------+----------<br>
>  public | upgrade_version_table | table | postgres<br>
> (1 row)<br>
> <br>
> [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
> fabrix -h 10.65.188.59 -p 9999 manager -c 'select * from<br>
> upgrade_version_table'<br>
>     app_name    | cur_version | upgrade_to<br>
> ----------------+-------------+------------<br>
>  brODtv Manager | 10.2.0.300  |<br>
> (1 row)<br>
> <br>
> [root@pg-mgrdb1 pgpool-II]#<br>
> <br>
> But pgpool cannot detect the cluster's primary node, and shows<br>
> incorrect pool_nodes info:<br>
> <br>
> [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
> fabrix -h 10.65.188.59 -p 9999 manager -c 'show pool_nodes'<br>
>  node_id |   hostname   | port | status | lb_weight |  role   | select_cnt<br>
> | load_balance_node | replication_delay | replication_state |<br>
> replication_sync_state | last_status_change<br>
> ---------+--------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------<br>
>  0       | 10.65.188.55 | 5432 | down   | 0.500000  | standby | 0<br>
>  | false             | 0                 |                   |<br>
>            | 2024-07-14 15:04:01<br>
>  1       | 10.65.188.56 | 5432 | up     | 0.500000  | standby | 3<br>
>  | true              | 0                 |                   |<br>
>            | 2024-07-14 15:04:01<br>
> (2 rows)<br>
> <br>
> [root@pg-mgrdb1 pgpool-II]#<br>
> <br>
> Right now, I'm testing with a "single pgpool up" configuration (pgpool<br>
> on the second host is down). The pgpool log shows that pgpool doesn't<br>
> even try to touch the first (current primary) backend at <a href="http://10.65.188.55:5432" rel="noreferrer" target="_blank">10.65.188.55:5432</a>.<br>
> It successfully connects to the second 'standby' node and detects its<br>
> status, and that second node successfully handles all SELECT queries thanks to<br>
> the configured load balancing. But what's wrong with the first, primary node?<br>
> <br>
> In the pgpool log I don't see any attempts to connect to the first node. The<br>
> first piece of the log where "find_primary_node" is mentioned looks like this:<br>
> <br>
> 2024-07-14 15:23:38: pid 18723: LOG:  2 watchdog nodes are configured for<br>
> lifecheck<br>
> 2024-07-14 15:23:38: pid 18723: LOCATION:  wd_lifecheck.c:495<br>
> 2024-07-14 15:23:38: pid 18723: LOG:  watchdog nodes ID:0 Name:"<br>
> <a href="http://10.65.188.55:9999" rel="noreferrer" target="_blank">10.65.188.55:9999</a> Linux pg-mgrdb1"<br>
> 2024-07-14 15:23:38: pid 18723: DETAIL:  Host:"10.65.188.55" WD Port:9000<br>
> pgpool-II port:9999<br>
> 2024-07-14 15:23:38: pid 18723: LOCATION:  wd_lifecheck.c:503<br>
> 2024-07-14 15:23:38: pid 18723: LOG:  watchdog nodes ID:1 Name:"Not_Set"<br>
> 2024-07-14 15:23:38: pid 18723: DETAIL:  Host:"10.65.188.56" WD Port:9000<br>
> pgpool-II port:9999<br>
> 2024-07-14 15:23:38: pid 18723: LOCATION:  wd_lifecheck.c:503<br>
> 2024-07-14 15:23:38: pid 18706: LOG:  find_primary_node_repeatedly: waiting<br>
> for finding a primary node<br>
> 2024-07-14 15:23:38: pid 18706: LOCATION:  pgpool_main.c:3404<br>
> 2024-07-14 15:23:38: pid 18706: LOG:  find_primary_node: standby node is 1<br>
> 2024-07-14 15:23:38: pid 18706: LOCATION:  pgpool_main.c:3329<br>
> 2024-07-14 15:23:39: pid 18729: LOG:  set SO_REUSEPORT option to the socket<br>
> 2024-07-14 15:23:39: pid 18729: LOCATION:  wd_heartbeat.c:691<br>
> 2024-07-14 15:23:39: pid 18729: LOG:  creating watchdog heartbeat receive<br>
> socket.<br>
> 2024-07-14 15:23:39: pid 18729: DETAIL:  set SO_REUSEPORT<br>
> 2024-07-14 15:23:39: pid 18729: LOCATION:  wd_heartbeat.c:231<br>
> 2024-07-14 15:23:39: pid 18731: LOG:  set SO_REUSEPORT option to the socket<br>
> 2024-07-14 15:23:39: pid 18731: LOCATION:  wd_heartbeat.c:691<br>
> 2024-07-14 15:23:39: pid 18731: LOG:  creating socket for sending heartbeat<br>
> 2024-07-14 15:23:39: pid 18731: DETAIL:  set SO_REUSEPORT<br>
> 2024-07-14 15:23:39: pid 18731: LOCATION:  wd_heartbeat.c:148<br>
> 2024-07-14 15:23:39: pid 18706: LOG:  find_primary_node: standby node is 1<br>
> 2024-07-14 15:23:39: pid 18706: LOCATION:  pgpool_main.c:3329<br>
> 2024-07-14 15:23:40: pid 18706: LOG:  find_primary_node: standby node is 1<br>
> 2024-07-14 15:23:40: pid 18706: LOCATION:  pgpool_main.c:3329<br>
> 2024-07-14 15:23:41: pid 18706: LOG:  find_primary_node: standby node is 1<br>
> 2024-07-14 15:23:41: pid 18706: LOCATION:  pgpool_main.c:3329<br>
> <br>
> So I'm out of ideas.<br>
> <br>
> The pgpool.conf, stripped of comments and sorted, looks like this:<br>
> *********************************************************************************************<br>
> allow_clear_text_frontend_auth=off<br>
> allow_multiple_failover_requests_from_node=off<br>
> allow_sql_comments=off<br>
> app_name_redirect_preference_list=''<br>
> arping_cmd='/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'<br>
> arping_path='/usr/sbin'<br>
> authentication_timeout=1min<br>
> auto_failback_interval=1min<br>
> auto_failback=off<br>
> backend_application_name0=mgrdb1<br>
> backend_application_name1=mgrdb2<br>
> backend_clustering_mode='streaming_replication'<br>
> backend_data_directory0='/home/postgres/databases/fabrix2'<br>
> backend_data_directory1='/home/postgres/databases/fabrix2'<br>
> backend_flag0='ALLOW_TO_FAILOVER'<br>
> backend_flag1='ALLOW_TO_FAILOVER'<br>
> backend_hostname0='10.65.188.55'<br>
> backend_hostname1='10.65.188.56'<br>
> backend_port0=5432<br>
> backend_port1=5432<br>
> backend_weight0=1<br>
> backend_weight1=1<br>
> black_function_list='currval,lastval,nextval,setval'<br>
> cache_safe_memqcache_table_list=''<br>
> cache_unsafe_memqcache_table_list=''<br>
> check_temp_table=catalog<br>
> check_unlogged_table=on<br>
> child_life_time=1<br>
> child_max_connections=0<br>
> clear_memqcache_on_escalation=on<br>
> client_idle_limit=0<br>
> client_idle_limit_in_recovery=0<br>
> client_min_messages = debug1<br>
> connection_cache=on<br>
> connection_life_time=0<br>
> connect_timeout=10000<br>
> database_redirect_preference_list=''<br>
> delay_threshold=10000000<br>
> delegate_IP='10.65.188.59'<br>
> detach_false_primary=off<br>
> disable_load_balance_on_write='transaction'<br>
> dml_adaptive_object_relationship_list=''<br>
> enable_consensus_with_half_votes=on<br>
> enable_pool_hba=on<br>
> enable_shared_relcache=on<br>
> failback_command=''<br>
> failover_command='/etc/pgpool-II/recovery/failover.sh %d %P %H %R'<br>
> failover_if_affected_tuples_mismatch=off<br>
> failover_on_backend_error=on<br>
> failover_require_consensus=on<br>
> failover_when_quorum_exists=off<br>
> follow_master_command=''<br>
> follow_primary_command=''<br>
> health_check_database=''<br>
> health_check_max_retries=0<br>
> health_check_password='***************'<br>
> health_check_period=1<br>
> health_check_retry_delay=1<br>
> health_check_timeout=0<br>
> health_check_user='postgres'<br>
> heartbeat_device0='eth0'<br>
> heartbeat_device1='eth0'<br>
> heartbeat_hostname0='10.65.188.55'<br>
> heartbeat_hostname1='10.65.188.56'<br>
> heartbeat_port0=9694<br>
> heartbeat_port1=9694<br>
> hostname0='10.65.188.55'<br>
> hostname1='10.65.188.56'<br>
> if_cmd_path='/sbin'<br>
> if_down_cmd='/usr/bin/sudo /sbin/ip addr del $_IP_$/18 dev eth0'<br>
> if_up_cmd='/usr/bin/sudo /sbin/ip addr add $_IP_$/18 dev eth0 label eth0:pg'<br>
> ignore_leading_white_space=on<br>
> insert_lock=off<br>
> listen_addresses='*'<br>
> listen_backlog_multiplier=1<br>
> load_balance_mode=off<br>
> lobj_lock_table=''<br>
> log_client_messages=off<br>
> log_connections=off<br>
> log_destination='stderr'<br>
> log_directory='/var/log/pgpool_log'<br>
> logdir='/var/log'<br>
> log_disconnections=off<br>
> log_error_verbosity = verbose<br>
> log_filename='pgpool-%a.log'<br>
> logging_collector=on<br>
> log_hostname=off<br>
> log_line_prefix='%t: pid %p: '# printf-style string to output at beginning<br>
> of each log line.<br>
> log_min_messages = debug1<br>
> log_per_node_statement=off<br>
> log_rotation_age=1d<br>
> log_rotation_size=0<br>
> log_standby_delay='if_over_threshold'<br>
> log_statement=off<br>
> log_truncate_on_rotation=on<br>
> master_slave_mode=on<br>
> master_slave_sub_mode='stream'<br>
> max_pool=1<br>
> memory_cache_enabled=off<br>
> memqcache_auto_cache_invalidation=on<br>
> memqcache_cache_block_size=1MB<br>
> memqcache_expire=0<br>
> memqcache_maxcache=400kB<br>
> memqcache_max_num_cache=1000000<br>
> memqcache_memcached_host='localhost'<br>
> memqcache_memcached_port=11211<br>
> memqcache_method='shmem'<br>
> memqcache_oiddir='/var/log/pgpool/oiddir'<br>
> memqcache_total_size=64MB<br>
> num_init_children=160<br>
> pcp_listen_addresses='*'<br>
> pcp_port=9898<br>
> pcp_socket_dir='/var/run/postgresql'<br>
> pgpool_port0=9999<br>
> pgpool_port1=9999<br>
> pid_file_name='/var/run/pgpool.pid'<br>
> ping_path='/bin'<br>
> pool_passwd='pool_passwd'<br>
> port=9999<br>
> primary_routing_query_pattern_list=''<br>
> read_only_function_list=''<br>
> recovery_1st_stage_command='recovery_1st_stage'<br>
> recovery_2nd_stage_command=''<br>
> recovery_password='**************'<br>
> recovery_timeout=90<br>
> recovery_user='postgres'<br>
> relcache_expire=0<br>
> relcache_query_target=primary<br>
> relcache_size=256<br>
> replicate_select=off<br>
> replication_mode=off<br>
> replication_stop_on_mismatch=off<br>
> reserved_connections=0<br>
> reset_query_list='ABORT; DISCARD ALL'<br>
> search_primary_node_timeout=0<br>
> serialize_accept=off<br>
> socket_dir='/var/run/postgresql'<br>
> sr_check_database='postgres'<br>
> sr_check_password='fabrix'<br>
> sr_check_period=1<br>
> sr_check_user='****************'<br>
> ssl_ciphers='HIGH:MEDIUM:+3DES:!aNULL'<br>
> ssl_dh_params_file=''<br>
> ssl_ecdh_curve='prime256v1'<br>
> ssl=off<br>
> ssl_prefer_server_ciphers=off<br>
> statement_level_load_balance=off<br>
> syslog_facility='LOCAL0'<br>
> syslog_ident='pgpool'<br>
> trusted_servers=''<br>
> use_watchdog=on<br>
> wd_authkey=''<br>
> wd_de_escalation_command=''<br>
> wd_escalation_command=''<br>
> wd_heartbeat_deadtime=30<br>
> wd_heartbeat_keepalive=2<br>
> wd_interval=10<br>
> wd_ipc_socket_dir='/var/run/postgresql'<br>
> wd_lifecheck_dbname='template1'<br>
> wd_lifecheck_method='heartbeat'<br>
> wd_lifecheck_password=''<br>
> wd_lifecheck_query='SELECT 1'<br>
> wd_lifecheck_user='nobody'<br>
> wd_life_point=3<br>
> wd_monitoring_interfaces_list=''<br>
> wd_port0=9000<br>
> wd_port1=9000<br>
> wd_priority=1<br>
> write_function_list=''<br>
> *********************************************************************************************<br>
> <br>
> pool_hba.conf:<br>
> *********************************************************************************************<br>
> [root@pg-mgrdb1 pgpool_log]# cat /etc/pgpool-II/pool_hba.conf<br>
> # TYPE  DATABASE        USER    CIDR-ADDRESS    METHOD<br>
> local   all     all     trust<br>
> ## IPv4 local connections<br>
> host    all     all     <a href="http://127.0.0.1/32" rel="noreferrer" target="_blank">127.0.0.1/32</a>    md5<br>
> host    all     postgres        <a href="http://10.65.188.55/32" rel="noreferrer" target="_blank">10.65.188.55/32</a> md5<br>
> host    all     postgres        <a href="http://10.65.188.56/32" rel="noreferrer" target="_blank">10.65.188.56/32</a> md5<br>
> host    all     postgres        <a href="http://10.65.188.59/32" rel="noreferrer" target="_blank">10.65.188.59/32</a> md5<br>
> host    all     fabrix  <a href="http://10.65.188.55/32" rel="noreferrer" target="_blank">10.65.188.55/32</a> md5<br>
> host    all     fabrix  <a href="http://10.65.188.56/32" rel="noreferrer" target="_blank">10.65.188.56/32</a> md5<br>
> host    all     fabrix  <a href="http://10.65.188.59/32" rel="noreferrer" target="_blank">10.65.188.59/32</a> md5<br>
> ## Replication Hosts<br>
> host    manager fabrix  <a href="http://10.65.188.55/32" rel="noreferrer" target="_blank">10.65.188.55/32</a> md5<br>
> host    manager fabrix  <a href="http://10.65.188.56/32" rel="noreferrer" target="_blank">10.65.188.56/32</a> md5<br>
> host    manager fabrix  <a href="http://10.65.188.59/32" rel="noreferrer" target="_blank">10.65.188.59/32</a> md5<br>
> [root@pg-mgrdb1 pgpool_log]#<br>
> *********************************************************************************************<br>
> <br>
> pool_passwd:<br>
> *********************************************************************************************<br>
> [root@pg-mgrdb1 pgpool_log]# cat /etc/pgpool-II/pool_passwd<br>
> postgres:AESdLGx72XCAdQUybSno/SFuw==<br>
> fabrix:AES9Zd8L9XYGiZ5RMycfSFBFw==<br>
> [root@pg-mgrdb1 pgpool_log]#<br>
> *********************************************************************************************<br>
> <br>
> I'd highly appreciate any ideas or hints for further investigation.<br>
> <br>
> BR<br>
> Igor Yurchenko<br>
</blockquote></div>