<div dir="auto"><div>Yeah, you are right. <div dir="auto">One more question: the pg_enc documentation mentions that the encryption key is read by default from /root/.pgpoolkey (with permissions 0600). But pgpool is started by pgpool.service as the user postgres. </div><div dir="auto"><br></div><div dir="auto">TBH, I don't understand how this case is handled.</div><div dir="auto"><br></div>BR</div><div dir="auto">Igor Yurchenko<br><br><div class="gmail_quote" dir="auto"><div dir="ltr" class="gmail_attr">On Tue, 16 Jul 2024, 3:27 Tatsuo Ishii, <<a href="mailto:ishii@postgresql.org">ishii@postgresql.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> Hi Tatsuo<br>
> <br>
> Yeah, it was the reason. Great thanks!!!<br>
> Removing the /var/log/pgpool_status solves the problem.<br>
<br>
Glad to hear that!<br>
<br>
> The only remark is that, as I understand it (at least on pgpool 4.2),<br>
> the pcp port/socket becomes available only after successful backend<br>
> initialization. So, in my case, without an initialized primary backend,<br>
> no pcp commands worked.<br>
<br>
More precisely, pcp commands become available once the search for the<br>
primary node has finished. In your case, until that operation completed<br>
or search_primary_node_timeout passed, pcp commands were not available<br>
because there was no primary node at that moment. Since the default<br>
value of search_primary_node_timeout is 5 minutes, you had to wait<br>
5 minutes before pcp commands became available. You can make<br>
search_primary_node_timeout shorter if you like.<br>
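For example, a pgpool.conf sketch (the 30-second value is purely illustrative):<br>

```
# pgpool.conf -- give up the primary search sooner, so pcp commands
# become available faster when no primary can be found
# (value is in seconds; the default is 300, i.e. 5 minutes)
search_primary_node_timeout = 30
```

<br>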
<br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS LLC<br>
English: <a href="http://www.sraoss.co.jp/index_en/" rel="noreferrer noreferrer" target="_blank">http://www.sraoss.co.jp/index_en/</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
<br>
> BR<br>
> Igor Yurchenko<br>
> <br>
> <br>
> On Mon, 15 Jul 2024 at 06:51, Tatsuo Ishii <<a href="mailto:ishii@postgresql.org" target="_blank" rel="noreferrer">ishii@postgresql.org</a>> wrote:<br>
> <br>
>> Hi,<br>
>><br>
>> My guess is, pgpool_status (should be placed under logdir) file is set<br>
>> to:<br>
>><br>
>> down<br>
>> up<br>
>><br>
>> i.e. pgpool thinks that the PostgreSQL primary node is detached from<br>
>> pgpool. If you are sure that the PostgreSQL primary is healthy, you<br>
>> can attach the primary node by using pcp_attach_node. An alternative<br>
>> method is: stop pgpool, remove the pgpool_status file, and restart<br>
>> pgpool. pgpool will then automatically create the pgpool_status file<br>
>> and mark the primary up if it is up and running.<br>
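>> The two options above can be sketched as shell commands. This is only a sketch: it assumes node id 0 is the primary, the pcp port 9898 and logdir='/var/log' from this setup's pgpool.conf, and a systemd unit named pgpool.service.<br>

```shell
# Option 1: re-attach the healthy primary without restarting pgpool
pcp_attach_node -h localhost -p 9898 -U postgres -n 0

# Option 2: let pgpool rebuild its status file from scratch
systemctl stop pgpool.service
rm /var/log/pgpool_status   # pgpool_status lives under logdir ('/var/log' here)
systemctl start pgpool.service
```

<br>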
>><br>
>> Best regards,<br>
>> --<br>
>> Tatsuo Ishii<br>
>> SRA OSS LLC<br>
>> English: <a href="http://www.sraoss.co.jp/index_en/" rel="noreferrer noreferrer" target="_blank">http://www.sraoss.co.jp/index_en/</a><br>
>> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
>><br>
>><br>
>> > Hi guys<br>
>> > I have given up and it looks like I need your help.<br>
>> ><br>
>> > A setup:<br>
>> > I have a 2-host PostgreSQL 15 streaming replication cluster (IPs:<br>
>> > 10.65.188.55 and 10.65.188.56 respectively), with two Pgpool-II 4.2.15<br>
>> > instances installed on the same hosts.<br>
>> > The pgpool instances manage/delegate a VIP, 10.65.188.59.<br>
>> ><br>
>> > Replication between the pg15 instances works well, and the instances'<br>
>> > recovery status is detected reliably:<br>
>> ><br>
>> > [root@pg-mgrdb1 pgpool-II]# PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
>> > fabrix -h 10.65.188.55 -p 5432 template1 -c 'select pg_is_in_recovery()'<br>
>> > pg_is_in_recovery<br>
>> > -------------------<br>
>> > f<br>
>> > (1 row)<br>
>> ><br>
>> > [root@pg-mgrdb1 pgpool-II]# PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
>> > fabrix -h 10.65.188.56 -p 5432 template1 -c 'select pg_is_in_recovery()'<br>
>> > pg_is_in_recovery<br>
>> > -------------------<br>
>> > t<br>
>> > (1 row)<br>
>> ><br>
>> > [root@pg-mgrdb1 pgpool-II]# PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
>> > fabrix -h 10.65.188.55 -p 5432 template1 -c 'select * from<br>
>> > pg_stat_replication'<br>
>> > -[ RECORD 1 ]----+------------------------------<br>
>> > pid              | 13875<br>
>> > usesysid         | 16384<br>
>> > usename          | fabrix<br>
>> > application_name | walreceiver<br>
>> > client_addr      | 10.65.188.56<br>
>> > client_hostname  |<br>
>> > client_port      | 47636<br>
>> > backend_start    | 2024-07-14 14:30:54.344682+03<br>
>> > backend_xmin     | 743<br>
>> > state            | streaming<br>
>> > sent_lsn         | 0/B0001C0<br>
>> > write_lsn        | 0/B0001C0<br>
>> > flush_lsn        | 0/B0001C0<br>
>> > replay_lsn       | 0/B0001C0<br>
>> > write_lag        |<br>
>> > flush_lag        |<br>
>> > replay_lag       |<br>
>> > sync_priority    | 0<br>
>> > sync_state       | async<br>
>> > reply_time       | 2024-07-14 14:48:41.550424+03<br>
>> ><br>
>> > [root@pg-mgrdb1 pgpool-II]#<br>
>> ><br>
>> > The databases of the pg15 cluster are reachable/available via VIP:<br>
>> ><br>
>> > [root@pg-mgrdb1 pgpool-II]#<br>
>> > [root@pg-mgrdb1 pgpool-II]# PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
>> > fabrix -h 10.65.188.59 -p 9999 manager -c '\dt'<br>
>> > List of relations<br>
>> > Schema | Name | Type | Owner<br>
>> > --------+-----------------------+-------+----------<br>
>> > public | upgrade_version_table | table | postgres<br>
>> > (1 row)<br>
>> ><br>
>> > [root@pg-mgrdb1 pgpool-II]# PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
>> > fabrix -h 10.65.188.59 -p 9999 manager -c 'select * from<br>
>> > upgrade_version_table'<br>
>> > app_name | cur_version | upgrade_to<br>
>> > ----------------+-------------+------------<br>
>> > brODtv Manager | 10.2.0.300 |<br>
>> > (1 row)<br>
>> ><br>
>> > [root@pg-mgrdb1 pgpool-II]#<br>
>> ><br>
>> > But, the pgpool cannot detect the cluster primary node, and provides<br>
>> > incorrect pool_nodes info:<br>
>> ><br>
>> > [root@pg-mgrdb1 pgpool-II]# PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U<br>
>> > fabrix -h 10.65.188.59 -p 9999 manager -c 'show pool_nodes'<br>
>> > -[ RECORD 1 ]----------+--------------------<br>
>> > node_id                | 0<br>
>> > hostname               | 10.65.188.55<br>
>> > port                   | 5432<br>
>> > status                 | down<br>
>> > lb_weight              | 0.500000<br>
>> > role                   | standby<br>
>> > select_cnt             | 0<br>
>> > load_balance_node      | false<br>
>> > replication_delay      | 0<br>
>> > replication_state      |<br>
>> > replication_sync_state |<br>
>> > last_status_change     | 2024-07-14 15:04:01<br>
>> > -[ RECORD 2 ]----------+--------------------<br>
>> > node_id                | 1<br>
>> > hostname               | 10.65.188.56<br>
>> > port                   | 5432<br>
>> > status                 | up<br>
>> > lb_weight              | 0.500000<br>
>> > role                   | standby<br>
>> > select_cnt             | 3<br>
>> > load_balance_node      | true<br>
>> > replication_delay      | 0<br>
>> > replication_state      |<br>
>> > replication_sync_state |<br>
>> > last_status_change     | 2024-07-14 15:04:01<br>
>> ><br>
>> > [root@pg-mgrdb1 pgpool-II]#<br>
>> ><br>
>> > Right now, I'm testing it with a "single pgpool up" configuration<br>
>> > (pgpool on the second host is down). The pgpool log shows that pgpool<br>
>> > doesn't even try to touch the first (current primary) backend at<br>
>> > <a href="http://10.65.188.55:5432" rel="noreferrer noreferrer" target="_blank">10.65.188.55:5432</a>.<br>
>> > It successfully connects to the second 'standby' node and detects its<br>
>> > status, and this second node handles all SELECT queries successfully<br>
>> > thanks to the configured load balancing. But what's wrong with the<br>
>> > first (primary) node?<br>
>> ><br>
>> > In the pgpool log I don't see any attempt to connect to the first node.<br>
>> > The part of the log where "find_primary_node" is first mentioned looks<br>
>> > like this:<br>
>> ><br>
>> > 2024-07-14 15:23:38: pid 18723: LOG: 2 watchdog nodes are configured for lifecheck<br>
>> > 2024-07-14 15:23:38: pid 18723: LOCATION: wd_lifecheck.c:495<br>
>> > 2024-07-14 15:23:38: pid 18723: LOG: watchdog nodes ID:0 Name:"<a href="http://10.65.188.55:9999" rel="noreferrer noreferrer" target="_blank">10.65.188.55:9999</a> Linux pg-mgrdb1"<br>
>> > 2024-07-14 15:23:38: pid 18723: DETAIL: Host:"10.65.188.55" WD Port:9000 pgpool-II port:9999<br>
>> > 2024-07-14 15:23:38: pid 18723: LOCATION: wd_lifecheck.c:503<br>
>> > 2024-07-14 15:23:38: pid 18723: LOG: watchdog nodes ID:1 Name:"Not_Set"<br>
>> > 2024-07-14 15:23:38: pid 18723: DETAIL: Host:"10.65.188.56" WD Port:9000 pgpool-II port:9999<br>
>> > 2024-07-14 15:23:38: pid 18723: LOCATION: wd_lifecheck.c:503<br>
>> > 2024-07-14 15:23:38: pid 18706: LOG: find_primary_node_repeatedly: waiting for finding a primary node<br>
>> > 2024-07-14 15:23:38: pid 18706: LOCATION: pgpool_main.c:3404<br>
>> > 2024-07-14 15:23:38: pid 18706: LOG: find_primary_node: standby node is 1<br>
>> > 2024-07-14 15:23:38: pid 18706: LOCATION: pgpool_main.c:3329<br>
>> > 2024-07-14 15:23:39: pid 18729: LOG: set SO_REUSEPORT option to the socket<br>
>> > 2024-07-14 15:23:39: pid 18729: LOCATION: wd_heartbeat.c:691<br>
>> > 2024-07-14 15:23:39: pid 18729: LOG: creating watchdog heartbeat receive socket.<br>
>> > 2024-07-14 15:23:39: pid 18729: DETAIL: set SO_REUSEPORT<br>
>> > 2024-07-14 15:23:39: pid 18729: LOCATION: wd_heartbeat.c:231<br>
>> > 2024-07-14 15:23:39: pid 18731: LOG: set SO_REUSEPORT option to the socket<br>
>> > 2024-07-14 15:23:39: pid 18731: LOCATION: wd_heartbeat.c:691<br>
>> > 2024-07-14 15:23:39: pid 18731: LOG: creating socket for sending heartbeat<br>
>> > 2024-07-14 15:23:39: pid 18731: DETAIL: set SO_REUSEPORT<br>
>> > 2024-07-14 15:23:39: pid 18731: LOCATION: wd_heartbeat.c:148<br>
>> > 2024-07-14 15:23:39: pid 18706: LOG: find_primary_node: standby node is 1<br>
>> > 2024-07-14 15:23:39: pid 18706: LOCATION: pgpool_main.c:3329<br>
>> > 2024-07-14 15:23:40: pid 18706: LOG: find_primary_node: standby node is 1<br>
>> > 2024-07-14 15:23:40: pid 18706: LOCATION: pgpool_main.c:3329<br>
>> > 2024-07-14 15:23:41: pid 18706: LOG: find_primary_node: standby node is 1<br>
>> > 2024-07-14 15:23:41: pid 18706: LOCATION: pgpool_main.c:3329<br>
>> ><br>
>> > So I'm out of ideas.<br>
>> ><br>
>> > The pgpool.conf, stripped of comments and sorted, looks like this:<br>
>> ><br>
>> *********************************************************************************************<br>
>> > allow_clear_text_frontend_auth=off<br>
>> > allow_multiple_failover_requests_from_node=off<br>
>> > allow_sql_comments=off<br>
>> > app_name_redirect_preference_list=''<br>
>> > arping_cmd='/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'<br>
>> > arping_path='/usr/sbin'<br>
>> > authentication_timeout=1min<br>
>> > auto_failback_interval=1min<br>
>> > auto_failback=off<br>
>> > backend_application_name0=mgrdb1<br>
>> > backend_application_name1=mgrdb2<br>
>> > backend_clustering_mode='streaming_replication'<br>
>> > backend_data_directory0='/home/postgres/databases/fabrix2'<br>
>> > backend_data_directory1='/home/postgres/databases/fabrix2'<br>
>> > backend_flag0='ALLOW_TO_FAILOVER'<br>
>> > backend_flag1='ALLOW_TO_FAILOVER'<br>
>> > backend_hostname0='10.65.188.55'<br>
>> > backend_hostname1='10.65.188.56'<br>
>> > backend_port0=5432<br>
>> > backend_port1=5432<br>
>> > backend_weight0=1<br>
>> > backend_weight1=1<br>
>> > black_function_list='currval,lastval,nextval,setval'<br>
>> > cache_safe_memqcache_table_list=''<br>
>> > cache_unsafe_memqcache_table_list=''<br>
>> > check_temp_table=catalog<br>
>> > check_unlogged_table=on<br>
>> > child_life_time=1<br>
>> > child_max_connections=0<br>
>> > clear_memqcache_on_escalation=on<br>
>> > client_idle_limit=0<br>
>> > client_idle_limit_in_recovery=0<br>
>> > client_min_messages = debug1<br>
>> > connection_cache=on<br>
>> > connection_life_time=0<br>
>> > connect_timeout=10000<br>
>> > database_redirect_preference_list=''<br>
>> > delay_threshold=10000000<br>
>> > delegate_IP='10.65.188.59'<br>
>> > detach_false_primary=off<br>
>> > disable_load_balance_on_write='transaction'<br>
>> > dml_adaptive_object_relationship_list=''<br>
>> > enable_consensus_with_half_votes=on<br>
>> > enable_pool_hba=on<br>
>> > enable_shared_relcache=on<br>
>> > failback_command=''<br>
>> > failover_command='/etc/pgpool-II/recovery/failover.sh %d %P %H %R'<br>
>> > failover_if_affected_tuples_mismatch=off<br>
>> > failover_on_backend_error=on<br>
>> > failover_require_consensus=on<br>
>> > failover_when_quorum_exists=off<br>
>> > follow_master_command=''<br>
>> > follow_primary_command=''<br>
>> > health_check_database=''<br>
>> > health_check_max_retries=0<br>
>> > health_check_password='***************'<br>
>> > health_check_period=1<br>
>> > health_check_retry_delay=1<br>
>> > health_check_timeout=0<br>
>> > health_check_user='postgres'<br>
>> > heartbeat_device0='eth0'<br>
>> > heartbeat_device1='eth0'<br>
>> > heartbeat_hostname0='10.65.188.55'<br>
>> > heartbeat_hostname1='10.65.188.56'<br>
>> > heartbeat_port0=9694<br>
>> > heartbeat_port1=9694<br>
>> > hostname0='10.65.188.55'<br>
>> > hostname1='10.65.188.56'<br>
>> > if_cmd_path='/sbin'<br>
>> > if_down_cmd='/usr/bin/sudo /sbin/ip addr del $_IP_$/18 dev eth0'<br>
>> > if_up_cmd='/usr/bin/sudo /sbin/ip addr add $_IP_$/18 dev eth0 label eth0:pg'<br>
>> > ignore_leading_white_space=on<br>
>> > insert_lock=off<br>
>> > listen_addresses='*'<br>
>> > listen_backlog_multiplier=1<br>
>> > load_balance_mode=off<br>
>> > lobj_lock_table=''<br>
>> > log_client_messages=off<br>
>> > log_connections=off<br>
>> > log_destination='stderr'<br>
>> > log_directory='/var/log/pgpool_log'<br>
>> > logdir='/var/log'<br>
>> > log_disconnections=off<br>
>> > log_error_verbosity = verbose<br>
>> > log_filename='pgpool-%a.log'<br>
>> > logging_collector=on<br>
>> > log_hostname=off<br>
>> > log_line_prefix='%t: pid %p: '  # printf-style string to output at the beginning of each log line<br>
>> > log_min_messages = debug1<br>
>> > log_per_node_statement=off<br>
>> > log_rotation_age=1d<br>
>> > log_rotation_size=0<br>
>> > log_standby_delay='if_over_threshold'<br>
>> > log_statement=off<br>
>> > log_truncate_on_rotation=on<br>
>> > master_slave_mode=on<br>
>> > master_slave_sub_mode='stream'<br>
>> > max_pool=1<br>
>> > memory_cache_enabled=off<br>
>> > memqcache_auto_cache_invalidation=on<br>
>> > memqcache_cache_block_size=1MB<br>
>> > memqcache_expire=0<br>
>> > memqcache_maxcache=400kB<br>
>> > memqcache_max_num_cache=1000000<br>
>> > memqcache_memcached_host='localhost'<br>
>> > memqcache_memcached_port=11211<br>
>> > memqcache_method='shmem'<br>
>> > memqcache_oiddir='/var/log/pgpool/oiddir'<br>
>> > memqcache_total_size=64MB<br>
>> > num_init_children=160<br>
>> > pcp_listen_addresses='*'<br>
>> > pcp_port=9898<br>
>> > pcp_socket_dir='/var/run/postgresql'<br>
>> > pgpool_port0=9999<br>
>> > pgpool_port1=9999<br>
>> > pid_file_name='/var/run/pgpool.pid'<br>
>> > ping_path='/bin'<br>
>> > pool_passwd='pool_passwd'<br>
>> > port=9999<br>
>> > primary_routing_query_pattern_list=''<br>
>> > read_only_function_list=''<br>
>> > recovery_1st_stage_command='recovery_1st_stage'<br>
>> > recovery_2nd_stage_command=''<br>
>> > recovery_password='**************'<br>
>> > recovery_timeout=90<br>
>> > recovery_user='postgres'<br>
>> > relcache_expire=0<br>
>> > relcache_query_target=primary<br>
>> > relcache_size=256<br>
>> > replicate_select=off<br>
>> > replication_mode=off<br>
>> > replication_stop_on_mismatch=off<br>
>> > reserved_connections=0<br>
>> > reset_query_list='ABORT; DISCARD ALL'<br>
>> > search_primary_node_timeout=0<br>
>> > serialize_accept=off<br>
>> > socket_dir='/var/run/postgresql'<br>
>> > sr_check_database='postgres'<br>
>> > sr_check_password='fabrix'<br>
>> > sr_check_period=1<br>
>> > sr_check_user='****************'<br>
>> > ssl_ciphers='HIGH:MEDIUM:+3DES:!aNULL'<br>
>> > ssl_dh_params_file=''<br>
>> > ssl_ecdh_curve='prime256v1'<br>
>> > ssl=off<br>
>> > ssl_prefer_server_ciphers=off<br>
>> > statement_level_load_balance=off<br>
>> > syslog_facility='LOCAL0'<br>
>> > syslog_ident='pgpool'<br>
>> > trusted_servers=''<br>
>> > use_watchdog=on<br>
>> > wd_authkey=''<br>
>> > wd_de_escalation_command=''<br>
>> > wd_escalation_command=''<br>
>> > wd_heartbeat_deadtime=30<br>
>> > wd_heartbeat_keepalive=2<br>
>> > wd_interval=10<br>
>> > wd_ipc_socket_dir='/var/run/postgresql'<br>
>> > wd_lifecheck_dbname='template1'<br>
>> > wd_lifecheck_method='heartbeat'<br>
>> > wd_lifecheck_password=''<br>
>> > wd_lifecheck_query='SELECT 1'<br>
>> > wd_lifecheck_user='nobody'<br>
>> > wd_life_point=3<br>
>> > wd_monitoring_interfaces_list=''<br>
>> > wd_port0=9000<br>
>> > wd_port1=9000<br>
>> > wd_priority=1<br>
>> > write_function_list=''<br>
>> ><br>
>> *********************************************************************************************<br>
>> ><br>
>> > pool_hba.conf:<br>
>> ><br>
>> *********************************************************************************************<br>
>> > [root@pg-mgrdb1 pgpool_log]# cat /etc/pgpool-II/pool_hba.conf<br>
>> > # TYPE DATABASE USER CIDR-ADDRESS METHOD<br>
>> > local all all trust<br>
>> > ## IPv4 local connections<br>
>> > host all all <a href="http://127.0.0.1/32" rel="noreferrer noreferrer" target="_blank">127.0.0.1/32</a> md5<br>
>> > host all postgres <a href="http://10.65.188.55/32" rel="noreferrer noreferrer" target="_blank">10.65.188.55/32</a> md5<br>
>> > host all postgres <a href="http://10.65.188.56/32" rel="noreferrer noreferrer" target="_blank">10.65.188.56/32</a> md5<br>
>> > host all postgres <a href="http://10.65.188.59/32" rel="noreferrer noreferrer" target="_blank">10.65.188.59/32</a> md5<br>
>> > host all fabrix <a href="http://10.65.188.55/32" rel="noreferrer noreferrer" target="_blank">10.65.188.55/32</a> md5<br>
>> > host all fabrix <a href="http://10.65.188.56/32" rel="noreferrer noreferrer" target="_blank">10.65.188.56/32</a> md5<br>
>> > host all fabrix <a href="http://10.65.188.59/32" rel="noreferrer noreferrer" target="_blank">10.65.188.59/32</a> md5<br>
>> > ## Replication Hosts<br>
>> > host manager fabrix <a href="http://10.65.188.55/32" rel="noreferrer noreferrer" target="_blank">10.65.188.55/32</a> md5<br>
>> > host manager fabrix <a href="http://10.65.188.56/32" rel="noreferrer noreferrer" target="_blank">10.65.188.56/32</a> md5<br>
>> > host manager fabrix <a href="http://10.65.188.59/32" rel="noreferrer noreferrer" target="_blank">10.65.188.59/32</a> md5<br>
>> > [root@pg-mgrdb1 pgpool_log]#<br>
>> ><br>
>> *********************************************************************************************<br>
>> ><br>
>> > pool_passwd:<br>
>> ><br>
>> *********************************************************************************************<br>
>> > [root@pg-mgrdb1 pgpool_log]# cat /etc/pgpool-II/pool_passwd<br>
>> > postgres:AESdLGx72XCAdQUybSno/SFuw==<br>
>> > fabrix:AES9Zd8L9XYGiZ5RMycfSFBFw==<br>
>> > [root@pg-mgrdb1 pgpool_log]#<br>
>> ><br>
>> *********************************************************************************************<br>
>> ><br>
>> > I'd highly appreciate any ideas or hints for further investigations.<br>
>> ><br>
>> > BR<br>
>> > Igor Yurchenko<br>
>><br>
</blockquote></div></div></div>