[pgpool-hackers: 4493] Re: Pgpool-4.2 doesn't see primary node of Postgresql15 'stream' cluster

Igor Yurchenko harry.urcen at gmail.com
Wed Jul 24 04:22:00 JST 2024


Yeah, you are right.
One more question: the pg_enc documentation mentions that by default it
reads the encryption key from /root/.pgpoolkey (with permissions 0600). But
pgpool is started by pgpool.service as the user postgres.

TBH, I don't understand how this case is handled.
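
If I'm reading the docs right, the key location isn't fixed to
/root/.pgpoolkey, so I'd expect something like the following to work (a
sketch, assuming pg_enc's documented -k option and the PGPOOLKEYFILE
environment variable; the passphrase and password below are placeholders):

# keep the key in the postgres user's home directory instead of /root
echo 'my-secret-passphrase' > ~postgres/.pgpoolkey
chown postgres:postgres ~postgres/.pgpoolkey
chmod 600 ~postgres/.pgpoolkey

# point pg_enc at that key explicitly when registering a password
pg_enc -m -k ~postgres/.pgpoolkey -u fabrix 'fabrix-password'

# or tell pgpool.service where the key lives, e.g. via a systemd drop-in:
#   Environment=PGPOOLKEYFILE=/etc/pgpool-II/.pgpoolkey

Is that the intended way?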

BR
Igor Yurchenko

On Tue, 16 Jul 2024, 3:27 Tatsuo Ishii, <ishii at postgresql.org> wrote:

> > Hi Tatsuo
> >
> > Yeah, that was the reason. Great thanks!!!
> > Removing /var/log/pgpool_status solved the problem.
>
> Glad to hear that!
>
> > The only remark is that, as I understand it (at least on pgpool-4.2), the
> > pcp port/socket becomes available only after successful backend
> > initialization. So in my case, without an initialized primary backend,
> > none of the pcp commands worked.
>
> More precisely, pcp commands become available after the search for the
> primary node is done.  In your case, until the search completed or
> search_primary_node_timeout passed, pcp commands were not available
> because there was no primary node at that moment.  Since the default
> value of search_primary_node_timeout is 5 minutes, you needed to wait
> for 5 minutes before pcp commands became available.  You can make
> search_primary_node_timeout shorter if you like.
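>
> For example, a minimal pgpool.conf tweak to wait at most 10 seconds (a
> sketch; the value is in seconds, and pgpool needs a reload or restart
> to pick it up):
>
> search_primary_node_timeout = 10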
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese: http://www.sraoss.co.jp
>
> > BR
> > Igor Yurchenko
> >
> >
> > On Mon, 15 Jul 2024 at 06:51, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >
> >> Hi,
> >>
> >> My guess is that the pgpool_status file (placed under logdir) is set
> >> to:
> >>
> >> down
> >> up
> >>
> >> i.e. pgpool thinks that the PostgreSQL primary node has been detached
> >> from pgpool. If you are sure that the PostgreSQL primary is healthy,
> >> you can attach the primary node by using pcp_attach_node. An
> >> alternative method is: stop pgpool, remove the pgpool_status file, and
> >> restart pgpool. pgpool will then automatically recreate the
> >> pgpool_status file and mark the primary up if it is up and running.
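> >>
> >> For example (a sketch; the pcp host, port and user must match your
> >> pcp_listen_addresses, pcp_port and pcp.conf entries, and node 0 is
> >> assumed to be the detached primary):
> >>
> >> pcp_attach_node -h localhost -p 9898 -U postgres 0
> >>
> >> Or, since your pgpool.conf has logdir='/var/log':
> >>
> >> systemctl stop pgpool
> >> rm /var/log/pgpool_status
> >> systemctl start pgpool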
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS LLC
> >> English: http://www.sraoss.co.jp/index_en/
> >> Japanese: http://www.sraoss.co.jp
> >>
> >>
> >> > Hi guys
> >> > I have given up and it looks like I need your help.
> >> >
> >> > A setup:
> >> > I have 2 hosts running a PostgreSQL 15 streaming-replication cluster
> >> > (IPs: 10.65.188.55 and 10.65.188.56 respectively), with 2 Pgpool 4.2.15
> >> > instances installed on the same hosts.
> >> > The pgpool instances manage/delegate the VIP 10.65.188.59.
> >> >
> >> > Replication between the pg15 instances works well, and the instances'
> >> > recovery status is detected reliably:
> >> >
> >> > [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U fabrix -h 10.65.188.55 -p 5432 template1 -c 'select pg_is_in_recovery()'
> >> >  pg_is_in_recovery
> >> > -------------------
> >> >  f
> >> > (1 row)
> >> >
> >> > [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U fabrix -h 10.65.188.56 -p 5432 template1 -c 'select pg_is_in_recovery()'
> >> >  pg_is_in_recovery
> >> > -------------------
> >> >  t
> >> > (1 row)
> >> >
> >> > [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U fabrix -h 10.65.188.55 -p 5432 template1 -c 'select * from pg_stat_replication'
> >> >   pid  | usesysid | usename | application_name | client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
> >> > -------+----------+---------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
> >> >  13875 |    16384 | fabrix  | walreceiver      | 10.65.188.56 |                 |       47636 | 2024-07-14 14:30:54.344682+03 |          743 | streaming | 0/B0001C0 | 0/B0001C0 | 0/B0001C0 | 0/B0001C0  |           |           |            |             0 | async      | 2024-07-14 14:48:41.550424+03
> >> > (1 row)
> >> >
> >> > [root@pg-mgrdb1 pgpool-II]#
> >> >
> >> > The databases of the pg15 cluster are reachable/available via the VIP:
> >> >
> >> > [root@pg-mgrdb1 pgpool-II]#
> >> > [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U fabrix -h 10.65.188.59 -p 9999 manager -c '\dt'
> >> >                  List of relations
> >> >  Schema |         Name          | Type  |  Owner
> >> > --------+-----------------------+-------+----------
> >> >  public | upgrade_version_table | table | postgres
> >> > (1 row)
> >> >
> >> > [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U fabrix -h 10.65.188.59 -p 9999 manager -c 'select * from upgrade_version_table'
> >> >     app_name    | cur_version | upgrade_to
> >> > ----------------+-------------+------------
> >> >  brODtv Manager | 10.2.0.300  |
> >> > (1 row)
> >> >
> >> > [root@pg-mgrdb1 pgpool-II]#
> >> >
> >> > But pgpool cannot detect the cluster's primary node, and shows
> >> > incorrect pool_nodes info:
> >> >
> >> > [root@pg-mgrdb1 pgpool-II]#  PGPASSWORD=fabrix /usr/pgsql-15/bin/psql -U fabrix -h 10.65.188.59 -p 9999 manager -c 'show pool_nodes'
> >> >  node_id |   hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
> >> > ---------+--------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
> >> >  0       | 10.65.188.55 | 5432 | down   | 0.500000  | standby | 0          | false             | 0                 |                   |                        | 2024-07-14 15:04:01
> >> >  1       | 10.65.188.56 | 5432 | up     | 0.500000  | standby | 3          | true              | 0                 |                   |                        | 2024-07-14 15:04:01
> >> > (2 rows)
> >> >
> >> > [root@pg-mgrdb1 pgpool-II]#
> >> >
> >> > Right now I'm testing with a "single pgpool up" configuration (the
> >> > pgpool on the second host is down). The pgpool log shows that pgpool
> >> > doesn't even try to touch the first (current primary) backend at
> >> > 10.65.188.55:5432. It successfully connects to the second, 'standby'
> >> > node and detects its status, and this second node successfully handles
> >> > all SELECT queries because load balancing is configured. But what's
> >> > wrong with the first, primary node?
> >> >
> >> > In the pgpool log I don't see any attempts to connect to the first
> >> > node. The piece of the log where "find_primary_node" is first
> >> > mentioned looks like this:
> >> >
> >> > 2024-07-14 15:23:38: pid 18723: LOG:  2 watchdog nodes are configured for lifecheck
> >> > 2024-07-14 15:23:38: pid 18723: LOCATION:  wd_lifecheck.c:495
> >> > 2024-07-14 15:23:38: pid 18723: LOG:  watchdog nodes ID:0 Name:"10.65.188.55:9999 Linux pg-mgrdb1"
> >> > 2024-07-14 15:23:38: pid 18723: DETAIL:  Host:"10.65.188.55" WD Port:9000 pgpool-II port:9999
> >> > 2024-07-14 15:23:38: pid 18723: LOCATION:  wd_lifecheck.c:503
> >> > 2024-07-14 15:23:38: pid 18723: LOG:  watchdog nodes ID:1 Name:"Not_Set"
> >> > 2024-07-14 15:23:38: pid 18723: DETAIL:  Host:"10.65.188.56" WD Port:9000 pgpool-II port:9999
> >> > 2024-07-14 15:23:38: pid 18723: LOCATION:  wd_lifecheck.c:503
> >> > 2024-07-14 15:23:38: pid 18706: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> >> > 2024-07-14 15:23:38: pid 18706: LOCATION:  pgpool_main.c:3404
> >> > 2024-07-14 15:23:38: pid 18706: LOG:  find_primary_node: standby node is 1
> >> > 2024-07-14 15:23:38: pid 18706: LOCATION:  pgpool_main.c:3329
> >> > 2024-07-14 15:23:39: pid 18729: LOG:  set SO_REUSEPORT option to the socket
> >> > 2024-07-14 15:23:39: pid 18729: LOCATION:  wd_heartbeat.c:691
> >> > 2024-07-14 15:23:39: pid 18729: LOG:  creating watchdog heartbeat receive socket.
> >> > 2024-07-14 15:23:39: pid 18729: DETAIL:  set SO_REUSEPORT
> >> > 2024-07-14 15:23:39: pid 18729: LOCATION:  wd_heartbeat.c:231
> >> > 2024-07-14 15:23:39: pid 18731: LOG:  set SO_REUSEPORT option to the socket
> >> > 2024-07-14 15:23:39: pid 18731: LOCATION:  wd_heartbeat.c:691
> >> > 2024-07-14 15:23:39: pid 18731: LOG:  creating socket for sending heartbeat
> >> > 2024-07-14 15:23:39: pid 18731: DETAIL:  set SO_REUSEPORT
> >> > 2024-07-14 15:23:39: pid 18731: LOCATION:  wd_heartbeat.c:148
> >> > 2024-07-14 15:23:39: pid 18706: LOG:  find_primary_node: standby node is 1
> >> > 2024-07-14 15:23:39: pid 18706: LOCATION:  pgpool_main.c:3329
> >> > 2024-07-14 15:23:40: pid 18706: LOG:  find_primary_node: standby node is 1
> >> > 2024-07-14 15:23:40: pid 18706: LOCATION:  pgpool_main.c:3329
> >> > 2024-07-14 15:23:41: pid 18706: LOG:  find_primary_node: standby node is 1
> >> > 2024-07-14 15:23:41: pid 18706: LOCATION:  pgpool_main.c:3329
> >> >
> >> > So I have no ideas anymore.
> >> >
> >> > The pgpool.conf, cleaned of comments and sorted, looks like this:
> >> >
> >> > *********************************************************************************************
> >> > allow_clear_text_frontend_auth=off
> >> > allow_multiple_failover_requests_from_node=off
> >> > allow_sql_comments=off
> >> > app_name_redirect_preference_list=''
> >> > arping_cmd='/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
> >> > arping_path='/usr/sbin'
> >> > authentication_timeout=1min
> >> > auto_failback_interval=1min
> >> > auto_failback=off
> >> > backend_application_name0=mgrdb1
> >> > backend_application_name1=mgrdb2
> >> > backend_clustering_mode='streaming_replication'
> >> > backend_data_directory0='/home/postgres/databases/fabrix2'
> >> > backend_data_directory1='/home/postgres/databases/fabrix2'
> >> > backend_flag0='ALLOW_TO_FAILOVER'
> >> > backend_flag1='ALLOW_TO_FAILOVER'
> >> > backend_hostname0='10.65.188.55'
> >> > backend_hostname1='10.65.188.56'
> >> > backend_port0=5432
> >> > backend_port1=5432
> >> > backend_weight0=1
> >> > backend_weight1=1
> >> > black_function_list='currval,lastval,nextval,setval'
> >> > cache_safe_memqcache_table_list=''
> >> > cache_unsafe_memqcache_table_list=''
> >> > check_temp_table=catalog
> >> > check_unlogged_table=on
> >> > child_life_time=1
> >> > child_max_connections=0
> >> > clear_memqcache_on_escalation=on
> >> > client_idle_limit=0
> >> > client_idle_limit_in_recovery=0
> >> > client_min_messages = debug1
> >> > connection_cache=on
> >> > connection_life_time=0
> >> > connect_timeout=10000
> >> > database_redirect_preference_list=''
> >> > delay_threshold=10000000
> >> > delegate_IP='10.65.188.59'
> >> > detach_false_primary=off
> >> > disable_load_balance_on_write='transaction'
> >> > dml_adaptive_object_relationship_list=''
> >> > enable_consensus_with_half_votes=on
> >> > enable_pool_hba=on
> >> > enable_shared_relcache=on
> >> > failback_command=''
> >> > failover_command='/etc/pgpool-II/recovery/failover.sh %d %P %H %R'
> >> > failover_if_affected_tuples_mismatch=off
> >> > failover_on_backend_error=on
> >> > failover_require_consensus=on
> >> > failover_when_quorum_exists=off
> >> > follow_master_command=''
> >> > follow_primary_command=''
> >> > health_check_database=''
> >> > health_check_max_retries=0
> >> > health_check_password='***************'
> >> > health_check_period=1
> >> > health_check_retry_delay=1
> >> > health_check_timeout=0
> >> > health_check_user='postgres'
> >> > heartbeat_device0='eth0'
> >> > heartbeat_device1='eth0'
> >> > heartbeat_hostname0='10.65.188.55'
> >> > heartbeat_hostname1='10.65.188.56'
> >> > heartbeat_port0=9694
> >> > heartbeat_port1=9694
> >> > hostname0='10.65.188.55'
> >> > hostname1='10.65.188.56'
> >> > if_cmd_path='/sbin'
> >> > if_down_cmd='/usr/bin/sudo /sbin/ip addr del $_IP_$/18 dev eth0'
> >> > if_up_cmd='/usr/bin/sudo /sbin/ip addr add $_IP_$/18 dev eth0 label eth0:pg'
> >> > ignore_leading_white_space=on
> >> > insert_lock=off
> >> > listen_addresses='*'
> >> > listen_backlog_multiplier=1
> >> > load_balance_mode=off
> >> > lobj_lock_table=''
> >> > log_client_messages=off
> >> > log_connections=off
> >> > log_destination='stderr'
> >> > log_directory='/var/log/pgpool_log'
> >> > logdir='/var/log'
> >> > log_disconnections=off
> >> > log_error_verbosity = verbose
> >> > log_filename='pgpool-%a.log'
> >> > logging_collector=on
> >> > log_hostname=off
> >> > log_line_prefix='%t: pid %p: '   # printf-style string to output at beginning of each log line.
> >> > log_min_messages = debug1
> >> > log_per_node_statement=off
> >> > log_rotation_age=1d
> >> > log_rotation_size=0
> >> > log_standby_delay='if_over_threshold'
> >> > log_statement=off
> >> > log_truncate_on_rotation=on
> >> > master_slave_mode=on
> >> > master_slave_sub_mode='stream'
> >> > max_pool=1
> >> > memory_cache_enabled=off
> >> > memqcache_auto_cache_invalidation=on
> >> > memqcache_cache_block_size=1MB
> >> > memqcache_expire=0
> >> > memqcache_maxcache=400kB
> >> > memqcache_max_num_cache=1000000
> >> > memqcache_memcached_host='localhost'
> >> > memqcache_memcached_port=11211
> >> > memqcache_method='shmem'
> >> > memqcache_oiddir='/var/log/pgpool/oiddir'
> >> > memqcache_total_size=64MB
> >> > num_init_children=160
> >> > pcp_listen_addresses='*'
> >> > pcp_port=9898
> >> > pcp_socket_dir='/var/run/postgresql'
> >> > pgpool_port0=9999
> >> > pgpool_port1=9999
> >> > pid_file_name='/var/run/pgpool.pid'
> >> > ping_path='/bin'
> >> > pool_passwd='pool_passwd'
> >> > port=9999
> >> > primary_routing_query_pattern_list=''
> >> > read_only_function_list=''
> >> > recovery_1st_stage_command='recovery_1st_stage'
> >> > recovery_2nd_stage_command=''
> >> > recovery_password='**************'
> >> > recovery_timeout=90
> >> > recovery_user='postgres'
> >> > relcache_expire=0
> >> > relcache_query_target=primary
> >> > relcache_size=256
> >> > replicate_select=off
> >> > replication_mode=off
> >> > replication_stop_on_mismatch=off
> >> > reserved_connections=0
> >> > reset_query_list='ABORT; DISCARD ALL'
> >> > search_primary_node_timeout=0
> >> > serialize_accept=off
> >> > socket_dir='/var/run/postgresql'
> >> > sr_check_database='postgres'
> >> > sr_check_password='fabrix'
> >> > sr_check_period=1
> >> > sr_check_user='****************'
> >> > ssl_ciphers='HIGH:MEDIUM:+3DES:!aNULL'
> >> > ssl_dh_params_file=''
> >> > ssl_ecdh_curve='prime256v1'
> >> > ssl=off
> >> > ssl_prefer_server_ciphers=off
> >> > statement_level_load_balance=off
> >> > syslog_facility='LOCAL0'
> >> > syslog_ident='pgpool'
> >> > trusted_servers=''
> >> > use_watchdog=on
> >> > wd_authkey=''
> >> > wd_de_escalation_command=''
> >> > wd_escalation_command=''
> >> > wd_heartbeat_deadtime=30
> >> > wd_heartbeat_keepalive=2
> >> > wd_interval=10
> >> > wd_ipc_socket_dir='/var/run/postgresql'
> >> > wd_lifecheck_dbname='template1'
> >> > wd_lifecheck_method='heartbeat'
> >> > wd_lifecheck_password=''
> >> > wd_lifecheck_query='SELECT 1'
> >> > wd_lifecheck_user='nobody'
> >> > wd_life_point=3
> >> > wd_monitoring_interfaces_list=''
> >> > wd_port0=9000
> >> > wd_port1=9000
> >> > wd_priority=1
> >> > write_function_list=''
> >> >
> >> > *********************************************************************************************
> >> >
> >> > pool_hba.conf:
> >> >
> >> > *********************************************************************************************
> >> > [root@pg-mgrdb1 pgpool_log]# cat /etc/pgpool-II/pool_hba.conf
> >> > # TYPE  DATABASE        USER    CIDR-ADDRESS    METHOD
> >> > local   all     all     trust
> >> > ## IPv4 local connections
> >> > host    all     all     127.0.0.1/32    md5
> >> > host    all     postgres        10.65.188.55/32 md5
> >> > host    all     postgres        10.65.188.56/32 md5
> >> > host    all     postgres        10.65.188.59/32 md5
> >> > host    all     fabrix  10.65.188.55/32 md5
> >> > host    all     fabrix  10.65.188.56/32 md5
> >> > host    all     fabrix  10.65.188.59/32 md5
> >> > ## Replication Hosts
> >> > host    manager fabrix  10.65.188.55/32 md5
> >> > host    manager fabrix  10.65.188.56/32 md5
> >> > host    manager fabrix  10.65.188.59/32 md5
> >> > [root@pg-mgrdb1 pgpool_log]#
> >> >
> >> > *********************************************************************************************
> >> >
> >> > pool_passwd:
> >> >
> >> > *********************************************************************************************
> >> > [root@pg-mgrdb1 pgpool_log]# cat /etc/pgpool-II/pool_passwd
> >> > postgres:AESdLGx72XCAdQUybSno/SFuw==
> >> > fabrix:AES9Zd8L9XYGiZ5RMycfSFBFw==
> >> > [root@pg-mgrdb1 pgpool_log]#
> >> >
> >> > *********************************************************************************************
> >> >
> >> > I'd highly appreciate any ideas or hints for further investigation.
> >> >
> >> > BR
> >> > Igor Yurchenko
> >>
>