[pgpool-general: 58] Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

Thu Dec 8 17:15:29 JST 2011

On Thu, Dec 8, 2011 at 12:06 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>> On Wed, Dec 7, 2011 at 11:10 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>> How is your PostgreSQL server configuration?
>>
>> Did you want a copy of my postgresql.conf ?

attached

>
> Yes, please. Also please show me the output of "show pool_status;".


show pool_status ;
                 item                 |             value              |                           description
--------------------------------------+--------------------------------+------------------------------------------------------------------
 listen_addresses                     | *                              | host name(s) or IP address(es) to listen to
 port                                 | 9999                           | pgpool accepting port number
 socket_dir                           | /tmp                           | pgpool socket directory
 num_init_children                    | 195                            | # of children initially pre-forked
 child_life_time                      | 300                            | if idle for this seconds, child exits
 connection_life_time                 | 3                              | if idle for this seconds, connection closes
 client_idle_limit                    | 0                              | if idle for this seconds, child connection closes
 child_max_connections                | 0                              | if max_connections received, chile exits
 max_pool                             | 1                              | max # of connection pool per child
 authentication_timeout               | 60                             | maximum time in seconds to complete client authentication
 logdir                               | /tmp                           | PgPool status file logging directory
 log_destination                      | stderr                         | logging destination
 syslog_facility                      | LOCAL0                         | syslog local faclity
 syslog_ident                         | pgpool                         | syslog program ident string
 pid_file_name                        | /var/run/pgpool/pgpool.pid     | path to pid file
 replication_mode                     | 0                              | non 0 if operating in replication mode
 load_balance_mode                    | 1                              | non 0 if operating in load balancing mode
 replication_stop_on_mismatch         | 0                              | stop replication mode on fatal error
 failover_if_affected_tuples_mismatch | 0                              | failover if affected tuples are mismatch
 replicate_select                     | 0                              | non 0 if SELECT statement is replicated
 reset_query_list                     | ABORT; DISCARD ALL             | queries issued at the end of session
 white_function_list                  |                                | functions those do not write to database
 black_function_list                  | currval,lastval,nextval,setval | functions those write to database
 print_timestamp                      | 1                              | if true print time stamp to each log line
 master_slave_mode                    | 1                              | if true, operate in master/slave mode
 master_slave_sub_mode                | stream                         | master/slave sub mode
 sr_check_period                      | 0                              | sr check period
 sr_check_user                        | postgres                       | sr check user
 delay_threshold                      | 0                              | standby delay threshold
 log_standby_delay                    | none                           | how to log standby delay
 connection_cache                     | 1                              | if true, cache connection pool
 health_check_timeout                 | 90                             | health check timeout
 health_check_period                  | 0                              | health check period
 health_check_user                    | postgres                       | health check user
 failover_command                     |                                | failover command
 follow_master_command                |                                | follow master command
 failback_command                     |                                | failback command
 fail_over_on_backend_error           | 0                              | fail over on backend error
 insert_lock                          | 0                              | insert lock
 ignore_leading_white_space           | 1                              | ignore leading white spaces
 num_reset_queries                    | 2                              | number of queries in reset_query_list
 pcp_port                             | 9898                           | PCP port # to bind
 pcp_socket_dir                       | /tmp                           | PCP socket directory
 pcp_timeout                          | 10                             | PCP timeout for an idle client
 log_statement                        | 0                              | if non 0, logs all SQL statements
 log_per_node_statement               | 0                              | if non 0, logs all SQL statements on each node
 log_connections                      | 0                              | if true, print incoming connections to the log
 log_hostname                         | 0                              | if true, resolve hostname for ps and log print
 enable_pool_hba                      | 1                              | if true, use pool_hba.conf for client authentication
 recovery_user                        | postgres                       | online recovery user
 recovery_1st_stage_command           |                                | execute a command in first stage.
 recovery_2nd_stage_command           |                                | execute a command in second stage.
 recovery_timeout                     | 90                             | max time in seconds to wait for the recovering node's postmaster
 client_idle_limit_in_recovery        | 0                              | if idle for this seconds, child connection closes in recovery 2n
 lobj_lock_table                      |                                | table name used for large object replication control
 ssl                                  | 0                              | SSL support
 ssl_key                              |                                | path to the SSL private key file
 ssl_cert                             |                                | path to the SSL public certificate file
 debug_level                          | 0                              | debug message level
 relcache_expire                      | 0                              | relation cache expiration time in seconds
 parallel_mode                        | 0                              | if non 0, run in parallel query mode
 enable_query_cache                   | 0                              | if non 0, use query cache
 pgpool2_hostname                     | cuda-fs2                       | pgpool2 hostname
 system_db_hostname                   | localhost                      | system DB hostname
 system_db_port                       | 5432                           | system DB port number
 system_db_dbname                     | pgpool                         | system DB name
 system_db_schema                     | pgpool_catalog                 | system DB schema name
 system_db_user                       | pgpool                         | user name to access system DB
 backend_hostname0                    | cuda-db2                       | backend #0 hostname
 backend_port0                        | 5432                           | backend #0 port number
 backend_weight0                      | 0.090909                       | weight of backend #0
 backend_status0                      | 2                              | status of backend #0
 standby_delay0                       | 0                              | standby delay of backend #0
 backend_flag0                        | ALLOW_TO_FAILOVER              | backend #0 flag
 backend_hostname1                    | cuda-db1                       | backend #1 hostname
 backend_port1                        | 5432                           | backend #1 port number
 backend_weight1                      | 0.454545                       | weight of backend #1
 backend_status1                      | 2                              | status of backend #1
 standby_delay1                       | 0                              | standby delay of backend #1
 backend_flag1                        | ALLOW_TO_FAILOVER              | backend #1 flag
 backend_hostname2                    | cuda-db0                       | backend #2 hostname
 backend_port2                        | 5432                           | backend #2 port number
 backend_weight2                      | 0.454545                       | weight of backend #2
 backend_status2                      | 2                              | status of backend #2
 standby_delay2                       | 0                              | standby delay of backend #2
 backend_flag2                        | ALLOW_TO_FAILOVER              | backend #2 flag
(86 rows)
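
A note on the backend_weight rows above: the three values (0.090909,
0.454545, 0.454545) sum to roughly 1, which is what one would expect if
pgpool reports each backend's weight as a normalized share rather than the
raw number from pgpool.conf. The sketch below only illustrates that
arithmetic. The raw weights 1, 5 and 5 are an assumption (the pgpool.conf
itself is not shown in this thread), and normalize_weights is a
hypothetical helper, not a pgpool function.

```python
def normalize_weights(raw_weights):
    """Scale a list of non-negative weights so that they sum to 1."""
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in raw_weights]


# ASSUMPTION: raw weights of 1, 5 and 5 for backends 0, 1 and 2.
# These numbers are a guess that happens to match the reported shares.
raw = [1, 5, 5]
shares = normalize_weights(raw)

for node_id, share in enumerate(shares):
    print(f"backend_weight{node_id} = {share:.6f}")
```

If the assumption holds, backend 0 (cuda-db2) would be picked for roughly
1/11 of the load-balanced traffic and the other two backends for roughly
5/11 each.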

>
>>> Earlier you said you have 1 master and 2 standbys. Is this still
>>> correct? From the gdb trace it looks like you have only two servers, and I
>>> would like to make sure of that.
>>
>> Yes, we still have 3 servers.  pcp_node_count reports 3.
>
> Ok.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>>>> Nope, no SSL enabled anywhere.
>>>>
>>>> On Tue, Dec 6, 2011 at 6:49 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
>>>>> Can you tell me if you are enabling SSL between frontend and pgpool
>>>>> AND/OR pgpool and PostgreSQL?
>>>>> --
>>>>> Tatsuo Ishii
>>>>> SRA OSS, Inc. Japan
>>>>> English: http://www.sraoss.co.jp/index_en.php
>>>>> Japanese: http://www.sraoss.co.jp
>>>>>
>>>>>> Lonni,
>>>>>>
>>>>>> First of all, pgpool-general at pgfoundry has moved to  pgpool-general at pgpool.net.
>>>>>> Please subscribe here:
>>>>>>
>>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>>>>>
>>>>>> From: Lonni J Friedman <netllama at gmail.com>
>>>>>> Subject: Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU
>>>>>> Date: Tue, 6 Dec 2011 16:23:41 -0800
>>>>>> Message-ID: <CAP=oouHQACD6ELcHOZz+3Oz8NkbbgjK3gSRbcbHrPKoi_DRP8g at mail.gmail.com>
>>>>>>
>>>>>>> On Wed, Nov 23, 2011 at 10:51 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>>>>>> On Wed, Nov 23, 2011 at 10:42 PM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>>>>>>>>> Not wanting to be impatient, but I'm very concerned about this
>>>>>>>>>>> problem, since it's impossible to predict when it will occur.  Is there
>>>>>>>>>>> additional information that I can provide to investigate this further?
>>>>>>>>>>
>>>>>>>>>> I really need to know where pgpool is looping.
>>>>>>>>>
>>>>>>>>> OK, how can I capture that information?
>>>>>>>>
>>>>>>>> You are already attached to the pgpool process, so just typing "n"
>>>>>>>> (for "next") will show you the next line to execute. If pgpool really
>>>>>>>> is looping, "n" should show the same lines again after you repeat it
>>>>>>>> a number of times.
>>>>>>>
>>>>>>> OK, this reproduced again.  Here's the output:
>>>>>>> #######
>>>>>>> (gdb) bt
>>>>>>> #0  0x0000000000419305 in pool_process_query (frontend=0x413e1f0,
>>>>>>> backend=0x24d3520, reset_request=<value optimized out>) at
>>>>>>> pool_process_query.c:111
>>>>>>> #1  0x000000000040ae42 in do_child (unix_fd=3, inet_fd=<value
>>>>>>> optimized out>) at child.c:354
>>>>>>> #2  0x00000000004054c5 in fork_a_child (unix_fd=3, inet_fd=4, id=126)
>>>>>>> at main.c:1072
>>>>>>> #3  0x0000000000407b1c in main (argc=<value optimized out>,
>>>>>>> argv=<value optimized out>) at main.c:549
>>>>>>> (gdb) cont
>>>>>>> Continuing.
>>>>>>> ^C
>>>>>>> Program received signal SIGINT, Interrupt.
>>>>>>> pool_ssl_pending (cp=0x413e1f0) at pool_ssl.c:247
>>>>>>> 247     {
>>>>>>> (gdb) n
>>>>>>> 248             if (cp->ssl_active > 0 && SSL_pending(cp->ssl) > 0)
>>>>>>> (gdb) n
>>>>>>> 251     }
>>>>>>> (gdb) n
>>>>>>> is_cache_empty (frontend=0x413e1f0, backend=0x24d3520) at
>>>>>>> pool_process_query.c:3232
>>>>>>> 3232            if (!pool_read_buffer_is_empty(frontend))
>>>>>>> (gdb) n
>>>>>>> 3235            for (i=0;i<NUM_BACKENDS;i++)
>>>>>>> (gdb) n
>>>>>>> 3237                    if (!VALID_BACKEND(i))
>>>>>>> (gdb) n
>>>>>>> 3244                    if (pool_ssl_pending(CONNECTION(backend, i)))
>>>>>>> (gdb) n
>>>>>>> 3247                    if (CONNECTION(backend, i)->len > 0)
>>>>>>> (gdb) n
>>>>>>> 3237                    if (!VALID_BACKEND(i))
>>>>>>> (gdb) n
>>>>>>> 3244                    if (pool_ssl_pending(CONNECTION(backend, i)))
>>>>>>> (gdb) n
>>>>>>> 3247                    if (CONNECTION(backend, i)->len > 0)
>>>>>>> (gdb) n
>>>>>>> 3252    }
>>>>>>> (gdb) n
>>>>>>> 3235            for (i=0;i<NUM_BACKENDS;i++)
>>>>>>> (gdb) n
>>>>>>> 3252    }
>>>>>>> (gdb) n
>>>>>>> pool_process_query (frontend=0x413e1f0, backend=0x24d3520,
>>>>>>> reset_request=<value optimized out>) at pool_process_query.c:361
>>>>>>> 361                             if
>>>>>>> (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 388                     if (got_sighup)
>>>>>>> (gdb) n
>>>>>>> 111             state = 0;
>>>>>>> (gdb) n
>>>>>>> 116                     if (state == 0 && reset_request)
>>>>>>> (gdb) n
>>>>>>> 159                     check_stop_request();
>>>>>>> (gdb) n
>>>>>>> 165                     if (*InRecovery > 0 &&
>>>>>>> pool_config->client_idle_limit_in_recovery == -1)
>>>>>>> (gdb) n
>>>>>>> 179                     if (is_cache_empty(frontend, backend) &&
>>>>>>> !pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 361                             if
>>>>>>> (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 388                     if (got_sighup)
>>>>>>> (gdb) n
>>>>>>> 111             state = 0;
>>>>>>> (gdb) n
>>>>>>> 116                     if (state == 0 && reset_request)
>>>>>>> (gdb) n
>>>>>>> 159                     check_stop_request();
>>>>>>> (gdb) n
>>>>>>> 165                     if (*InRecovery > 0 &&
>>>>>>> pool_config->client_idle_limit_in_recovery == -1)
>>>>>>> (gdb) n
>>>>>>> 179                     if (is_cache_empty(frontend, backend) &&
>>>>>>> !pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 361                             if
>>>>>>> (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 388                     if (got_sighup)
>>>>>>> (gdb) n
>>>>>>> 111             state = 0;
>>>>>>> (gdb) n
>>>>>>> 116                     if (state == 0 && reset_request)
>>>>>>> (gdb) n
>>>>>>> 159                     check_stop_request();
>>>>>>> (gdb) n
>>>>>>> 165                     if (*InRecovery > 0 &&
>>>>>>> pool_config->client_idle_limit_in_recovery == -1)
>>>>>>> (gdb) n
>>>>>>> 179                     if (is_cache_empty(frontend, backend) &&
>>>>>>> !pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 361                             if
>>>>>>> (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 388                     if (got_sighup)
>>>>>>> (gdb) n
>>>>>>> 111             state = 0;
>>>>>>> (gdb) n
>>>>>>> 116                     if (state == 0 && reset_request)
>>>>>>> (gdb) n
>>>>>>> 159                     check_stop_request();
>>>>>>> (gdb) n
>>>>>>> 165                     if (*InRecovery > 0 &&
>>>>>>> pool_config->client_idle_limit_in_recovery == -1)
>>>>>>> (gdb) n
>>>>>>> 179                     if (is_cache_empty(frontend, backend) &&
>>>>>>> !pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 361                             if
>>>>>>> (!pool_read_buffer_is_empty(frontend) && !pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) n
>>>>>>> 379                             if
>>>>>>> (!pool_read_buffer_is_empty(MASTER(backend)) ||
>>>>>>> pool_is_query_in_progress())
>>>>>>> (gdb) bt
>>>>>>> #0  pool_process_query (frontend=0x413e1f0, backend=0x24d3520,
>>>>>>> reset_request=<value optimized out>) at pool_process_query.c:379
>>>>>>> #1  0x000000000040ae42 in do_child (unix_fd=3, inet_fd=<value
>>>>>>> optimized out>) at child.c:354
>>>>>>> #2  0x00000000004054c5 in fork_a_child (unix_fd=3, inet_fd=4, id=126)
>>>>>>> at main.c:1072
>>>>>>> #3  0x0000000000407b1c in main (argc=<value optimized out>,
>>>>>>> argv=<value optimized out>) at main.c:549
>>>>>>> (gdb) q
>>>>>>> A debugging session is active.
>>>>>>>
>>>>>>>         Inferior 1 [process 22143] will be detached.
>>>>>>> #######
>>>>>>>
>>>>>>>
>>>>>>> Does this clarify where the problem exists?  If so, is it fixed in 3.1.1?
>>>>>>>
>>>>>>> thanks
>>>>>>
>>>>>> Thanks. I will look into this. I'm sure this is not fixed in 3.1.1
>>>>>> since the issue above has not been addressed yet.
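
For anyone reading the gdb session above: the advice was that a real loop
shows the same lines again after repeated "n" commands. That check can be
done mechanically on the line numbers gdb printed. The sketch below is only
an illustration; find_cycle is a hypothetical helper, and the list is the
sequence of pool_process_query.c line numbers transcribed from the session,
starting at the first stop on line 361 (the earlier steps inside
pool_ssl_pending and is_cache_empty are left out).

```python
def find_cycle(lines, min_repeats=3):
    """Return the shortest block that repeats back to back at the end of
    `lines` at least `min_repeats` times, or None if there is none.

    The returned block is one full cycle, but it may be rotated relative
    to where a human would say the loop "starts".
    """
    n = len(lines)
    for period in range(1, n // min_repeats + 1):
        tail = lines[n - period * min_repeats:]
        block = tail[:period]
        if all(tail[k] == block[k % period] for k in range(len(tail))):
            return block
    return None


# Line numbers shown after each "n" in the session above.
one_pass = [361, 379, 379, 388, 111, 116, 159, 165, 179]
observed = one_pass * 4 + [361, 379, 379]

cycle = find_cycle(observed)
if cycle is None:
    print("no repeating pattern found")
else:
    print(f"loop of {len(cycle)} lines: {cycle}")
```

min_repeats defaults to 3 rather than 2 because line 379 is printed twice
in a row on every pass, and with only two repeats that pair would be
mistaken for a one-line loop. On this trace the helper reports a nine-line
cycle through lines 111 to 388 of pool_process_query.c, which matches what
is visible by eye.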
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgresql.conf
Type: application/octet-stream
Size: 18418 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20111208/7c234f45/attachment.obj>