From pgpool Wiki
Jump to: navigation, search


Pgpool-II Frequently Asked Questions

Why configure fails by "pg_config not found" on my Ubuntu box?

pg_config is in libpq-dev package. You need to install it before running configure.

Why records inserted on the primary node do not appear on the standby nodes?

Are you using streaming replication and a hash index on the table? Then it's a known limitation of streaming replication. The inserted record is there. But if you SELECT the record using the hash index, it will not appear. Hash index changes do not produce WAL record thus they are not reflected to the standby nodes. Solutions are: 1) use btree index instead 2) use pgpool-II native replication.

Can I mix different versions of PostgreSQL as pgpool-II backends?

You cannot mix different major versions of PostgreSQL, for example 8.4.x and 9.0.x. On the other hand you can mix different minor versions of PostgreSQL, for example 9.0.3 and 9.0.4. Pgpool-II assumes messages from PostgreSQL to pgpool-II are identical anytime. Different major version of PostgreSQL may send out different messages and this would cause trouble for Pgpool-II.

Can I mix different platforms of PostgreSQL as pgpool-II backends, for example Linux and Windows?

In streaming replication mode, no. Because streaming replication requires that primary and standby platforms are phsyically identical. On the other hand, pgpool-II's replication mode only requires logically database clusters identical. Beware, however, that online recovery script does not use rsync or some such, which do phical copying among database clusters. You want to use pg_dumpall instead.

It seems my pgpool-II does not do load balancing. Why?

First of all, pgpool-II' load balancing is "session base", not "statement base". That means, DB node selection for load balancing is decided at the beginning of session. So all SQL statements are sent to the same DB node until the session ends.
Another point is, whether statement is in an explicit transaction or not. If the statement is in a transaction, it will not be load balanced in the replication mode. In pgpool-II 3.0 or later, SELECT will be load balanced even in a transaction if operated in the master/slave mode.
Note the method to choose DB node is not LRU or some such. Pgpool-II chooses DB node randomly considering the "weight" parameter in pgpool.conf. This means that the chosen DB node is not uniformly distributed among DB nodes in short term. You might want to inspect the effect of load balancing after ~100 queries have been sent.
Also cursor statements are not load balanced in replication mode. i.e.:DECLARE..FETCH are sent to all DB nodes in replication mode. This is because the SELECT might come with FOR UPDATE/FOR SHARE. Note that some applications including psql could use CURSOR for SELECT. For example, from PostgreSQL 8.2, if "\set FETCH_COUNT n" is executed, psql unconditionaly uses a curor named "_psql_cursor".

How can I observe the effect of load balancing?

We recommend to enable "log_per_node_statement" directive in pgpool.conf for this. Here is an example of the log:
2011-05-07 08:42:42 LOG:   pid 22382: DB node id: 1 backend pid: 22409 statement: SELECT abalance FROM pgbench_accounts WHERE aid = 62797;
The "DB node id: 1" shows which DB node was chosen for this loadbalancing session.
Please make sure that you start pgpool-II with "-n" option to get pgpool-II log. (or you can use syslog in pgpool-II 3.1 or later)

Why am I getting "ProcessFrontendResponse: failed to read kind from frontend. frontend abnormally exited" in my pgool log?

Well, your clients might be ill-behaved:-) PostgreSQL's protocol requires clients to send particular packet before they disconnect the connection. pgpool-II complains that clients disconnect without sending the packet. You could reprodcude the problem by using psql. Connect to pgpool using psql. Kill -9 psql. You will silimar message in the log. The message will not appear if you quit psql normaly. Another possibility is unstable network connection between your client machine and pgpool-II. Check the cable and network interface card.

I'm running pgpool-II in streaming replication mode. It seems it works but I find following errors in the log. Why?

2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "E" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "[" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: pool_read2: EOF encountered with backend
2011-07-19 08:21:59 ERROR: pid 10727: make_persistent_db_connection: s_do_auth failed
2011-07-19 08:21:59 ERROR: pid 10727: find_primary_node: make_persistent_connection failed
pgpool-II tries to connect to PostgreSQL to execute some functions such as pg_current_xlog_location(), which is used for detecting primary server or checking replication delay. The messages above indicates that pgpool-II failed to connect with user = health_check_user and password = health_check_password. You need to set them properly even if health_check_period = 0.
Note that pgpool-II 3.1 or later will use sr_check_user and sr_check_password for it instead.

When I run pgbench to test pgpool-II, pgbench hangs. If I directly run pgbench against PostgreSQL, it works fine. Why?

pgbench creates concurrent connections (the number of connections is specified by "-c" option) before starting actual transactions. So if the number of concurrent transactions specified by "-c" exceeds num_init_children, pgbench will stuck because it will wait for pgpool accepting connections forever (remember that pgpool-II accepts up to num_init_children concurrent sessions. If the number of concurrent sessions reach num_init_children, new session will be queued). On the other hand PostgreSQL does not accept concurrent sessions more than max_connections. So in this case you will just see PostgreSQL errors, rather than connection blocking. If you want to test pgpool-II's connection queuing, you can use psql instead of pgbench. In the example session below, num_init_children = 1 (this is not a recommended setting in the real world. This is just for simplicity).
$ psql test <-- connect to pgpool from terminal #1
psql (9.1.1)
Type "help" for help.
$ psql test <-- tries to connect to pgpool from terminal #2 but it is blocked.
test=# SELECT 1; <--- do something from terminal #1 psql
test=# \q <-- quit psql session on terminal #1
psql (9.1.1) <-- now psql on terminal #2 accepts session
Type "help" for help.

I created pool_hba.conf and pool_passwd to enable md5 authentication through pgpool-II but it does not work. Why?

Probably you made mistake somewhere. For your help here is a table which describes error patterns depending on the setting of pg_hba.conf, pool_hba.conf and pool_passwd.
pg_hba.conf pool_hba.conf pool_passwd result
md5 md5 yes md5 auth
md5 md5 no "MD5" authentication with pgpool failed for user "XX"
md5 trust yes/no MD5 authentication is unsupported in replication, master-slave and parallel mode
trust md5 yes no auth
trust md5 no "MD5" authentication with pgpool failed for user "XX"
trust trust yes/no no auth

How can I set up SSL for pgpool-II?

SSL support for pgpool-II consists of two parts: 1)between client and pgpool-II 2)pgpool-II and PostgreSQL. #1 and #2 are independent each other. For example, you can only enable SSL connection of #1, or #2. Or you can enable both #1 and #2. I explain #1 (for #2, please take a look at PostgreSQL documentation).
Make sure that pgpool is built with openssl. If you build from source code, use --with-openssl option.
First create server certificate. In the command below you will be asked PEM pass phrase(It will be asked when pgpool starts up). If you want to start pgpool without being asked pass phrase, you can remove it later. (sample server certficate create session)
openssl req -new -text -out server.req

Remove PEM pass phrase if you want.

$ openssl rsa -in privkey.pem -out server.key
Enter pass phrase for privkey.pem:
writing RSA key
$ rm privkey.pem

Turn the certificate into a self-signed certificate.

$ openssl req -x509 -in server.req -text -key server.key -out server.crt

Copy server.key and server.crt to appropreate place. Suppose we copy to /usr/local/etc. Make sure that you use cp -p to retain appropreate permission of server.key. Alternatively you can set permission later.

$ chmod og-rwx /usr/local/etc/server.key

Set the certificate and key location in pgpool.conf.

ssl = on
ssl_key = '/usr/local/etc/server.key'
ssl_cert = '/usr/local/etc/server.crt'

Restart pgpool. To confirm SSL connection between client and pgpool is working, connect to pgpool using psql.

psql -h localhost -p 9999 test
psql (9.1.1)
SSL connection (cipher: AES256-SHA, bits: 256)
Type "help" for help.

test=# \q

If you see "SSL connection...", SSL connection between client and pgpool is working. Please make sure that use "-h localhost" option. Because SSL only works with TCP/IP, with Unix domain socket SSL does not work.

I'm using pgpool-II in replication mode. I expected that pgpool-II replaces current_timestamp call with time constants in my INSERT query, but actually it doesn't. Why?

Probably your INSERT query uses schema qualied table name (like public.mytable) and you did not install pool_regclass function coming pgpool. Without pgpool_reglclass, pgpool-II only deals with table names without schema qualification.

Why max_connection must satisfy this formula max_connection >= (num_init_children * max_pool) and not max_connection >= num_init_children?

Probably you need to understand how pgpool uses these variables. Here is internal processing inside pgpool.
  1. Wait for connection request from clients.
  2. pgpool child receives connection request from a client.
  3. The pgpool child looks for existing connection in the pool which has requested database/user pair up to max_pool.
  4. If found, reuse it.
  5. If not found, opens a new connection to PostgreSQL and registers to the pool. If the pool has no empty slot, closes the oldest connection to PostgreSQL and reuse the slot.
  6. Do some query processing until the client sends session close request.
  7. Close the connection to client but keeps the connection to PostgreSQL for future use.
  8. Go to #1

Is connection pool cache shared among pgpool process?

No, the connection pool cache is in pgpool's process private memory and is not shared by other pgpool. This is how the connection cache is managed: Suppose pgpool process 12345 has connection cache for database A/user B but process 12346 does not have connection cache for database A/user B and both 12345 and 12346 are in idle state(no client is connecting at this point). If client connects to pgpool process 12345 with database A/user B, then the exisiting connection of 12345 is reused. On the other hand, If client connects to pgpool process 12346, 12346 needs to create new connection. Whether 12345 or 12346 is chosen, is not under control of pgpool. However in the long run, each pgpool child process will be equally chosen and it is expected that each process's pool will be resued equally.

Why my SELECTs are not cached?

Certain libraries such as iBatis, MyBatis always rollback transactions if they are not explicitely committed. Pgpool never caches SELECTs result in a rollbacked transaction because they might not be inconsistent.

Can I use # comments or blank lines in pool_passwd?

The answer is simple. No (just like /etc/passwd).

I cannot use MD5 authentication if start pgpool without -n option. Why?

You must have given -f option as a relative path: i.e. "-f pgpool.conf", rather than full path: i.e. "-f /usr/local/etc/pgpool.conf". Pgpool tries to locate the full path of pool_passwd (which is neccesary for MD5 auth) from pgpool.conf path. This is fine with -n option. However if pgpool starts without -n option, it changes current directory to "/", which is neccessary processs for daemonizing. As a result, pgpool tries to open "/pool_passwd", which will not successs.

I see standby servers go down status in steaming replication mode and see PostgreSQL messages "terminating connection due to conflict" Why?

If you see following messages along with those, it is likely vacuum on primary server removes rows which SELECTs on standby server want to see. Workaround is setting "hot_standby_feedback = on" in your standby server's postgresql.conf.
2013-04-07 19:38:10 UTC FATAL:  terminating connection due to conflict with recovery
2013-04-07 19:38:10 UTC DETAIL:  User query might have needed to see row versions that must be removed.
2013-04-07 19:38:10 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2013-04-07 19:38:10 UTC LOG:  could not send data to client: Connection reset by peer
2013-04-07 19:38:10 UTC ERROR:  canceling statement due to conflict with recovery
2013-04-07 19:38:10 UTC DETAIL:  User query might have needed to see row versions that must be removed.
2013-04-07 19:38:10 UTC LOG:  could not send data to client: Broken pipe
2013-04-07 19:38:10 UTC FATAL:  connection to client lost

Every few minites load of the system which pgpool-II running on gets high as much as 5-10. Why?

Mulptiple users stats that this is observed only Linux kernel 3.0. 2.6 or 3.2 does show the behavior. We suspect that there is a problem with 3.0 kernel. See more discussions on "[pgpool-general: 1528] Mysterious Load Spikes".

When watchdog enabled and the connection number reach the number of num_init_children, VIP switchover occurs. Why?

When the connection number reach the number of num_init_children, the watchdog will be failed because select 1 is failed, and then VIP will be transfer to another pgpool. Unfortunately, there are no way to discriminate normal client's connections from watchdog's connection. Larger num_init_children, wd_life_point and smaller wd_interval may prevent the problem somewhat.
The next major version, pgpool-II 3.3, will support a new monitoring method which uses UDP heartbeat packets instead of queries such like 'SELECT 1' to resolve the problem.

Why do I need to install pgpool_regclass?

If you are using PostgreSQL 8.0 or later, installing pgpool_regclass function on all PostgreSQL to be accessed by pgpool-II is strongly recommended, as it is used internally by pgpool-II. Without this, handling of duplicate table names in different schema might cause trouble (temporary tables aren't a problem).
Related FAQ is here http://www.pgpool.net/mediawiki/index.php?title=FAQ&action=submit#I.27m_using_pgpool-II_in_replication_mode._I_expected_that_pgpool-II_replaces_current_timestamp_call_with_time_constants_in_my_INSERT_query.2C_but_actually_it_doesn.27t._Why.3F
If you are using PostgreSQL 9.4.0 or later and pgpool-II 3.3.4 or later, 3.4.0 or later, you don't need to install pgpool_regclass since PostgreSQL 9.4 has built-in pgpool_regclass like function "to_regclass".

md5 authentication does not work. Please help

There's an excellent summary of various check points to set up md5 authentication. Please take a look at it.

I'm running pgpool/PostgreSQL on Amazon AWS and occasionaly I get network errors. Why?

It's a known problem with AWS. We recommend to complain to the Amazon support.
pgpool-II 3.3.4, 3.2.9 or later mitigate the problem by changing timeout value for connect(actually select system call) from 1 second to 10 seconds.
Also pgpool-II 3.4 or later has a switch to control the timeout value.

I cannot run pcp command on my Ubuntu box. Why?

pcp commands need libpcp.so. In Ubuntu it is included "libpgpool0" package.

On line recovery failed. How can I debug this?

pcp_recovery_node executes recovery_1st_stage_command and/or recovery_2nd_stage_command depending on your configuration. Those scripts are supposed to be executed on the master PostgreSQL node (the first live node in replication mode or primary node in streaming replication mode). "BackendError" means there's something wrong in pgpool and/or PostgreSQL. To verify this, I recommend followings;
  1. start pgpool with debug option
  2. execute pcp_recovery_node
  3. examin pgpool log and master PostgreSQL log

Watchdog doesn't start if not all "other" nodes are alive

It's a feature. Watchdog's lifeheck will start after all of the pgpools has started. Until this, failover of the virtual IP never occurs.

If I start transaction, pgool-II also starts a transaction on standby nodes. Why?

This is necessary to deal with the case when JDBC driver wants to use cursors. Pgpool-II takes a liberty of distributing SELECTs to the standby node including cursor statements. Unfortunately cursor statements need to be executed in an explicit transaction.

When I use schema qualified table names, pgpool-II does not invalidate on memory query cache and I got outdated data. Why?

It seems you did not install "pgpool_regclass" function. Without the function, pgpool-II ignores the schema name pat of the schema qualified table name and the cache invalidation fails.

I periodically get error message like "read_startup_packet: incorrect packet length". What does it mean?

Monitoring tools including Zabbix and Nagios periodically sends a packet or ping to the port which pgoool is listening on. Unfortunately those packets do not have correct contents, and pgpool-II complains it. If you are not sure who is sending such a packet, you could turn on "log_connections" to know the source host and port info. If they are from such tools, you could stop the monitoring to avoid the problem or even better, change the monitoring method to send legal query, for example, "SELECT 1".

I'm getting repeated errors like this every few minutes on Tomcat: "An I/O Error occurred while sending to the backend" Why?

Tomcat creates persistent connections to pgpool. If you set client_idle_limit to non 0, pgpool disconnects the connection and next time when Tomcat tries to send something to pgpool it breaks with the error message.
One solution is set client_idle_limit to 0. However this will leave lots of idle connections.
Another solution provided by Lachezar Dobrev is:
You might solve that by adding a time-out on the Tomcat side. http://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html
What you should set is (AFAIK):
minIdle (default is 10, set to 0)
timeBetweenEvictionRunsMillis (default 5000)
minEvictableIdleTimeMillis (default 60000)
This will try every 5 seconds and close any connections that were not used in the last 60 seconds. If you keep the sum of both numbers below the client time-out on the pgpool size connections should be closed at Tomcat side before they time-out on the pgpool side.
It is also beneficial to set the
testOnBorrow (default false, set to true)
validationQuery (default none, set to 'SELECT version();' no quotes)
This will help with connections should they expire while waiting, without supplying a disconnected connection to the application.

When I check pg_stat_activity view, I see a query like "SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u'" in active state for very long time. Why?

It's a limitation of pg_stat_activity. You can safely ignore it.
Pgpool-II issues queries like above for internal use to master node. When user query runs in extended protocol mode (sent from JDBC driver, for example), pgpool-II's query also runs in the mode. To make pg_stat_activity recognize the query finishes, pgpool-II needs to send a packet called "Sync", which unfortunately breaks user's query (more precisely, unnamed portal). Thus pgpool-II sends "Flush" packet instead but then pg_stat_activity does not recognize the end of the query.
Interesting thing is, if you enable log_duration, it logs the query finishes.

Online recovery always fails after certain minutes. Why?

It is possible that PostgreSQL statement_timeout kills the online recovery process. The process is executed as a SQL statement and if it's running too long, PostgreSQL sends signal 2 to the SQL and kills it. Varying by the size of the database, the online recovery process takes very long time. Make sure to disable statement_timeout or set it long enough time.

Why "SET default_transaction_isolation TO DEFAULT" fails ?

$ psql -h localhost -p 9999 -c 'SET default_transaction_isolation to DEFAULT;'
ERROR: kind mismatch among backends. Possible last query was: "SET default_transaction_isolation to DEFAULT;" kind details are: 0[N: statement: SET default_transaction_isolation to DEFAULT;] 1[C]
HINT: check data consistency among db nodes
ERROR: kind mismatch among backends. Possible last query was: "SET default_transaction_isolation to DEFAULT;" kind details are: 0[N: statement: SET default_transaction_isolation to DEFAULT;] 1[C]
HINT: check data consistency among db nodes
connection to server was lost
Pgpool-II detects that node 0 returns "N" (a NOTICE message comes from PostgreSQL) while node 1 returns "C" (which means the command finished).
Though pgpool-II expects that both node 0 and 1 returns identical messages, actually they are not. So pgpool-II threw an error.
Probably certain log/message settings are different in node 0 and 1. Please check client_min_messages or something like that.
They should be identical.

How does pgpool-II find the primary node?

pgpool-II issues "SELECT pg_is_in_recovery()" to each DB node. If it returns true, then the node is a standby node. If one of DB nodes returns false, then the node is the primary node and done.
Because it is possible that promoting node could return true for the SELECT, if no primary node is found and "search_primary_node_timeout" is greater than 0, pgpool-II sleeps 1 second and contines to issues the SELECT query to each DB node again until total sleep time exceeds search_primary_node_timeout.

Can I use pg_cancel_backend() or pg_terminate_backend()?

You can safely use pg_cancel_backend().
Be warned that pg_terminate_backend() will cause a fail over because it makes PostgreSQL emit an identical error code as postmaster shutdown. Pgpool-II 3.6 or greater mitigates the problem. See the manual for more details.
Remember that pgpool-II manages multiple PostgreSQL servers. To use the function, you not only need to identify the backend pid but the backend server.
If the query is running on the primary server, you can call the function something like
/*NO LOAD BALANCE*/ SELECT pg_cancel_backend(pid)
SQL comment is to prevent the SELECT be load balanced to standby. Of course you could issue the SELECT against directly the primary server.
If the query is running on one of standby servers, you need to issue the SELECT against directly the standby server.

Why my client is disconnected to pgpool-II when failover happens?

pgpool-II consists of many process where each process corresponds to client session. When failover occurs, each process may iterate on a loop for each backend without knowing a backend goes down. This may result in incorrect processing, or segfault in the worst case. For this reason, when failover occurs, pgpool-II parent process interrupt child process using signal to let them exit. Note that switch over using pcp_detach node has same effect.
From Pgpool-II 3.6 or greater, however, a fail over does not cause the disconnection in certain conditions. See the manual for more details.

Why am I getting "LOG: forked new pcp worker ..," and "LOG: PCP process with pid: xxxx exit with SUCCESS." messages in pgpool log?

Prior to pgpool-II 3.5, pgpool could only handle single PCP command at a time and all PCP commands were handled by a single PCP child process which lives throughout the lifespan of pgpool-II main process. In pgpool II 3.5 the restriction of single PCP command is removed and pgpool-II can now handle multiple simultaneous PCP commands. For every PCP command issued to pgpool a new PCP child process is forked and that process exits after execution of the PCP command is complete. So these log messages are perfectly normal and are generated whenever a new PCP worker process is created or completes execution for a PCP command.

How does pgpool-II handle md5 authentication?

  1. PostgreSQL and pgpool stores md5(password+username) into pool_password or pg_authid. From now on I denote string "password+username" as "S".
  2. When md5 auth is requested, pgpool sends a random number salt "s0" to frontend.
  3. Frontend replies back to pgpool with md5(S+s0).
  4. pgpool extracts S from pgpool_passwd and calculate md5(S+s0). If #3 and #4 matches, goes to next step.
  5. Each backend sends salt to pgpool. Suppose we have two backends b1 and b2, and salts are s1 and s2.
  6. pgpool extracts S from pgpool_passwd and calculate md5(S+s1) and send it to b1. pgpool extracts S from pgpool_passwd and calculate md5(S+s2) and send it to b2.
  7. If b1 and b2 agree with the authentication, the whole md5 auth process succeeds.

Why does not Pgpool-II automatically recognize a database comes back online?

It would be technically possible but we don't think it's a safe feature.
Consider a streaming replication configuration. When a standby comes back online, it does not necessarily means it connects to the current primary node. It may connect to a different primary node , or even it's not a standby any more. If Pgpool-II automatically recognizes such that standby as online, SELECTs to the standby node will return different result as the primary, which is a disaster for database applications.
Also please note that "pgpool reload" does not do anything for recognizing the standby node as online. It just reloads configuration files.

pgpoolAdmin Frequently Asked Questions

pgpoolAdmin does not show any node in pgpool status and node status. Why?

pgpoolAdmin uses PHP's PostgreSQL extention (pg_connect and pg_query etc.). Probably the extention does not work as expected. Please check apache error log. Also please check the FAQ item below.

Why does node status in pgpoolAdmin show "down" status even if PostgreSQL is up and running?

pgpoolAdmin checks PostgreSQL status by connecting with user = "health_check_user" and database = template1. Thus you should allow pgpoolAdmin to access PostgreSQL with those user and database without password. You can check PostgreSQL log to verify this. If health_check_user does not exist, you will see something like:
20148 2011-07-06 16:41:59 JST FATAL:  role "foo" does not exist
If the user is protected by password, you will see:
20220 2011-07-06 16:42:16 JST FATAL:  password authentication failed for user "foo"
20221 2011-07-06 16:42:16 JST LOG:  could not receive data from client: Connection reset by peer
20221 2011-07-06 16:42:16 JST LOG:  unexpected EOF within message length word
20246 2011-07-06 16:42:26 JST LOG:  could not receive data from client: Connection reset by peer
20246 2011-07-06 16:42:26 JST LOG:  unexpected EOF within message length word