FAQ: Difference between revisions

Revision as of 12:54, 12 July 2014

Pgpool-II Frequently Asked Questions

Why does configure fail with "pg_config not found" on my Ubuntu box?

pg_config is in the libpq-dev package. You need to install it before running configure.

Why do records inserted on the primary node not appear on the standby nodes?

Are you using streaming replication and a hash index on the table? If so, this is a known limitation of streaming replication. The inserted record is there, but if you SELECT the record using the hash index, it will not appear. Changes to a hash index do not produce WAL records, so they are not reflected on the standby nodes. Solutions are: 1) use a btree index instead 2) use pgpool-II native replication.

Can I mix different versions of PostgreSQL as pgpool-II backends?

You cannot mix different major versions of PostgreSQL, for example 8.4.x and 9.0.x. On the other hand, you can mix different minor versions of PostgreSQL, for example 9.0.3 and 9.0.4. Pgpool-II assumes that the messages it receives from every PostgreSQL backend are always identical. Different major versions of PostgreSQL may send different messages, and this would cause trouble for pgpool-II.
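
As a rough illustration of the rule above, the following Python sketch compares version strings. The helper names are hypothetical and not part of pgpool-II. It assumes the numbering used in this FAQ, where the first two components (for example 9.0) form the major version, and treats a leading component of 10 or more as the major version by itself.

```python
def major_version(version):
    """Return the major-version part of a PostgreSQL version string."""
    parts = version.split(".")
    # Before PostgreSQL 10 the major version is the first two components
    # (8.4, 9.0, ...); from 10 onward it is the first component alone.
    if int(parts[0]) >= 10:
        return parts[0]
    return ".".join(parts[:2])


def can_mix(version_a, version_b):
    """True if two backends differ at most in their minor version."""
    return major_version(version_a) == major_version(version_b)


print(can_mix("9.0.3", "9.0.4"))  # minor versions differ
print(can_mix("8.4.1", "9.0.4"))  # major versions differ
```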

Can I mix different platforms of PostgreSQL as pgpool-II backends, for example Linux and Windows?

In streaming replication mode, no, because streaming replication requires that the primary and standby platforms are physically identical. On the other hand, pgpool-II's replication mode only requires that the database clusters are logically identical. Beware, however, that the online recovery script must not use rsync or similar tools, which physically copy database clusters. Use pg_dumpall instead.

It seems my pgpool-II does not do load balancing. Why?

First of all, pgpool-II's load balancing is "session based", not "statement based". That is, the DB node for load balancing is selected at the beginning of a session, so all SQL statements are sent to the same DB node until the session ends.

Another point is whether the statement is in an explicit transaction. If the statement is in a transaction, it will not be load balanced in replication mode. In pgpool-II 3.0 or later, SELECT will be load balanced even in a transaction when operating in master/slave mode.

Note that the method for choosing the DB node is not LRU or the like. Pgpool-II chooses the DB node randomly, taking the "weight" parameter in pgpool.conf into account. This means that in the short term the chosen DB nodes are not uniformly distributed. You might want to inspect the effect of load balancing after about 100 queries have been sent.

Also, cursor statements are not load balanced in replication mode; that is, DECLARE..FETCH are sent to all DB nodes in replication mode. This is because the SELECT might come with FOR UPDATE/FOR SHARE. Note that some applications, including psql, could use a CURSOR for SELECT. For example, from PostgreSQL 8.2, if "\set FETCH_COUNT n" is executed, psql unconditionally uses a cursor named "_psql_cursor".
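
To see why a small sample can look unbalanced, here is a minimal Python sketch of weighted random selection. It is a toy model, not pgpool-II's actual code; the function names, the seed and the two equal weights are assumptions made for illustration.

```python
import random
from collections import Counter


def pick_node(weights, rng):
    """Pick a node index at random, in proportion to its weight."""
    nodes = list(range(len(weights)))
    return rng.choices(nodes, weights=weights, k=1)[0]


def simulate(weights, sessions, seed=0):
    """Count how many sessions land on each node."""
    rng = random.Random(seed)
    return Counter(pick_node(weights, rng) for _ in range(sessions))


weights = [1, 1]  # hypothetical: two nodes with equal weight
for sessions in (10, 100, 10000):
    counts = simulate(weights, sessions)
    print(sessions, [counts[n] for n in range(len(weights))])
```

With few sessions the split between nodes can be visibly lopsided; it approaches the ratio of the weights only as the number of sessions grows.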

How can I observe the effect of load balancing?

We recommend enabling the "log_per_node_statement" directive in pgpool.conf for this. Here is an example of the log:

2011-05-07 08:42:42 LOG:   pid 22382: DB node id: 1 backend pid: 22409 statement: SELECT abalance FROM pgbench_accounts WHERE aid = 62797;

The "DB node id: 1" shows which DB node was chosen for this load balancing session.

Please make sure that you start pgpool-II with the "-n" option to get the pgpool-II log (or you can use syslog in pgpool-II 3.1 or later).
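
To summarize such a log, a short Python sketch like the following could count statements per DB node. It assumes only the line format shown in the example above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the "DB node id: N" part of a log_per_node_statement line.
NODE_RE = re.compile(r"DB node id: (\d+)")


def count_by_node(lines):
    """Count logged statements per DB node id."""
    counts = Counter()
    for line in lines:
        match = NODE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "2011-05-07 08:42:42 LOG:   pid 22382: DB node id: 1 backend pid: 22409 "
    "statement: SELECT abalance FROM pgbench_accounts WHERE aid = 62797;",
]
print(count_by_node(sample))
```

To use it on a real log, pass an open file object instead of the sample list.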

Why am I getting "ProcessFrontendResponse: failed to read kind from frontend. frontend abnormally exited" in my pgpool log?

Well, your clients might be ill-behaved :-) PostgreSQL's protocol requires clients to send a particular packet before they disconnect. pgpool-II complains that clients disconnect without sending the packet. You can reproduce the problem using psql: connect to pgpool using psql, then kill -9 psql. You will see a similar message in the log. The message will not appear if you quit psql normally. Another possibility is an unstable network connection between your client machine and pgpool-II. Check the cable and network interface card.

I'm running pgpool-II in streaming replication mode. It seems to work, but I find the following errors in the log. Why?

2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "E" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: s_do_auth: unknown response "[" before processing BackendKeyData
2011-07-19 08:21:59 ERROR: pid 10727: pool_read2: EOF encountered with backend
2011-07-19 08:21:59 ERROR: pid 10727: make_persistent_db_connection: s_do_auth failed
2011-07-19 08:21:59 ERROR: pid 10727: find_primary_node: make_persistent_connection failed

pgpool-II tries to connect to PostgreSQL to execute some functions such as pg_current_xlog_location(), which is used for detecting primary server or checking replication delay. The messages above indicate that pgpool-II failed to connect with user = health_check_user and password = health_check_password. You need to set them properly even if health_check_period = 0.

Note that pgpool-II 3.1 or later will use sr_check_user and sr_check_password for it instead.

When I run pgbench to test pgpool-II, pgbench hangs. If I directly run pgbench against PostgreSQL, it works fine. Why?

pgbench creates its concurrent connections (the number is specified by the "-c" option) before starting actual transactions. So if the number of concurrent connections specified by "-c" exceeds num_init_children, pgbench will get stuck, because it waits forever for pgpool to accept the connections (remember that pgpool-II accepts up to num_init_children concurrent sessions; once the number of concurrent sessions reaches num_init_children, new sessions are queued). PostgreSQL, on the other hand, does not accept more concurrent sessions than max_connections, so in this case you will just see PostgreSQL errors rather than blocked connections. If you want to test pgpool-II's connection queuing, you can use psql instead of pgbench. In the example session below, num_init_children = 1 (this is not a recommended setting in the real world; it is just for simplicity).

$ psql test <-- connect to pgpool from terminal #1
psql (9.1.1)
Type "help" for help.
test=# 
$ psql test <-- tries to connect to pgpool from terminal #2 but it is blocked.
test=# SELECT 1; <--- do something from terminal #1 psql
test=# \q <-- quit psql session on terminal #1
psql (9.1.1) <-- now psql on terminal #2 accepts session
Type "help" for help.
test=#
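
The queuing behaviour in this session can be modelled with a semaphore. The Python sketch below is a toy model only, not pgpool-II code; the function name and the numbers are assumptions. It counts how many of a client's simultaneous connections get a child process right away and how many would be left waiting.

```python
import threading


def connect_all(clients, num_init_children):
    """Return (accepted, queued) for `clients` simultaneous connections."""
    slots = threading.BoundedSemaphore(num_init_children)
    accepted = 0
    for _ in range(clients):
        # A real client would block here; we only test for a free slot.
        if slots.acquire(blocking=False):
            accepted += 1
    return accepted, clients - accepted


print(connect_all(clients=2, num_init_children=1))
print(connect_all(clients=10, num_init_children=32))
```

Any nonzero second value corresponds to connections that pgbench would wait on before it starts its transactions.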

I created pool_hba.conf and pool_passwd to enable md5 authentication through pgpool-II but it does not work. Why?

You probably made a mistake somewhere. To help, here is a table which describes the error patterns depending on the settings of pg_hba.conf, pool_hba.conf and pool_passwd.

pg_hba.conf	pool_hba.conf	pool_passwd	result
md5	md5	yes	md5 auth
md5	md5	no	"MD5" authentication with pgpool failed for user "XX"
md5	trust	yes/no	MD5 authentication is unsupported in replication, master-slave and parallel mode
trust	md5	yes	no auth
trust	md5	no	"MD5" authentication with pgpool failed for user "XX"
trust	trust	yes/no	no auth

How can I set up SSL for pgpool-II?

SSL support for pgpool-II consists of two parts: 1) between client and pgpool-II 2) between pgpool-II and PostgreSQL. #1 and #2 are independent of each other. For example, you can enable SSL for only #1 or only #2, or you can enable both. This answer explains #1 (for #2, please take a look at the PostgreSQL documentation).
Make sure that pgpool is built with OpenSSL. If you build from source code, use the --with-openssl option.
First create a server certificate. The command below asks for a PEM pass phrase (it will be asked for again when pgpool starts up). If you want to start pgpool without being asked for the pass phrase, you can remove it later. (sample server certificate creation session)

openssl req -new -text -out server.req

Remove PEM pass phrase if you want.

$ openssl rsa -in privkey.pem -out server.key
Enter pass phrase for privkey.pem:
writing RSA key
$ rm privkey.pem

Turn the certificate into a self-signed certificate.

$ openssl req -x509 -in server.req -text -key server.key -out server.crt

Copy server.key and server.crt to an appropriate place. Suppose we copy them to /usr/local/etc. Make sure that you use cp -p to retain the appropriate permissions of server.key. Alternatively, you can set the permissions later.

$ chmod og-rwx /usr/local/etc/server.key

Set the certificate and key location in pgpool.conf.

ssl = on
ssl_key = '/usr/local/etc/server.key'
ssl_cert = '/usr/local/etc/server.crt'

Restart pgpool. To confirm that the SSL connection between client and pgpool is working, connect to pgpool using psql.

psql -h localhost -p 9999 test
psql (9.1.1)
SSL connection (cipher: AES256-SHA, bits: 256)
Type "help" for help.

test=# \q

If you see "SSL connection...", the SSL connection between client and pgpool is working. Please make sure that you use the "-h localhost" option, because SSL only works with TCP/IP; it does not work with Unix domain sockets.

I'm using pgpool-II in replication mode. I expected that pgpool-II replaces current_timestamp call with time constants in my INSERT query, but actually it doesn't. Why?

Probably your INSERT query uses a schema qualified table name (like public.mytable) and you did not install the pgpool_regclass function that comes with pgpool. Without pgpool_regclass, pgpool-II only deals with table names without schema qualification.

Why must max_connections satisfy the formula max_connections >= (num_init_children * max_pool) and not max_connections >= num_init_children?

You probably need to understand how pgpool uses these variables. Each pgpool child keeps up to max_pool connections to PostgreSQL open even when no client is connected, so the total can reach num_init_children * max_pool. Here is the internal processing inside pgpool.

1) Wait for a connection request from clients.
2) A pgpool child receives a connection request from a client.
3) The pgpool child looks through its pool, which holds up to max_pool connections, for an existing connection with the requested database/user pair.
4) If found, reuse it.
5) If not found, open a new connection to PostgreSQL and register it in the pool. If the pool has no empty slot, close the oldest connection to PostgreSQL and reuse the slot.
6) Do some query processing until the client sends a session close request.
7) Close the connection to the client but keep the connection to PostgreSQL for future use.
8) Go to #1
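
The steps above can be sketched as a small Python model. This is a simplified illustration, not pgpool-II's implementation; the class, the function and the values 32 and 4 are assumptions. It shows one child's pool reusing, opening and evicting backend connections, and the resulting upper bound on backend connections.

```python
from collections import OrderedDict


class ChildPool:
    """Toy model of the connection pool held by one pgpool child."""

    def __init__(self, max_pool):
        self.max_pool = max_pool
        self.slots = OrderedDict()  # (database, user) -> backend connection id
        self.opened = 0             # backend connections opened so far

    def connect(self, database, user):
        key = (database, user)
        if key in self.slots:
            return "reused"
        if len(self.slots) >= self.max_pool:
            # No empty slot: close the oldest backend connection.
            self.slots.popitem(last=False)
        self.opened += 1
        self.slots[key] = self.opened
        return "opened"


def required_max_connections(num_init_children, max_pool):
    """Upper bound on backend connections held by all children."""
    return num_init_children * max_pool


pool = ChildPool(max_pool=2)
print(pool.connect("a", "u1"))  # first use of this pair
print(pool.connect("a", "u1"))  # same pair again
print(pool.connect("b", "u1"))  # second slot
print(pool.connect("c", "u1"))  # pool full, oldest slot is replaced
print(required_max_connections(32, 4))
```

Because every child can end up holding max_pool cached connections at the same time, the bound is the product of the two settings rather than num_init_children alone.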

Is the connection pool cache shared among pgpool processes?

No, the connection pool cache is in each pgpool process's private memory and is not shared with other pgpool processes. This is how the connection cache is managed: Suppose pgpool process 12345 has a cached connection for database A/user B but process 12346 does not, and both 12345 and 12346 are idle (no client is connected at this point). If a client connects to pgpool process 12345 with database A/user B, then the existing connection of 12345 is reused. On the other hand, if a client connects to pgpool process 12346, 12346 needs to create a new connection. Whether 12345 or 12346 is chosen is not under the control of pgpool. However, in the long run each pgpool child process will be chosen equally often, and each process's pool is expected to be reused equally.

Why are my SELECTs not cached?

Certain libraries such as iBatis and MyBatis always roll back transactions if they are not explicitly committed. Pgpool never caches SELECT results from a rolled back transaction because they might be inconsistent.

Can I use # comments or blank lines in pool_passwd?

The answer is simple. No (just like /etc/passwd).

I cannot use MD5 authentication if I start pgpool without the -n option. Why?

You must have given the -f option as a relative path, i.e. "-f pgpool.conf", rather than a full path, i.e. "-f /usr/local/etc/pgpool.conf". Pgpool tries to derive the full path of pool_passwd (which is necessary for MD5 auth) from the pgpool.conf path. This is fine with the -n option. However, if pgpool starts without the -n option, it changes the current directory to "/", which is a necessary step in daemonizing. As a result, pgpool tries to open "/pool_passwd", which will not succeed.

I see standby servers go into down status in streaming replication mode, along with PostgreSQL messages "terminating connection due to conflict". Why?

If you see the following messages along with those, it is likely that vacuum on the primary server removed rows which SELECTs on the standby server wanted to see. A workaround is to set "hot_standby_feedback = on" in your standby server's postgresql.conf.
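
The effect of the directory change can be reproduced with plain path arithmetic. The Python sketch below models the behaviour described above and is not taken from the pgpool-II source; the function name is hypothetical.

```python
import posixpath


def pool_passwd_path(conf_arg, cwd):
    """Resolve pool_passwd next to the config file given via -f."""
    conf_dir = posixpath.dirname(conf_arg)
    relative = posixpath.join(conf_dir, "pool_passwd")
    # A relative result is interpreted against the current directory.
    return posixpath.normpath(posixpath.join(cwd, relative))


# Started with -n: the current directory is still where pgpool was launched.
print(pool_passwd_path("pgpool.conf", "/usr/local/etc"))
# Daemonized: the current directory has been changed to "/".
print(pool_passwd_path("pgpool.conf", "/"))
# A full path to -f is unaffected by the directory change.
print(pool_passwd_path("/usr/local/etc/pgpool.conf", "/"))
```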

2013-04-07 19:38:10 UTC FATAL:  terminating connection due to conflict with recovery
2013-04-07 19:38:10 UTC DETAIL:  User query might have needed to see row versions that must be removed.
2013-04-07 19:38:10 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2013-04-07 19:38:10 UTC LOG:  could not send data to client: Connection reset by peer
2013-04-07 19:38:10 UTC ERROR:  canceling statement due to conflict with recovery
2013-04-07 19:38:10 UTC DETAIL:  User query might have needed to see row versions that must be removed.
2013-04-07 19:38:10 UTC LOG:  could not send data to client: Broken pipe
2013-04-07 19:38:10 UTC FATAL:  connection to client lost

Every few minutes the load of the system which pgpool-II is running on gets as high as 5-10. Why?

Multiple users state that this is observed only with Linux kernel 3.0; 2.6 and 3.2 do not show the behavior. We suspect that there is a problem with the 3.0 kernel. See more discussion in "[pgpool-general: 1528] Mysterious Load Spikes".

When watchdog is enabled and the number of connections reaches num_init_children, VIP switchover occurs. Why?

When the number of connections reaches num_init_children, the watchdog's check fails because its "SELECT 1" query fails, and the VIP is then transferred to another pgpool. Unfortunately, there is no way to distinguish a normal client's connections from the watchdog's connection. A larger num_init_children and wd_life_point, and a smaller wd_interval, may mitigate the problem somewhat.

The next major version, pgpool-II 3.3, will support a new monitoring method which uses UDP heartbeat packets instead of queries such as 'SELECT 1' to resolve the problem.

Why do I need to install pgpool_regclass?

If you are using PostgreSQL 8.0 or later, installing the pgpool_regclass function on all PostgreSQL servers to be accessed by pgpool-II is strongly recommended, as it is used internally by pgpool-II. Without it, handling of duplicate table names in different schemas might cause trouble (temporary tables aren't a problem).

md5 authentication does not work. Please help

There's an excellent summary of various check points to set up md5 authentication. Please take a look at it.

http://www.pgpool.net/pipermail/pgpool-general/2013-May/001773.html

I'm running pgpool/PostgreSQL on Amazon AWS and occasionally I get network errors. Why?

It's a known problem with AWS. We recommend complaining to Amazon support.

pgpool-II 3.3.4, 3.2.9 or later mitigate the problem by changing the timeout value for connect (actually the select system call) from 1 second to 10 seconds.

Also pgpool-II 3.4 or later has a switch to control the timeout value.

I cannot run pcp command on my Ubuntu box. Why?

pcp commands need libpcp.so. In Ubuntu it is included in the "libpgpool0" package.

Online recovery failed. How can I debug this?

pcp_recovery_node executes recovery_1st_stage_command and/or recovery_2nd_stage_command depending on your configuration. Those scripts are supposed to be executed on the master PostgreSQL node (the first live node in replication mode or the primary node in streaming replication mode). "BackendError" means there's something wrong in pgpool and/or PostgreSQL. To verify this, we recommend the following:

start pgpool with the debug option
execute pcp_recovery_node
examine the pgpool log and the master PostgreSQL log

Watchdog doesn't start if not all "other" nodes are alive

It's a feature. Watchdog's lifecheck will start after all of the pgpools have started. Until then, failover of the virtual IP never occurs.

If I start a transaction, pgpool-II also starts a transaction on standby nodes. Why?

This is necessary to deal with the case where the JDBC driver wants to use cursors. Pgpool-II takes the liberty of distributing SELECTs, including cursor statements, to the standby node. Unfortunately, cursor statements need to be executed in an explicit transaction.

When I use schema qualified table names, pgpool-II does not invalidate the on memory query cache and I get outdated data. Why?

It seems you did not install the "pgpool_regclass" function. Without the function, pgpool-II ignores the schema name part of the schema qualified table name and the cache invalidation fails.

I periodically get an error message like "read_startup_packet: incorrect packet length". What does it mean?

Monitoring tools including Zabbix and Nagios periodically send a packet or ping to the port which pgpool is listening on. Unfortunately, those packets do not have correct contents, and pgpool-II complains about them. If you are not sure who is sending such packets, you can turn on "log_connections" to learn the source host and port. If they come from such tools, you can stop the monitoring to avoid the problem or, even better, change the monitoring method to send a legal query, for example "SELECT 1".

I'm getting repeated errors like this every few minutes on Tomcat: "An I/O Error occurred while sending to the backend" Why?

Tomcat creates persistent connections to pgpool. If you set client_idle_limit to a non-zero value, pgpool disconnects idle connections, and the next time Tomcat tries to send something to pgpool it fails with the error message.

One solution is to set client_idle_limit to 0. However, this will leave lots of idle connections.

Another solution provided by Lachezar Dobrev is:

You might solve that by adding a time-out on the Tomcat side. http://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html

What you should set is (AFAIK):

minIdle (default is 10, set to 0)

timeBetweenEvictionRunsMillis (default 5000)

minEvictableIdleTimeMillis (default 60000)

This will try every 5 seconds and close any connections that were not used in the last 60 seconds. If you keep the sum of both numbers below the client time-out on the pgpool side, connections should be closed on the Tomcat side before they time out on the pgpool side.

It is also beneficial to set the

testOnBorrow (default false, set to true)

validationQuery (default none, set to 'SELECT version();' no quotes)

This will help with connections should they expire while waiting, without supplying a disconnected connection to the application.
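
The timing rule in the advice above can be checked with a few lines of Python. This is a sketch only: the function names are hypothetical, the defaults are the ones quoted above, and it assumes client_idle_limit is given in seconds.

```python
def worst_case_idle_ms(time_between_eviction_runs_ms, min_evictable_idle_time_ms):
    """Longest time an idle connection may survive on the Tomcat side."""
    return time_between_eviction_runs_ms + min_evictable_idle_time_ms


def tomcat_closes_first(client_idle_limit_s, runs_ms=5000, idle_ms=60000):
    """True if Tomcat evicts idle connections before pgpool drops them."""
    return worst_case_idle_ms(runs_ms, idle_ms) < client_idle_limit_s * 1000


print(tomcat_closes_first(client_idle_limit_s=120))
print(tomcat_closes_first(client_idle_limit_s=60))
```

If the function reports False for your settings, either raise client_idle_limit or lower the two Tomcat values.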

When I check pg_stat_activity view, I see a query like "SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u'" in active state for very long time. Why?

It's a limitation of pg_stat_activity. You can safely ignore it.

Pgpool-II issues queries like the one above to the master node for internal use. When a user query runs in extended protocol mode (sent from a JDBC driver, for example), pgpool-II's query also runs in that mode. To make pg_stat_activity recognize that the query has finished, pgpool-II would need to send a packet called "Sync", which unfortunately breaks the user's query (more precisely, the unnamed portal). Thus pgpool-II sends a "Flush" packet instead, but then pg_stat_activity does not recognize the end of the query.

Interestingly, if you enable log_duration, it logs that the query finished.

Online recovery always fails after a certain number of minutes. Why?

It is possible that PostgreSQL's statement_timeout kills the online recovery process. The process is executed as an SQL statement, and if it runs too long, PostgreSQL sends signal 2 to the statement and kills it. Depending on the size of the database, the online recovery process can take a very long time. Make sure to disable statement_timeout or set it to a long enough time.

Why does "SET default_transaction_isolation TO DEFAULT" fail?

$ psql -h localhost -p 9999 -c 'SET default_transaction_isolation to DEFAULT;'
ERROR: kind mismatch among backends. Possible last query was: "SET default_transaction_isolation to DEFAULT;" kind details are: 0[N: statement: SET default_transaction_isolation to DEFAULT;] 1[C]
HINT: check data consistency among db nodes
ERROR: kind mismatch among backends. Possible last query was: "SET default_transaction_isolation to DEFAULT;" kind details are: 0[N: statement: SET default_transaction_isolation to DEFAULT;] 1[C]
HINT: check data consistency among db nodes
connection to server was lost

Pgpool-II detects that node 0 returns "N" (a NOTICE message from PostgreSQL) while node 1 returns "C" (which means the command finished).

Pgpool-II expects nodes 0 and 1 to return identical messages, but here they do not, so pgpool-II raised an error.

Probably certain log/message settings differ between node 0 and node 1. Please check client_min_messages or similar settings.

They should be identical.

pgpoolAdmin Frequently Asked Questions

pgpoolAdmin does not show any node in pgpool status and node status. Why?

pgpoolAdmin uses PHP's PostgreSQL extension (pg_connect, pg_query, etc.). Probably the extension does not work as expected. Please check the Apache error log. Also please check the FAQ item below.

Why does node status in pgpoolAdmin show "down" status even if PostgreSQL is up and running?

pgpoolAdmin checks PostgreSQL status by connecting with user = "health_check_user" and database = template1. Thus you should allow pgpoolAdmin to access PostgreSQL with that user and database without a password. You can check the PostgreSQL log to verify this. If health_check_user does not exist, you will see something like:

20148 2011-07-06 16:41:59 JST FATAL:  role "foo" does not exist

If the user is protected by password, you will see:

20220 2011-07-06 16:42:16 JST FATAL:  password authentication failed for user "foo"
20221 2011-07-06 16:42:16 JST LOG:  could not receive data from client: Connection reset by peer
20221 2011-07-06 16:42:16 JST LOG:  unexpected EOF within message length word
20246 2011-07-06 16:42:26 JST LOG:  could not receive data from client: Connection reset by peer
20246 2011-07-06 16:42:26 JST LOG:  unexpected EOF within message length word
