5.7. Load Balancing

Pgpool-II load balancing of SELECT queries works with Master Slave mode (Section 5.3.1) and Replication mode (Section 5.3.2). When enabled Pgpool-II sends the writing queries to the primary node in Master Slave mode, all of the backend nodes in Replication mode, and other queries get load balanced among all backend nodes. To which node the load balancing mechanism sends read queries is decided at the session start time and will not be changed until the session ends. The only exception is by writing special SQL comments. See below for more details.

Note: Queries which are sent to primary node or replicated because they cannot be balanced are also accounted for in the load balancing algorithm.

Note: If you don't want a query that qualifies for the load balancing to be load balanced by Pgpool-II, you can put /*NO LOAD BALANCE*/ comment before the SELECT statement. This will disable the load balance of the particular query and Pgpool-II will send it to the master node (the primary node in Master Slave mode).

Note: You can check which DB node is assigned as the load balancing node by using SHOW POOL NODES.

5.7.1. Condition for Load Balancing

For a query to be load balanced, all the following requirements must be met:

Note: You could suppress load balancing by inserting arbitrary comments just in front of the SELECT query:

    /*REPLICATION*/ SELECT ...
   

If you want to use comments without suppressing load balancing, you can set allow_sql_comments to on. Please refer to replicate_select as well.

Note: The JDBC driver has an autocommit option. If the autocommit is false, the JDBC driver sends "BEGIN" and "COMMIT" by itself. In this case the same restriction above regarding load balancing will be applied.

5.7.2. Load Balancing in Streaming Replication

While using Streaming replication and Hot Standby, it is important to determine which query can be sent to the primary or the standby, and which one should not be sent to the standby. Pgpool-II's Streaming Replication mode carefully takes care of this.

We distinguish which query should be sent to which node by looking at the query itself.

In an explicit transaction:

In the extended protocol, it is possible to determine if the query can be sent to standby or not in load balance mode while parsing the query. The rules are the same as for the non extended protocol. For example, INSERTs are sent to the primary node. Following bind, describe and execute will be sent to the primary node as well.

Note: If the parse of a SELECT statement is sent to the standby node due to load balancing, and then a DML statement, such as an INSERT, is sent to Pgpool-II, then the parsed SELECT will have to be executed on the primary node. Therefore, we re-parse the SELECT on the primary node.

Lastly, queries that Pgpool-II's parser thinks to be an error are sent to the primary node.

5.7.3. Load Balancing Settings

load_balance_mode (boolean)

When set to on, Pgpool-II enables the load balancing on incoming SELECT queries. i.e. SELECT queries from the clients gets distributed to the configured PostgreSQL backends. Default is off.

This parameter can only be set at server start.

ignore_leading_white_space (boolean)

When set to on, Pgpool-II ignores the white spaces at the beginning of SQL queries in load balancing. It is useful if used with APIs like DBI/DBD:Pg which adds white spaces against the user's intention.

This parameter can be changed by reloading the Pgpool-II configurations.

white_function_list (string)

Specifies a comma separated list of function names that DO NOT update the database. SELECTs including functions not specified in this list are not load balanced. These are replicated among all the DB nodes in Replication mode, sent to the primary node only in Master Slave mode.

You can use regular expression to match function names, to which ^ and $ are automatically added.

Example 5-2. Using regular expression

If you have prefixed all your read only function with 'get_' or 'select_', You can set the white_function_list like below:

	 white_function_list = 'get_.*,select_.*'
	

Note: Schema qualifications can not be used in white_function_list because Pgpool-II silently ignores a schema qualification in function names appearing in an input SQL while comparing the list and the input SQL. As a result, a schema qualified function name in the list never matches function names appearing in the input SQL.

This parameter can be changed by reloading the Pgpool-II configurations.

black_function_list (string)

Specifies a comma separated list of function names that DO update the database. SELECTs including functions specified in this list are not load balanced. These are replicated among all the DB nodes in Replication mode, sent to the primary node only in Master Slave mode.

You can use regular expression to match function names, to which ^ and $ are automatically added.

Example 5-3. Using regular expression

If you have prefixed all your updating functions with 'set_', 'update_', 'delete_' or 'insert_', You can set the black_function_list like below:

	 black_function_list = 'nextval,setval,set_.*,update_.*,delete_.*,insert_.*'
	

Note: Schema qualifications can not be used in black_function_list because Pgpool-II silently ignores a schema qualification in function names appearing in an input SQL while comparing the list and the input SQL. As a result, a schema qualified function name in the list never matches function names appearing in the input SQL.

Note: black_function_list and white_function_list are mutually exclusive and only one of the two lists can be set in the configuration.

Example 5-4. Configuring using nextval() and setval() to land on proper backend

Prior to Pgpool-IIV3.0, nextval() and setval() were known as functions writing to the database. You can configure this by setting black_function_list and white_function_list as follows

	  white_function_list = ''
	  black_function_list = 'nextval,setval,lastval,currval'
	 

Note: PostgreSQL also contains lastval() and currval() in addition to nextval() and setval(). Though lastval() and currval() are not writing function type, but it is advised to treat lastval() and currval() as writing functions to avoid errors which occur when these functions are accidentally load balanced.

This parameter can be changed by reloading the Pgpool-II configurations.

database_redirect_preference_list (string)

Specifies the list of "database-name:node id" pairs to send SELECT queries to a particular backend node for a particular database connection. For example, by specifying "test:1", Pgpool-II will redirect all SELECT queries to the backend node of ID 1 for the connection to "test" database. You can specify multiple "database name:node id" pair by separating them using comma (,).

Regular expressions are also accepted for database name. You can use special keywords as node id. If "primary" is specified, queries are sent to the primary node, and if "standby" is specified, one of the standby nodes are selected randomly based on weights.

Example 5-5. Using database_redirect_preference_list

If you want to configure the following SELECT query routing rules:

  • Route all SELECT queries on postgres database to the primary backend node.

  • Route all SELECT queries on mydb0 or on mydb1 databases to backend node of ID 1.

  • Route all SELECT queries on mydb2 database to standby backend nodes.

then the database_redirect_preference_list will be configured as follows:

	 database_redirect_preference_list = 'postgres:primary,mydb[01]:1,mydb2:standby'
	

This parameter can be changed by reloading the Pgpool-II configurations.

app_name_redirect_preference_list (string)

Specifies the list of "application-name:node id" pairs to send SELECT queries to a particular backend node for a particular client application connection.

Note: In PostgreSQL V9.0 or later the "Application name" is a name specified by a client when it connects to database.

For example, application name of psql command is "psql"

Note: Pgpool-II recognizes the application name only specified in the start-up packet. Although a client can provide the application name later in the session, but that does not get considered by the Pgpool-II for query routing.

The notion of app_name_redirect_preference_list is same as the database_redirect_preference_list thus you can also use the regular expressions for application names. Similarly special keyword "primary" indicates the primary node and "standby" indicates one of standby nodes.

Example 5-6. Using app-name_redirect_preference_list

If you want to configure the following SELECT query routing rules:

  • Route all SELECT from psql client to the primary backend node.

  • Route all SELECT queries from myapp1 client to backend node of ID 1.

  • Route all SELECT queries from myapp2 client to standby backend nodes.

then the app_name_redirect_preference_list will be configured as follows:

	 app_name_redirect_preference_list = 'psql:primary,myapp1:1,myapp2:standby'
	

Caution

JDBC driver PostgreSQL-9.3 and earlier versions does not send the application name in the startup packet even if the application name is specified using the JDBC driver option "ApplicationName" and "assumeMinServerVersion=9.0". So if you want to use the app_name_redirect_preference_list feature through JDBC, Use PostgreSQL-9.4 or later version of the driver.

This parameter can be changed by reloading the Pgpool-II configurations.

allow_sql_comments (boolean)

When set to on, Pgpool-II ignore the SQL comments when identifying if the load balance or query cache is possible on the query. When this parameter is set to off, the SQL comments on the query could effectively prevent the query from being load balanced or cached (pre Pgpool-II V3.4 behavior).

This parameter can be changed by reloading the Pgpool-II configurations. You can also use PGPOOL SET command to alter the value of this parameter for a current session.