From pgpool Wiki
Revision as of 09:15, 20 May 2013 by Nagata (talk | contribs) (Fix memory leak in pool_config.c)

Pgpool-II TODO list

Ability to load balance based on Client IP

From bugid 26: I recently moved a database from MySQL to PostgreSQL 9.1.5, which sits behind pgpool-II 3.1.4. Everything went fine until I observed that some "tickets" were not created correctly by the application (OTRS) that populates the database.
After some debugging I found (or rather guess) that the problem is the following:
When a cron job wants to create a ticket, it has to insert data into about 10 tables, and I guess that the second, third, ... inserts depend on the first. The operation is not performed in a transaction, so after the first insert, when the application tries to perform the other inserts, it first tries to select the row from the first insert. That row has not yet been propagated to all nodes, and the error occurs.
I'm aware that if the entire operation were performed in a transaction (only on the master) the issue would be solved, but unfortunately I cannot modify the application.
So I want to know whether there is any way to tell pgpool something like:
"do not load balance any request from this IP".
PS. As a temporary workaround I have set the weight of the second and third PostgreSQL slaves to 0, and it behaves correctly because reads and writes go only to the master.
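
The request above amounts to a per-client check made before a read query is load balanced. The following is only a sketch of that decision in Python, not pgpool-II code: the setting NO_LOAD_BALANCE_NETWORKS and the function may_load_balance are hypothetical names invented for illustration, and the address ranges are documentation examples.

```python
import ipaddress

# Hypothetical setting (not an existing pgpool-II parameter): client networks
# whose sessions must never be load balanced.
NO_LOAD_BALANCE_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def may_load_balance(client_ip, networks=NO_LOAD_BALANCE_NETWORKS):
    """Return False when the client address falls in an excluded network."""
    addr = ipaddress.ip_address(client_ip)
    # A session from an excluded network would be served by the master only.
    return not any(addr in net for net in networks)
```

With such a check, the cron host of the application could be listed in the excluded networks while all other clients keep using the slaves for reads.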

Restart watchdog process when it exits abnormally

It would be nice for pgpool main to restart the watchdog process when it dies abnormally.
(Nagata is working on this for pgpool-II 3.3)

Automatically reattach a node in streaming master/slave configuration

In streaming master/slave configuration there could be an option to automatically reattach a node if it is up to date with the master (0 bytes behind). It often happens that, due to a minor network outage, a slave node is dropped from pgpool and stays down even after the node has resumed replication with the master and is up to date. pgpool already knows how far the slave is behind the master, so I guess this wouldn't be too difficult to implement? (from bugtrack #17)
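
The condition described above could be expressed roughly as follows. This is a sketch under assumptions, not the pgpool-II implementation: the StandbyState fields and the max_delay_bytes threshold are hypothetical names, and how pgpool would obtain these values is left out.

```python
from dataclasses import dataclass


@dataclass
class StandbyState:
    """Hypothetical snapshot of what pgpool knows about one slave node."""
    attached: bool      # False once pgpool has detached the node
    reachable: bool     # the node currently accepts connections
    streaming: bool     # the node has resumed streaming replication
    delay_bytes: int    # how far the node is behind the master, in bytes


def should_reattach(node, max_delay_bytes=0):
    """Reattach only a detached node that is healthy and caught up."""
    return (not node.attached
            and node.reachable
            and node.streaming
            and node.delay_bytes <= max_delay_bytes)
```

The default threshold of 0 matches the "0 bytes behind" wording; a larger value would be a policy choice.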

Synchronize backend nodes information with watchdog when standby pgpool starts up

For example, when a node is detached from the active pgpool and the standby pgpool then starts up, the standby pgpool cannot recognize that the node is detached. The standby pgpool should obtain backend node information from the other pgpool.

Prevent multiple pgpools from executing simultaneously

In master-slave mode with watchdog, when a backend DB goes down, all pgpools execute their failover handling at the same time. This might cause problems.
(Nagata is working on this for pgpool-II 3.3)

Allow to use client encoding

It would be nice if a pgpool client could use an encoding that differs from the PostgreSQL server encoding.
To implement this, the parser should be able to handle "unsafe" encodings such as Shift_JIS. psql replaces the second byte of each multibyte character to fool the parser. We could employ a similar strategy.
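
To illustrate why Shift_JIS is "unsafe" and what the byte replacement achieves, here is a small Python sketch. It is an assumption-laden illustration, not psql or pgpool-II code: the placeholder value 0xFF and the function names are chosen for this example, and only two-byte Shift_JIS characters are considered. The character 表 encodes to the bytes 0x95 0x5C, and 0x5C is the ASCII backslash, which would confuse a byte-oriented parser.

```python
PLACEHOLDER = 0xFF  # assumed placeholder; 0xFF is not used by Shift_JIS


def is_sjis_lead(byte):
    """True if byte starts a two-byte Shift_JIS character."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def mask_trail_bytes(raw):
    """Return a copy of raw with every Shift_JIS second byte replaced."""
    out = bytearray(raw)
    i = 0
    while i < len(out):
        if is_sjis_lead(out[i]) and i + 1 < len(out):
            out[i + 1] = PLACEHOLDER  # hide bytes such as 0x5C from the parser
            i += 2
        else:
            i += 1
    return bytes(out)


query = "SELECT '表'".encode("shift_jis")
masked = mask_trail_bytes(query)
```

The parser would inspect the masked copy, which has the same length and the same ASCII structure, while the original bytes are what gets forwarded to the backend.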

Send read query only to standbys even after fail over

We can configure pgpool-II not to send read queries to the primary. However, after a failover the role of a node can change.
To solve the problem, we need a new flag to specify that read queries are always sent to standbys regardless of failover ([pgpool-general: 1621] backend weight after failover).
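
The idea behind such a flag is to choose read targets by a node's current role rather than by its node id. A minimal Python sketch follows; the function read_targets, the standby_only flag and the node dictionaries are hypothetical, and falling back to the primary when no standby is up is an assumption of this sketch, not something stated above.

```python
def read_targets(nodes, standby_only):
    """Return the ids of nodes that may receive read queries.

    nodes is a list of dicts with the keys "id", "role" ("primary" or
    "standby") and "up". Roles are evaluated at call time, so the result
    follows role changes caused by a failover.
    """
    up = [n for n in nodes if n["up"]]
    if standby_only:
        standbys = [n["id"] for n in up if n["role"] == "standby"]
        if standbys:
            return standbys
    # Assumed fallback: with no usable standby, reads go to whatever is up.
    return [n["id"] for n in up]
```

For example, if node 0 is the primary and fails, and node 1 is promoted, the same call afterwards returns only the remaining standbys and no longer includes node 1.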

Recognize multi statement queries

As stated in the document, pgpool-II does not recognize multi statement queries correctly (BEGIN;SELECT 1;END). Pgpool-II only parses the first element of the query ("BEGIN" in this case) and decides how to behave.
Of course this causes various problems. It would be nice if pgpool-II could understand each part of a multi statement query.
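
As a rough illustration of what "understanding each part" requires, the following Python sketch splits a query string on semicolons that are outside quotes. It is a simplification written for this page, not the pgpool-II parser: comments, dollar-quoted strings and backslash escapes are deliberately not handled, and the function name is hypothetical.

```python
def split_statements(query):
    """Split a query string on semicolons that are not inside quotes.

    Simplification: comments, dollar-quoted strings and backslash
    escapes are not handled.
    """
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    for ch in query:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None  # a doubled quote simply reopens on the next char
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    return [s for s in statements if s]
```

Applied to the example above, split_statements("BEGIN;SELECT 1;END") yields the three parts BEGIN, SELECT 1 and END, each of which could then be examined separately.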

Cursor statements are not load balanced, sent to all DB nodes in replication mode

DECLARE..FETCH are sent to all DB nodes in replication mode. This is because the SELECT might come with FOR UPDATE/FOR SHARE.
It would be nice if pgpool-II checked whether the SELECT uses FOR UPDATE/FOR SHARE and, if not, enabled load balancing (or sent the query only to the master node if load balancing is disabled).
Note that some applications, including psql, can use a CURSOR for SELECT. For example, from PostgreSQL 8.2, if "\set FETCH_COUNT n" is executed, psql unconditionally uses a cursor named "_psql_cursor".
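
The check described above could look roughly like the following Python sketch. It is a crude text-based approximation for illustration only: a real implementation would inspect the parse tree rather than use regular expressions, and the function name is hypothetical. The pattern also covers FOR NO KEY UPDATE and FOR KEY SHARE, which newer PostgreSQL versions accept as locking clauses.

```python
import re

# Locking clauses that make a cursor's SELECT unsafe to load balance.
_LOCKING = re.compile(
    r"\bFOR\s+(?:NO\s+KEY\s+UPDATE|UPDATE|KEY\s+SHARE|SHARE)\b",
    re.IGNORECASE,
)
# Single-quoted literals, so that text inside strings is ignored.
_LITERAL = re.compile(r"'(?:[^']|'')*'")


def cursor_may_be_load_balanced(declare_sql):
    """Return True if the DECLARE statement has no row-locking clause."""
    without_literals = _LITERAL.sub("''", declare_sql)
    return _LOCKING.search(without_literals) is None
```

A statement such as DECLARE _psql_cursor CURSOR FOR SELECT * FROM t1 passes the check, while the same statement ending in FOR UPDATE does not.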

Support IPv6 network

This is an obvious requirement.

Add new parameter for primary node search timeout

pgpool-II uses "recovery_timeout" as the timeout when searching for the primary node after failover. Since this is an abuse of the parameter, we should add a new parameter dedicated to the primary node search.
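
A dedicated timeout would simply bound the search loop on its own terms. The Python sketch below shows the shape of such a loop; it is not pgpool-II code, and find_primary, its timeout argument and the is_primary callback are hypothetical names. The clock and sleep arguments exist only so the loop can be exercised without waiting.

```python
import time


def find_primary(node_ids, is_primary, timeout, interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll the nodes until one reports itself as primary.

    Returns the node id, or None once timeout seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        for node_id in node_ids:
            if is_primary(node_id):
                return node_id
        if clock() >= deadline:
            return None  # give up: the dedicated timeout has expired
        sleep(interval)
```

Keeping this value separate means that tuning online recovery through "recovery_timeout" no longer changes how long pgpool waits for a new primary.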

Import PostgreSQL's exception handling

PostgreSQL's exception handling (the elog family) is a pretty good tool for making code simple and robust. It would be nice if pgpool could use it.

Handle abnormal loss of the virtual IP interface when watchdog is enabled

When the virtual IP interface is brought down abnormally, for example by a manual ifconfig, no one holds the VIP and clients cannot connect to pgpool-II. The watchdog of the active pgpool should monitor the interface or the VIP and handle its loss.

Remove on disk query cache

The old on disk query cache has almost no users and has severe limitations, including no automatic cache invalidation. It has been obsolete since the on memory query cache was implemented. We should remove it.

Do not invalidate query cache created in a transaction in some cases

Currently a new query cache for table t1 created in a transaction is removed at commit if there are DMLs which touch t1 in the same transaction. Apparently this is overkill in some cases.
To enhance this, we need to teach pgpool-II about the "order of SELECTs and DMLs".
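
One possible reading of "order of SELECTs and DMLs" is that a cache entry only needs to be discarded at commit if a DML touched one of its tables after the SELECT ran; a SELECT issued after the last such DML already reflects it. The Python sketch below encodes that assumed rule. The event format and the function name are invented for this illustration and do not describe pgpool-II internals.

```python
def caches_to_keep(events):
    """Return the cache keys that may survive COMMIT.

    events is the ordered list of statements in one transaction, each
    either ("dml", tables) or ("select", cache_key, tables).
    Assumed rule: a cache survives if its SELECT ran after the last DML
    touching any of its tables.
    """
    last_dml = {}   # table name -> position of the last DML touching it
    selects = []    # (position, cache_key, tables)
    for pos, event in enumerate(events):
        if event[0] == "dml":
            for table in event[1]:
                last_dml[table] = pos
        else:
            selects.append((pos, event[1], event[2]))
    return [key for pos, key, tables in selects
            if all(last_dml.get(t, -1) < pos for t in tables)]
```

Under this rule, an INSERT into t1 followed by a SELECT on t1 keeps the cache, while a SELECT on t1 followed by an INSERT into t1 discards it.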

Fix memory leak in pool_config.c

The module in charge of parsing pgpool.conf leaks memory. Since pgpool usually reads pgpool.conf just once at startup, this is not a big problem. However, reloading pgpool.conf will leak memory, which is definitely a problem. Also, memory leak checking tools such as valgrind emit lots of error messages, which is very annoying. So it would be nice to fix the problem in the future.