Difference between revisions of "TODO"
Revision as of 07:38, 4 March 2016
Contents
- 1 Pgpool-II TODO list
- 1.1 Move relation cache to shared memory
- 1.2 Allow to use multiple pgpool-II instances with in-memory query cache enabled
- 1.3 Allow to use pg_rewind in online recovery
- 1.4 Do not disconnect to clients when a fail over happens
- 1.5 Support peer auth
- 1.6 Enhance documents
- 1.7 Automatically reattach a node in streaming master/slave configuration
- 1.8 Allow to use client encoding
- 1.9 Send read query only to standbys even after fail over
- 1.10 Recognize multi statement queries
- 1.11 Cursor statements are not load balanced, sent to all DB nodes in replication mode
- 1.12 Support IPv6 network
- 1.13 Handle abnormal down of virtual IP interface when watchdog enabled
- 1.14 Do not invalidate query cache created in a transaction in some cases
- 1.15 Fix memory leak in pool_config.c
- 1.16 Add SET command
- 1.17 Put together a definition of error codes into a single header file
- 1.18 Create separate process for health checking
- 1.19 Import PostgreSQL's latch module
- 1.20 Allow to use schema qualifications in black_function_list and white_function_list
- 1.21 Support multiple UNIX domain socket directories
- 1.22 Health-check timeout for each backend node
- 2 TODOs already done
- 2.1 Allow to specify which node is dead when starting up
- 2.2 Ability to load balance based on Client IP, database, table etc.
- 2.3 Import PostgreSQL's exception handling
- 2.4 Allow to print user name in the logging
- 2.5 Remove on disk query cache
- 2.6 Restart watchdog process when it abnormally exits
- 2.7 Synchronize backend nodes information with watchdog when standby pgpool starts up
- 2.8 Avoid multiple pgpools from executing failover.sh simultaneously.
- 2.9 Add new parameter for searching primary node timeout
- 2.10 Allow to load balance even in an explicit transaction in replication mode
- 2.11 Add testing framework
- 2.12 Add switch to control select(2) time out in connecting to PostgreSQL
- 2.13 Allow to specify which node is dead when starting up
- 2.14 Remove parallel query
- 2.15 Enhance pcp commands
- 2.16 Enhance performance of extended protocol case
- 2.17 Import PostgreSQL 9.5's parser
- 2.18 Watchdog feature enhancement
- 2.19 Allow to specify user name, password and database name for health check per backend base
Pgpool-II TODO list
Move relation cache to shared memory
- This will reduce inquiries to the system catalog (thus improving performance) and allow more real-time cache invalidation.
Allow to use multiple pgpool-II instances with in-memory query cache enabled
- For this purpose we need not only to use memcached but also to store the OID map info in it, so that the info is shared among pgpool-II instances.
Allow to use pg_rewind in online recovery
- pg_rewind could speed up online recovery. However, it only works when the target node was shut down normally. Can we detect that?
Do not disconnect to clients when a fail over happens
- At this moment we don't know how to implement it but this is a desirable feature.
- This is being discussed in pgpool-II 3.6 development.
Support peer auth
- Apparently pool_hba.conf should recognize it if we are going to support it. Also, pgpool-II should forward it to PostgreSQL. We need to think about the case where pg_hba.conf does not use peer auth.
Enhance documents
- The current documentation is plain HTML, which is a real pain to maintain. Like PostgreSQL, is SGML our direction?
- Pgpool-II 3.6 is going to change the document format to SGML.
Automatically reattach a node in streaming master/slave configuration
- In streaming master/slave configuration there could be an option to automatically reattach a node if it is up-to-date with the master (0 bytes behind). It often happens that, due to a minor network outage, a slave node is dropped from pgpool and stays down even after the node has resumed replication with the master and is up-to-date. pgpool already knows how far the slave is behind the master, so I guess this wouldn't be too difficult to implement? (from bugtrack #17)
Allow to use client encoding
- It would be nice if a pgpool client could use an encoding that is different from the PostgreSQL server encoding.
- To implement this, the parser should be able to handle "unsafe" encodings such as Shift_JIS. psql replaces the second byte of each multibyte character to fool the parser. We could employ a similar strategy.
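- The byte-replacement idea can be sketched as follows. This is only an illustration, not psql or pgpool-II code: the function names and the placeholder byte are hypothetical, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are the commonly cited Shift_JIS ranges, assumed here. The example uses a character whose second byte is 0x5C (backslash), which is exactly the kind of byte that confuses an ASCII-oriented parser.

```python
# Hypothetical helpers for illustration; not taken from psql or pgpool-II.

def is_sjis_lead_byte(b: int) -> bool:
    """Return True if b starts a two-byte Shift_JIS character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def mask_sjis_trail_bytes(query: bytes, placeholder: int = ord("X")) -> bytes:
    """Replace the second byte of each two-byte character with a harmless ASCII byte.

    The output has the same length as the input, so byte offsets reported by
    a parser still map onto the original query.
    """
    out = bytearray(query)
    i = 0
    while i < len(out):
        if is_sjis_lead_byte(out[i]) and i + 1 < len(out):
            out[i + 1] = placeholder  # hide a possible 0x5C or quote byte
            i += 2                    # skip the whole two-byte character
        else:
            i += 1                    # single-byte character
    return bytes(out)


# The second byte of this character is 0x5C in Shift_JIS.
original = "SELECT '表'".encode("shift_jis")
masked = mask_sjis_trail_bytes(original)
```

- Only the masked copy would be given to the parser; the original bytes are what would be forwarded to the backend.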
Send read query only to standbys even after fail over
- We can configure pgpool-II to not send read queries to the primary. However, after a fail over, the role of a node could change.
- To solve the problem, we need a new flag to specify that read queries are always sent to standbys regardless of the fail over ([pgpool-general: 1621] backend weight after failover).
Recognize multi statement queries
- As stated in the document, pgpool-II does not recognize multi statement queries correctly (BEGIN;SELECT 1;END). Pgpool-II only parses the first element of the query ("BEGIN" in this case) and decides how to behave.
- Of course this causes various problems. It would be nice if pgpool-II could understand each part of a multi statement query.
- The problem is how the PostgreSQL backend handles multi statement queries. For example, when a client sends BEGIN;SELECT 1;END, the backend returns "Command Complete" for each statement, but "Ready for query" is returned only once. Thus, splitting a multi statement query into separate single-statement queries, as psql does, will not work.
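- To illustrate what "understanding each part" could mean, here is a minimal sketch that splits a query string into its statements for inspection only (the query would still be forwarded to the backend as a single message, for the reason given above). The function name is hypothetical, and the splitter is deliberately naive: it only respects single- and double-quoted text and ignores comments and dollar quoting, which a real implementation based on the PostgreSQL parser would handle.

```python
from typing import List


def split_statements(query: str) -> List[str]:
    """Split on semicolons that are outside quoted text (naive sketch)."""
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, or None
    for ch in query:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None          # closing quote ('' closes and reopens)
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


parts = split_statements("BEGIN;SELECT 1;END")
```

- With such a list, the routing decision could look at every element instead of only the first one ("BEGIN").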
Cursor statements are not load balanced, sent to all DB nodes in replication mode
- DECLARE..FETCH are sent to all DB nodes in replication mode. This is because the SELECT might come with FOR UPDATE/FOR SHARE.
- It would be nice if pgpool-II checked whether the SELECT uses FOR UPDATE/FOR SHARE and, if not, enabled load balancing (or sent the query only to the master node if load balancing is disabled).
- Note that some applications, including psql, could use CURSOR for SELECT. For example, from PostgreSQL 8.2, if "\set FETCH_COUNT n" is executed, psql unconditionally uses a cursor named "_psql_cursor".
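- A rough sketch of such a check, assuming a plain text scan rather than pgpool-II's parse tree. The function name is hypothetical. The pattern also covers FOR NO KEY UPDATE and FOR KEY SHARE, which exist in newer PostgreSQL versions. A text scan can be fooled by a string literal that contains "for update", but that error is in the safe direction (the statement is simply not load balanced).

```python
import re

# Row-level locking clauses that make a cursor's SELECT unsafe to load balance.
_LOCKING_CLAUSE = re.compile(
    r"\bFOR\s+(?:NO\s+KEY\s+UPDATE|KEY\s+SHARE|UPDATE|SHARE)\b",
    re.IGNORECASE,
)


def declare_may_be_load_balanced(sql: str) -> bool:
    """Return True for a DECLARE statement that has no locking clause."""
    stripped = sql.lstrip()
    if not stripped.upper().startswith("DECLARE"):
        return False  # this sketch only decides about DECLARE statements
    return _LOCKING_CLAUSE.search(stripped) is None


plain = declare_may_be_load_balanced(
    "DECLARE _psql_cursor NO SCROLL CURSOR FOR SELECT * FROM t1")
locking = declare_may_be_load_balanced(
    "DECLARE c CURSOR FOR SELECT * FROM t1 FOR UPDATE")
```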
Support IPv6 network
- As of 3.4, IPv6 addresses can be used for PostgreSQL backend servers and for the bind address of pgpool-II itself.
- However, the PCP process still binds only to IPv4 and UNIX domain sockets.
Handle abnormal down of virtual IP interface when watchdog enabled
- When the virtual IP interface is brought down abnormally (by a manual ifconfig, etc.), no one holds the VIP and clients are unable to connect to pgpool-II. The watchdog of the active pgpool should monitor the interface or VIP and handle its going down.
Do not invalidate query cache created in a transaction in some cases
- Currently, a new query cache entry for table t1 created in a transaction is removed at commit if there are DMLs that touch t1 in the same transaction. Apparently this is overkill in some cases:
BEGIN; INSERT INTO t1 VALUES(1); SELECT * FROM t1; COMMIT;
- To improve this, we need to teach pgpool-II about the "order of SELECTs and DMLs".
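- A minimal sketch of an order-aware rule, under the assumption that a cache entry created by a SELECT only needs to be discarded when a DML touching the same table comes after that SELECT in the transaction. The event representation and the function name are hypothetical; this is not how pgpool-II stores its state.

```python
def selects_safe_to_cache(events):
    """Return the indexes of SELECT events whose cache entries can be kept at commit.

    events is a list of (kind, table) tuples in execution order, where kind
    is "SELECT" or "DML".
    """
    safe = []
    for i, (kind, table) in enumerate(events):
        if kind != "SELECT":
            continue
        # The result is stale only if the table is modified after this SELECT.
        modified_later = any(
            k == "DML" and t == table for k, t in events[i + 1:]
        )
        if not modified_later:
            safe.append(i)
    return safe


# BEGIN; INSERT INTO t1 VALUES(1); SELECT * FROM t1; COMMIT;
insert_then_select = selects_safe_to_cache([("DML", "t1"), ("SELECT", "t1")])
# BEGIN; SELECT * FROM t1; INSERT INTO t1 VALUES(1); COMMIT;
select_then_insert = selects_safe_to_cache([("SELECT", "t1"), ("DML", "t1")])
```

- In the first case the SELECT already sees the inserted row, so its result is still valid after commit; in the second case it is not.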
Fix memory leak in pool_config.c
- The module in charge of parsing pgpool.conf leaks memory. Since pgpool usually reads pgpool.conf just once at startup, this is not a big problem there. However, each reload of pgpool.conf leaks more memory, which definitely is a problem. Also, memory leak checkers such as valgrind emit lots of error messages, which is very annoying. So it would be nice to fix the problem in the future.
Add SET command
- A pgpool-specific SET command would be useful. For example, "SET debug = 1" could produce debug info on the fly for a particular session.
- This is being discussed in pgpool-II 3.6 development.
Put together a definition of error codes into a single header file
- Currently most error codes used by pool_send_{error,fatal}_message() etc. (e.g. "XX000", "XX001", "57000") are hard-coded in different source files. They should be defined together as constants in a single header.
Create separate process for health checking
- To make the main process more stable, it would be better to create a separate process that is responsible for health checking.
Import PostgreSQL's latch module
- Pgpool already has a similar module, but PostgreSQL's seems more sophisticated and reliable.
Allow to use schema qualifications in black_function_list and white_function_list
- Currently schema qualifications are silently ignored in these parameters.
Support multiple UNIX domain socket directories
- PostgreSQL already does this. See pgpool-hackers: 1433.
Health-check timeout for each backend node
- In the current implementation, the timeout value specified by health_check_timeout means the total time for checking the status of all backends. Hence, if one backend takes a long time to check successfully and the timeout then expires while the next backend is being checked, that next node is regarded as failed and failed over even though it is healthy. To resolve this issue, we need a health-check timeout for each backend.
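- The difference between the two behaviors can be shown with a small simulation. This is a sketch under simplifying assumptions, not pgpool-II code: each backend is represented only by how long its check takes, and the function names are hypothetical.

```python
def first_failure_with_shared_timeout(check_seconds, timeout):
    """One timer covers the whole pass over all backends (current behavior).

    Returns the id of the node being checked when the timer expires, or None.
    """
    elapsed = 0.0
    for node_id, seconds in enumerate(check_seconds):
        elapsed += seconds
        if elapsed > timeout:
            return node_id
    return None


def failures_with_per_node_timeout(check_seconds, timeout):
    """Each backend gets its own timer (proposed behavior)."""
    return [
        node_id
        for node_id, seconds in enumerate(check_seconds)
        if seconds > timeout
    ]


# Node 0 is slow but succeeds after 8 s; node 1 is healthy and answers in 3 s.
durations = [8.0, 3.0]
shared = first_failure_with_shared_timeout(durations, timeout=10.0)
per_node = failures_with_per_node_timeout(durations, timeout=10.0)
```

- With a shared 10-second timer, node 1 is blamed because the timer expires during its check; with a per-node timer, no node is reported as failed.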
TODOs already done
Allow to specify which node is dead when starting up
- If we set a longer health check timeout and/or many health check retries, starting up pgpool-II takes a long time when some of the DB nodes are down, because of health checking and retries in creating connections to the backends.
- pgpool_status should help here, but it cannot be used on the very first startup.
- It would be nice if we could tell pgpool-II about down node info.
- As of 3.4, the pgpool_status file was changed to a plain ASCII file, so users can mark a node as down with an ordinary text editor.
Ability to load balance based on Client IP, database, table etc.
- From bugid 26: I have recently moved a database from MySQL to PostgreSQL 9.1.5, which is behind pgpool-II 3.1.4. Everything went fine until I observed that some "tickets" were not created correctly by the application (OTRS) that populates the database.
- After some debugging I found (or guess) that the problem is the following:
- When a cron job wants to create a ticket, it has to insert info into about 10 tables, and I guess that the 2nd, 3rd, ... inserts depend on the first. The problem is that this operation is not performed transactionally, so after the first insert, when the app tries to perform the other inserts, it first tries to select "the first insert", but this first insert has not yet been propagated to all nodes, and the error occurs.
- I'm aware that if this entire operation were performed transactionally (only on the master) the issue would be solved, but unfortunately I cannot modify the app.
- So I want to know if there is any way to tell pgpool something like:
- do not load balance any request from this IP.
- PS. Temporarily I have set the weight factor to 0 for the 2nd and 3rd PostgreSQL slaves and it behaves OK, because reads and writes go only to the master.
- P.P.S. There's also a different request regarding load balancing.
- http://www.pgpool.net/pipermail/pgpool-general/2014-June/003032.html
- This item has been implemented in 3.4 as "database_redirect_preference_list" and "app_name_redirect_preference_list".
Import PostgreSQL's exception handling
- PostgreSQL's exception handling (elog family) is a pretty good tool for making code simple and robust. It would be nice if pgpool could use this. This has been already done in 3.4.
Allow to print user name in the logging
- This will be useful for audit purpose. (done and will appear in pgpool-II-3.4.0).
Remove on disk query cache
- The old on disk query cache has almost no users and has severe limitations, including no automatic cache invalidation. It has been obsolete since the on memory query cache was implemented. We should remove it (this has already been done in git master and will appear in 3.4.0).
Restart watchdog process when it abnormally exits
- It would be nice for pgpool main to restart the watchdog process when it dies abnormally.
Synchronize backend nodes information with watchdog when standby pgpool starts up
- For example, when a certain node is detached from the active pgpool and then a standby pgpool starts up, the standby pgpool can't recognize that the node is detached. The standby pgpool should get node information from the other pgpool.
Avoid multiple pgpools from executing failover.sh simultaneously.
- In master-slave mode with watchdog, when a backend DB goes down, all pgpools execute failover.sh. This might cause problems.
Add new parameter for searching primary node timeout
- pgpool-II uses "recovery_timeout" as the timeout for searching for the primary node after failover. Since this is an abuse of the parameter, we should add a new parameter for searching for the primary node.
Allow to load balance even in an explicit transaction in replication mode
- Currently load balance in an explicit transaction is only allowed in master-slave mode. It should be allowed in the replication mode as well.
Add testing framework
- PostgreSQL has a nice regression test suite. It would be nice if pgpool-II had a similar test suite. The problem is that such a suite could be a very complex system, because it must include not only pgpool-II itself but also multiple PostgreSQL instances. Also, don't forget about "watchdog": the test suite should even be able to manage multiple pgpool-II instances.
Add switch to control select(2) time out in connecting to PostgreSQL
- In connect_inet_domain_socket_by_port(), select(2) is issued to watch events on the fd created by a non-blocking connect(2). The timeout parameter of select(2) is fixed at 1 second, which is not long enough in flaky network environments such as AWS (http://www.pgpool.net/pipermail/pgpool-general/2014-May/002880.html).
- To solve the problem, a new switch to control the timeout is desired (done for pgpool-II 3.4.0).
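- The pattern described above (non-blocking connect followed by select with a timeout) can be sketched in Python with a caller-supplied timeout instead of a fixed 1 second. This is an illustration for POSIX systems only; the function name and its parameters are hypothetical and do not correspond to pgpool-II's C code or to any pgpool.conf parameter.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout_sec):
    """Non-blocking connect that waits up to timeout_sec for completion."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex((host, port))
    if err not in (0, errno.EINPROGRESS):
        sock.close()
        raise OSError(err, os.strerror(err))

    # A socket with a pending connect becomes writable once the connect
    # has either succeeded or failed.
    _, writable, _ = select.select([], [sock], [], timeout_sec)
    if not writable:
        sock.close()
        raise TimeoutError("connect timed out after %s seconds" % timeout_sec)

    # SO_ERROR tells us which of the two happened.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock
```

- A caller on a flaky network could then pass a larger value, for example connect_with_timeout(host, port, 10.0), while keeping a short timeout on a reliable LAN.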
Allow to specify which node is dead when starting up
- If we set a longer health check timeout and/or many health check retries, starting up pgpool-II takes a long time when some of the DB nodes are down, because of health checking and retries in creating connections to the backends.
- pgpool_status should help here, but it cannot be used on the very first startup.
- It would be nice if we could tell pgpool-II about down node info (pgpool-II 3.4.0 changes the pgpool_status format to ASCII, so users can edit the file if needed).
Remove parallel query
- Parallel query has severe restrictions: certain queries cannot be used, and it does not work with the extended protocol (i.e. JDBC).
- Also, it makes upgrading to a newer version of PostgreSQL's SQL parser painful (yes, pgpool-II uses PostgreSQL's parser code). In short, parallel query gives us a small gain compared with the work needed to maintain and enhance it. So I would like to obsolete parallel query in a future pgpool-II release (related parameters have been removed from pgpool.conf in 3.4.0; pgpool-II 3.5.0 will remove the actual code).
Enhance pcp commands
- There are a number of drawbacks in pcp commands, including: 1) the timeout parameter is no longer used and should be removed; 2) error codes returned from the commands are completely useless; 3) multiple commands cannot be accepted simultaneously.
- This has been already done in 3.5.
Enhance performance of extended protocol case
- When the extended protocol is used (i.e. JDBC etc.), pgpool-II's overhead is pretty large compared with simple query. We need to reduce it.
- This has been already done in 3.5.
Import PostgreSQL 9.5's parser
- No explanation is needed for this.
- This has been already done in 3.5.
Watchdog feature enhancement
- Watchdog is a very important feature of pgpool-II, as it is used to eliminate the single point of failure and provide HA. However, there are a few feature requests and bugs in the existing watchdog that require more than a simple code fix; they require a complete revisit of its core architecture.
- See the design proposal for watchdog enhancement [here]
- This has been already done in 3.5.
Allow to specify user name, password and database name for health check per backend base
- In some environments, access to the standard databases (i.e. postgres and template1) is not allowed. So users need to specify them on a per-backend basis.
- Maybe we need backend_healthcheck_username0 etc? See http://www.pgpool.net/pipermail/pgpool-hackers/2015-June/000942.html for more details.
- This has been already done in 3.5.