[pgpool-hackers: 219] Unexpected behavior of pgpool-II

Asif Naeem anaeem.it at gmail.com
Mon Apr 22 05:10:32 JST 2013


Hi,

There is unexpected behavior in pgpool-II that seems to be a bug. Please find
the details below.

Phenomenon

> The pgpool-II configuration option "follow_master_command" special character %H
> (hostname of the new master node) is not working as expected.
> In master/slave mode, after failover the 2nd node becomes the new master, but
> %H yields an unexpected hostname. Please find the attached pgpool.conf for details.
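
For context, the relevant pgpool.conf entries have roughly the following shape; the script paths and the exact placeholder list here are illustrative (the actual settings are in the attached pgpool.conf), but %H is the placeholder at issue:

    # pgpool.conf sketch (paths and placeholder order illustrative)
    failover_command      = '/path/to/failover.sh %d %h %p %D %m %M %H %P'
    follow_master_command = '/path/to/follow_master.sh %d %h %p %D %m %M %H %P'
    # %d = failed node id       %h = failed node hostname
    # %p = failed node port     %D = failed node database cluster path
    # %m = new master node id   %M = old master node id
    # %H = new master hostname  %P = old primary node id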


Versions

> OS version: RHEL - 64
> pgpool version: pgpool-II version 3.2.1 (namameboshi)


Details/Observations:

> At Start

>  node_id | hostname | port | status | lb_weight |  role
> ---------+----------+------+--------+-----------+---------
>  0       | dmp02    | 5444 | 2      | 0.333333  | primary
>  1       | dmp03    | 5444 | 2      | 0.333333  | standby
>  2       | dmp04    | 5444 | 2      | 0.333333  | standby
>


> Failover arguments when the user stopped dmp02:

> Friday, February 8, 2013 12:44:12 JST

> FAILED_NODE_ID =0
> FAILED_NODE_NAME=dmp02
> FAILED_NODE_PORT=5444
> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
> NEW_MASTER_ID =1
> OLD_MASTER_ID =0
> NEW_MASTER_NAME =dmp03
> OLD_PRIMARY_ID =0
> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
> 0
>
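
The attached failover.sh / follow_master.sh are not reproduced here, but a minimal sketch of a logging script that would produce output of this shape, assuming the argument order from the configuration sketch above and a log file under /tmp, is:

    #!/bin/bash
    # Sketch only: log the arguments pgpool-II passes to failover_command /
    # follow_master_command, assumed here to be '%d %h %p %D %m %M %H %P'.
    LOG=/tmp/failover.sh.log
    {
        date
        echo "FAILED_NODE_ID  =$1"
        echo "FAILED_NODE_NAME=$2"
        echo "FAILED_NODE_PORT=$3"
        echo "FAILED_NODE_DATA=$4"
        echo "NEW_MASTER_ID   =$5"
        echo "OLD_MASTER_ID   =$6"
        echo "NEW_MASTER_NAME =$7"   # expanded from %H -- the value in question
        echo "OLD_PRIMARY_ID  =$8"
    } >> "$LOG"

(The real scripts also log NEW_MASTER_DATA; only the logging part is sketched here.)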


> Then, these arguments told the user that the master node had been changed:

> Friday, February 8, 2013 12:44:13 JST
> FAILED_NODE_ID =0
> FAILED_NODE_NAME=dmp02
> FAILED_NODE_PORT=5444
> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
> NEW_MASTER_ID =1
> OLD_MASTER_ID =0
> NEW_MASTER_NAME =dmp03
> OLD_PRIMARY_ID =0
> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
> Friday, February 8, 2013 12:44:17 JST
> FAILED_NODE_ID =2
> FAILED_NODE_NAME=dmp04
> FAILED_NODE_PORT=5444
> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
> NEW_MASTER_ID =1
> OLD_MASTER_ID =0
> NEW_MASTER_NAME =dmp02   <--- The master name is unexpected
> OLD_PRIMARY_ID =0
> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
>


>  node_id | hostname | port | status | lb_weight |  role
> ---------+----------+------+--------+-----------+---------
>  0       | dmp02    | 5444 | 2      | 0.333333  | standby
>  1       | dmp03    | 5444 | 2      | 0.333333  | primary
>  2       | dmp04    | 5444 | 2      | 0.333333  | standby
>


> -[ RECORD 1 ]----+----------------------------------
>  pid              | 20641
>  usesysid         | 10
>  usename          | enterprisedb
>  application_name | dmp02
>  client_addr      | 172.24.217.23
>  client_hostname  |
>  client_port      | 16948
>  backend_start    | 08-FEB-13 17:39:44.078173 +09:00
>  state            | streaming
>  sent_location    | 14/C50000E0
>  write_location   | 14/C50000E0
>  flush_location   | 14/C50000E0
>  replay_location  | 14/C50000E0
>  sync_priority    | 0
>  sync_state       | async
> (1 row)
>

There is only one pg_stat_replication record because dmp04 is connecting to dmp02 instead of the new primary.
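
For reference, the output above would be obtained by querying pg_stat_replication, presumably on the new primary dmp03; the connection options below are assumptions:

    psql -h dmp03 -p 5444 -U enterprisedb -x -c "SELECT * FROM pg_stat_replication"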

The user can reproduce this phenomenon. All that is needed is to create logging
scripts such as follow_master.sh and launch the pgpool process with -F [path to
pcp.conf], as sketched below.
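
A minimal sketch of the pgpool-II invocation, assuming illustrative config file locations:

    # Start pgpool-II, pointing it at pgpool.conf (-f) and pcp.conf (-F); paths illustrative
    pgpool -n -f /etc/pgpool-II/pgpool.conf -F /etc/pgpool-II/pcp.conf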

1. Before failover

> -bash-3.2$ psql -h 10.1.1.187 -p 9999 -c "show pool_nodes"
>  node_id | hostname | port | status | lb_weight |  role
> ---------+----------+------+--------+-----------+---------
>  0       | slave1   | 5444 | 2      | 0.333333  | primary
>  1       | slave2   | 5444 | 2      | 0.333333  | standby
>  2       | slave3   | 5444 | 2      | 0.333333  | standby
> (3 rows)


2. We stopped the slave1 node with "pg_ctl -D /data -m immediate stop".

3. After failover

> -bash-3.2$ psql -h 10.1.1.187 -p 9999 -c "show pool_nodes"
>  node_id | hostname | port | status | lb_weight |  role
> ---------+----------+------+--------+-----------+---------
>  0       | slave1   | 5444 | 2      | 0.333333  | standby
>  1       | slave2   | 5444 | 2      | 0.333333  | primary
>  2       | slave3   | 5444 | 2      | 0.333333  | standby
> (3 rows)


4. After failover, the user got the following unexpected output at the end of
follow_master.sh.log:

> NEW_MASTER_NAME =slave1


[root@master1 ~]# cat /tmp/failover.sh.log

> Wednesday, March 13, 2013 11:36:28 JST
> FAILED_NODE_ID =0
> FAILED_NODE_NAME=slave1
> FAILED_NODE_PORT=5444
> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
> NEW_MASTER_ID =1
> OLD_MASTER_ID =0
> NEW_MASTER_NAME =slave2
> OLD_PRIMARY_ID =0
> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
> 0
>
[root@master1 ~]# cat /tmp/follow_master.sh.log

> Wednesday, March 13, 2013 11:36:30 JST
> FAILED_NODE_ID =0
> FAILED_NODE_NAME=slave1
> FAILED_NODE_PORT=5444
> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
> NEW_MASTER_ID =1
> OLD_MASTER_ID =0
> NEW_MASTER_NAME =slave2
> OLD_PRIMARY_ID =0
> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data
> Wednesday, March 13, 2013 11:36:32 JST
> FAILED_NODE_ID =2
> FAILED_NODE_NAME=slave3
> FAILED_NODE_PORT=5444
> FAILED_NODE_DATA=/opt/PostgresPlus/9.2AS/data
> NEW_MASTER_ID =1
> OLD_MASTER_ID =0
> NEW_MASTER_NAME =slave1   <--- The master name is unexpected
> OLD_PRIMARY_ID =0
> NEW_MASTER_DATA =/opt/PostgresPlus/9.2AS/data


Please guide. Thanks.

Best Regards,
Asif Naeem

