[pgpool-general: 7591] Re: [pgpool-hackers: 3921] Re: Promote specific backend to primary in streaming replication
Tatsuo Ishii
ishii at sraoss.co.jp
Thu Jun 10 22:05:45 JST 2021
Hi Nathan, Umar,
To be honest, I don't like --really-promote either :-). Thanks for the
suggestions.
> Hi Umar,
>
> Yes - I was just thinking the same thing that “really” is not a good option name here. I think calling the option “really” implies that it doesn’t do what it is supposed to unless you set that flag.
> I think “force” is the same as “really” in this case - it implies that pcp_promote_node doesn’t do what it’s supposed to already.
> Instead, I think we want to say “do this extra thing” as well as promoting in pgpool.
Yes.
> I like “-s”/ “switchover” or “-b”/“backend” (i.e. "promote the backend as well").
So "switchover" implies that pcp_promote exchanges the role of current
primary and specified standby node. I think it's a little bit weak
because if a follow primary command is not set, the primary will not
become a standby: the status of primary will be down and the backend
will remain running as primary. So "exchange" will not happen.
If we employ "switchover", maybe we should stats in the doc something
like: "without setting a follow primary command, pgpool will not
exchange their role"?
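For context, the behavior above depends on follow_primary_command being
set in pgpool.conf. A minimal sketch (the script path and argument list
here are illustrative only, not a recommended setup):

follow_primary_command = '/etc/pgpool-II/follow_primary.sh %d %h %p %D %m %H %M %P %r %R'

The script it points at is what re-points each remaining standby at the
new primary once failover completes; without it, the old primary is
simply left detached while still running as a primary.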
> @Ishii-san - I have read your email from the other day, and your subsequent email. I like the approach you are taking here, and very much support this new feature. I think from an end user perspective, having this functionality on the pcp_promote_node command makes a lot more sense than on pcp_detach_node - so I think the “new” way is much better. Thank you very much for your work on this!
>
>> On 11/06/2021, at 12:03 AM, Umar Hayat <m.umarkiani at gmail.com> wrote:
>>
>> Hi,
>> Isn't this scenario called a switchover, since we are doing a planned switch from primary to standby (or vice versa)? IMO it's an industry-accepted term for this scenario.
>>
>> I would suggest we name the new flag '-s' / '--switchover',
>> or, if we don't want to call it a switchover, we could call it '-f' / '--force' or '-f' / '--force-promote' instead of '--really-promote'.
>>
>> Regards
>> Umar Hayat
>>
>> On Thu, Jun 10, 2021 at 4:28 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> Sorry for the duplicate post. I feel I need to discuss this with the
>> developers on pgpool-hackers.
>>
>> Previously I posted a PoC patch. Now I have created a committable patch.
>> Differences from the PoC patch are:
>>
>> - The user interface is now pcp_promote_node, rather than a hacked version of
>> pcp_detach_node.
>>
>> - I added a new parameter " -r, --really-promote really promote" to the
>> pcp_promote_node command, which is needed to keep backward
>> compatibility. If the option is specified, pcp_promote_node will
>> detach the current primary and kick off the failover command, with the
>> main node id set to the node id given to pcp_promote_node. A
>> follow primary command will run if it is set.
>>
>> - To pass the new argument to the pcp process, I tweaked a number of
>> internal functions used in the pcp system.
>>
>> - Doc patch added.
>>
>> Here is the sample session of the modified pcp_promote_node.
>>
>> $ pcp_promote_node --help
>> pcp_promote_node - promote a node as new main from pgpool-II
>> Usage:
>> pcp_promote_node [OPTION...] [node-id]
>> Options:
>> -U, --username=NAME username for PCP authentication
>> -h, --host=HOSTNAME pgpool-II host
>> -p, --port=PORT PCP port number
>> -w, --no-password never prompt for password
>> -W, --password force password prompt (should happen automatically)
>> -n, --node-id=NODEID ID of a backend node
>> -g, --gracefully promote gracefully(optional)
>> -r, --really-promote really promote(optional) <-- added
>> -d, --debug enable debug message (optional)
>> -v, --verbose output verbose messages
>> -?, --help print this help
>>
>> # start with 4-node cluster. Primary is node 0.
>>
>> $ psql -p 11000 -c "show pool_nodes" test
>> node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>> 0 | /tmp | 11002 | up | up | 0.250000 | primary | primary | 0 | false | 0 | | | 2021-06-10 20:17:41
>> 1 | /tmp | 11003 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-10 20:17:41
>> 2 | /tmp | 11004 | up | up | 0.250000 | standby | standby | 0 | true | 0 | streaming | async | 2021-06-10 20:17:41
>> 3 | /tmp | 11005 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-10 20:17:41
>> (4 rows)
>>
>> # promote node 2.
>> $ pcp_promote_node -p 11001 -w -r 2
>> pcp_promote_node -- Command Successful
>>
>> [a little time has passed]
>>
>> # see if node 2 has become the new primary. Note that the other standbys
>> # are in sync with the new primary because a follow primary command is set.
>>
>> $ psql -p 11000 -c "show pool_nodes" test
>> node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>> ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>> 0 | /tmp | 11002 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-10 20:20:31
>> 1 | /tmp | 11003 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-10 20:20:31
>> 2 | /tmp | 11004 | up | up | 0.250000 | primary | primary | 0 | true | 0 | | | 2021-06-10 20:18:20
>> 3 | /tmp | 11005 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-10 20:20:31
>> (4 rows)
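>>
>> # (for reference, a single node's status can also be checked with
>> # pcp_node_info, instead of running show pool_nodes:)
>> $ pcp_node_info -p 11001 -w 2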
>>
>> Patch attached.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > Hi Nathan,
>> >
>> > I have revisited this and am leaning toward this idea: modifying an
>> > existing pcp command so that we can specify which node is to be promoted.
>> >
>> > My idea is to reuse the node-detach protocol of the pcp command. Currently
>> > the protocol specifies the node to be detached. I added a new request
>> > detail flag bit "REQ_DETAIL_PROMOTE". If the flag is set, the protocol
>> > treats the requested node id as the node to be promoted, rather than
>> > detached. In this case the node to be detached will be the current
>> > primary node. The node to be promoted is passed to a failover script
>> > as the existing argument "new main node" (normally the live node,
>> > excluding the current primary, which has the smallest node id). Existing
>> > failover scripts usually treat the "new main node" as the node to be
>> > promoted.
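>> >
>> > To illustrate that convention, here is a minimal failover script sketch
>> > (the path, ssh user, and argument order are illustrative, not taken from
>> > the patch; %d, %P, %H, and %R are standard failover_command placeholders):
>> >
>> > failover_command = '/etc/pgpool-II/failover.sh %d %P %H %R'
>> >
>> > #!/bin/bash
>> > # failover.sh: promote the "new main node" that pgpool hands over.
>> > FAILED_NODE_ID="$1"    # %d: id of the detached node
>> > OLD_PRIMARY_ID="$2"    # %P: id of the old primary node
>> > NEW_MAIN_HOST="$3"     # %H: host of the new main node (the one to promote)
>> > NEW_MAIN_PGDATA="$4"   # %R: database cluster directory of the new main node
>> > # Most scripts promote the new main node over ssh with pg_ctl:
>> > ssh postgres@"$NEW_MAIN_HOST" "pg_ctl -D $NEW_MAIN_PGDATA promote"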
>> >
>> > Attached is the PoC patch for this. After applying the patch, the -g
>> > argument (detach gracefully) of pcp_detach_node sets the
>> > "REQ_DETAIL_PROMOTE" flag, and the node id argument now represents the
>> > node to be promoted. For example:
>> >
>> > # create 4-node cluster.
>> > $ pgpool_setup -n 4
>> >
>> > # primary node is 0.
>> > $ psql -p 11000 -c "show pool_nodes" test
>> > node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>> > ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>> > 0 | /tmp | 11002 | up | up | 0.250000 | primary | primary | 0 | false | 0 | | | 2021-06-07 11:02:25
>> > 1 | /tmp | 11003 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-07 11:02:25
>> > 2 | /tmp | 11004 | up | up | 0.250000 | standby | standby | 0 | true | 0 | streaming | async | 2021-06-07 11:02:25
>> > 3 | /tmp | 11005 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-07 11:02:25
>> > (4 rows)
>> >
>> > # promote node 2.
>> > $ pcp_detach_node -g -p 11001 -w 2
>> > $ psql -p 11000 -c "show pool_nodes" test
>> > node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>> > ---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>> > 0 | /tmp | 11002 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-07 11:03:17
>> > 1 | /tmp | 11003 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-07 11:03:17
>> > 2 | /tmp | 11004 | up | up | 0.250000 | primary | primary | 0 | true | 0 | | | 2021-06-07 11:02:43
>> > 3 | /tmp | 11005 | up | up | 0.250000 | standby | standby | 0 | false | 0 | streaming | async | 2021-06-07 11:03:27
>> > (4 rows)
>> >
>> > (Note that node 3 may end up in down status after the execution of
>> > pcp_detach_node, but that is due to a different issue. See [pgpool-hackers:
>> > 3915] "ERROR: failed to process PCP request at the moment" for more
>> > details.)
>> >
>> > One of the merits of this method is that existing failover scripts need
>> > not be changed. (Of course, we could instead modify the existing
>> > pcp_promote_node so that it does the job.)
>> >
>> > However, I see difficulties with modifying the existing pcp commands
>> > (either pcp_detach_node or pcp_promote_node). The protocol used by
>> > pcp commands is very limited, in that the only argument available is
>> > the node id. So how does pcp_detach_node implement the -g flag?
>> > It uses two kinds of pcp protocol message: one for invocations without
>> > -g and the other for those with -g. (I think we should eventually
>> > overhaul the existing pcp protocol to overcome these limitations, but
>> > that's another story.)
>> >
>> > Probably we need to invent a new pcp protocol message and use it for
>> > this purpose.
>> >
>> > Comments and suggestions are welcome.
>> > --
>> > Tatsuo Ishii
>> > SRA OSS, Inc. Japan
>> > English: http://www.sraoss.co.jp/index_en.php
>> > Japanese: http://www.sraoss.co.jp
>> >
>> >> Hello!
>> >>
>> >>> On 7/04/2021, at 12:57 AM, Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>>> Thanks for your time on this so far!
>> >>>>
>> >>>> Would it be possible to allow these to be changed at runtime without a config change + reload? Perhaps a more generic pcp command, to allow any “reloadable” configuration item to be changed? I’m not sure how many of these there are, and if any of them require deeper changes. Maybe we need a 3rd “class” of configuration item, to allow it to be changed without changing config.
>> >>>
>> >>> Not sure. PostgreSQL does not have such a 3rd "class".
>> >>
>> >> That’s true, though with ALTER SYSTEM you can modify a special configuration file and then reload the configuration - i.e. you can make changes to the running server without editing configuration files directly.
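>> >>
>> >> For example, a minimal sketch of that flow (the parameter here is
>> >> arbitrary, just for illustration):
>> >>
>> >> $ psql -c "ALTER SYSTEM SET work_mem = '64MB'"
>> >> $ psql -c "SELECT pg_reload_conf()"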
>> >>
>> >> Pgpool could somewhat mirror the Postgres functionality here: if there were an include directive, then only a reload would be required. If we enabled configuration to be added to such a file with pcp, that would be useful - it could then be done from a remote system, perhaps?
>> >>
>> >> Just some thoughts.. these seem like bigger changes, and I’m new here :-)
>> >>
>> >>>> We (and I am sure many others) use config management tools to keep our config in sync. In particular, we use Puppet. I have a module which I’m hoping to be able to publish soon, in fact.
>> >>>>
>> >>>> Usually, environments which use config management like this have policies to not modify configuration which is managed.
>> >>>>
>> >>>> Switching to a different backend would mean we have to manually update that managed configuration (on the pgpool primary, presumably) and reload, and then run pcp_detach_node against the primary backend. In the time between updating the configuration and detaching the node, our automation might run and undo our changes, which would mean we may get an outcome we don’t expect.
>> >>>> An alternative would be to update the configuration in our management system and deploy it - but in many environments making these changes can be quite involved. I don’t think it’s good for operational changes (i.e. “move the primary function to this backend”) to require configuration updates.
>> >>>>
>> >>>> A work-around to this would be to have an include directive in the configuration and include a local, non-managed file for these sorts of temporary changes, but pgpool doesn’t have this at the moment, I believe?
>> >>>
>> >>> No, pgpool does not have one. However, I think it would be a good idea
>> >>> to have an "include" feature like PostgreSQL's.
>> >>
>> >> I would certainly use that. My puppet module writes the configuration data twice:
>> >> 1) Into the main pgpool.conf file.
>> >> 2) Into one of two files, either a “reloadable” config or a “restart required” config - so that I can use puppet notifications for changes to those files to trigger either a reload or a restart.
>> >> It then does some checks to make sure pgpool.conf is older than the reloadable/restart-required config, etc. - it’s quite messy!
>> >>
>> >> An include directive would make that all a lot easier.. and of course would enable this use case.
>> >>
>> >> --
>> >> Nathan Ward
>> >>
>> >>
>> _______________________________________________
>> pgpool-hackers mailing list
>> pgpool-hackers at pgpool.net
>> http://www.pgpool.net/mailman/listinfo/pgpool-hackers