<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Jul 18, 2021 at 11:31 AM Tatsuo Ishii &lt;<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> Hi Ishii_San<br>
> <br>
> Your understanding of the proposal is correct. Basically, IMHO, whatever we<br>
> do to try to remedy the original issue, there will always be a chance<br>
> of split-brain.<br>
<br>
Right.<br>
<br>
> The reason I am proposing this solution is that with this proposed design<br>
> the behaviour would be configurable. For example, if the user sets<br>
> wd_lost_node_to_remove_timeout = 0, this will disable the lost node removal<br>
> function and the watchdog would then behave as it does currently.<br>
> Normally I expect wd_lost_node_to_remove_timeout to be set in the range<br>
> of 5 to 10 minutes, because a blackout of more than 5 to 10 minutes would<br>
> mean there is some serious problem in the network that leaves a node unable<br>
> to communicate for such a long period of time, and we need to resume the<br>
> service even if it comes with the risk of a split-brain.<br>
<br>
Ok.<br>
<br>
> The second part of the proposal is about nodes that are properly shut<br>
> down. In that case, the proposal is to stop counting those nodes in the<br>
> quorum calculation, since we already know that these nodes are no longer<br>
> alive.<br>
<br>
Is it possible to configure the watchdog to enable the lost node removal<br>
function only when a node is properly shut down?<br>
<br></blockquote><div>Yes, if we disable <span style="background-color:transparent">wd_lost_node_to_remove_timeout (by setting it to 0),</span></div><div><span style="background-color:transparent">lost node removal will only happen for properly shut down nodes.</span></div><div><span style="background-color:transparent"><br></span></div><div><span style="background-color:transparent">Best Regards</span></div><div><span style="background-color:transparent">Muhammad Usama</span></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> But again, it also has associated risks in case a previously shut down<br>
> node is started again but is unable to communicate with the existing<br>
> cluster.<br>
<br>
Fair point.<br>
<br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
</blockquote></div></div>
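<div dir="ltr">The quorum behaviour discussed in this thread can be illustrated with a minimal sketch. It is not pgpool-II code: the names NodeState, Node, counts_toward_quorum and has_quorum are hypothetical, and the strict-majority rule is an assumption made only for illustration. The lost_node_timeout argument stands in for the proposed wd_lost_node_to_remove_timeout, where 0 disables removal of lost nodes while properly shut down nodes are still excluded.</div>

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class NodeState(Enum):
    ALIVE = "alive"
    LOST = "lost"          # stopped responding without notice
    SHUTDOWN = "shutdown"  # announced a proper shutdown


@dataclass
class Node:
    name: str
    state: NodeState
    lost_for_seconds: float = 0.0  # only meaningful for LOST nodes


def counts_toward_quorum(node: Node, lost_node_timeout: int) -> bool:
    """Decide whether a node is still part of the quorum calculation."""
    if node.state is NodeState.ALIVE:
        return True
    if node.state is NodeState.SHUTDOWN:
        # Properly shut down nodes are known not to be alive, so drop them.
        return False
    # LOST node: a timeout of 0 disables removal (current behaviour).
    if lost_node_timeout == 0:
        return True
    # Otherwise keep counting the node until the timeout has elapsed.
    return lost_node_timeout > node.lost_for_seconds


def has_quorum(nodes: List[Node], lost_node_timeout: int) -> bool:
    """Assumed rule: alive nodes must be a strict majority of counted nodes."""
    counted = sum(1 for n in nodes if counts_toward_quorum(n, lost_node_timeout))
    alive = sum(1 for n in nodes if n.state is NodeState.ALIVE)
    return counted > 0 and alive * 2 > counted
```

<div dir="ltr">Under these assumptions, one surviving node out of three holds the quorum once the other two have been lost for longer than the timeout, which is exactly the trade-off between resuming service and risking split-brain described above.</div>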