<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<div name="messageBodySection">
<div dir="auto">Dear Tatsuo<br />
<br />
I am not an expert on C programming, but if I understand correctly, applying this debugger patch would mean applying it to a fresh copy of the pgpool code, re-compiling it, and then running another instance of pgpool from that new code. If that is the procedure, I am afraid we might not be able to reproduce the bug in that new instance, as it is so infrequent.<br />
<br />
Is there a way we can debug this or get these variables at runtime? I highly doubt it, as I already tried increasing the log level to debug5 and couldn’t find anything related, but I still want to ask because you might know of better alternatives.<br />
<br />
If not, I will definitely try to run it in a fresh pgpool instance and leave the current buggy one running as it is. Just hoping for a better alternative.</div>
</div>
<div name="messageSignatureSection"><br />
<div class="matchFont" dir="auto">Thank you</div>
<div class="matchFont" dir="auto"><br /></div>
<div class="matchFont">Anirudh</div>
</div>
<div name="messageReplySection">On 21 Feb 2022, 2:20 AM +0100, Tatsuo Ishii <ishii@sraoss.co.jp>, wrote:<br />
<blockquote type="cite" style="border-left-color: grey; border-left-width: thin; border-left-style: solid; margin: 5px 5px;padding-left: 10px;">Hi Anirudh,<br />
<br />
<blockquote type="cite">Hello Tatsuo<br />
<br />
Thank you so much for the explanation. I understand the problem but same as you, cannot understand the reason behind it. The same variable 'BACKEND_INFO(i).backend_status’ cannot have 2 different values. Maybe the VALID_BACKEND macro is false due to other checks that it has in its if else blocks. I am not sure. I will try to have a look.<br /></blockquote>
<br />
Maybe. It would be nice if you could attach a debugger to the health check<br />
process which emits the log:<br />
<br />
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">invalid degenerate backend request, node id : 2 status: [2] is not valid for failover<br /></blockquote>
</blockquote>
</blockquote>
<br />
then examine the variables which may be related to the issue. The ones I am especially interested in are these:<br />
<br />
(1)<br />
static POOL_SESSION_CONTEXT * session_context = NULL;<br />
<br />
The variable in the health check is expected to be NULL. If not, we have trouble.<br />
<br />
<br />
(2)<br />
BACKEND_STATUS private_backend_status[MAX_NUM_BACKENDS];<br />
<br />
This is a copy of the backend status taken from the shared memory area.<br />
It is expected to be 1 or 2 (CON_CONNECT_WAIT or CON_UP).<br />
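<br />
For reference, BACKEND_STATUS is defined roughly like this (in pool_type.h, if I remember correctly; the comments are mine, so please double-check against your source tree):<br />
<br />
typedef enum<br />
{<br />
CON_UNUSED, /* 0: slot not in use */<br />
CON_CONNECT_WAIT, /* 1: waiting for connection */<br />
CON_UP, /* 2: up and running */<br />
CON_DOWN /* 3: down */<br />
} BACKEND_STATUS;<br />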
<br />
BTW, I need to correct my explanation.<br />
<br />
<blockquote type="cite">
<blockquote type="cite">my_backend_status[(backend_id). The local variable is initialized at<br />
the pgpool main process (src/main/pgpool_main.c) and inherited via<br />
fork() to the health check process.<br />
<br />
/*<br />
* Initialize backend status area. From now on, VALID_BACKEND macro can be<br />
* used. (get_next_main_node() uses VALID_BACKEND)<br />
*/<br />
<br />
for (i = 0; i < MAX_NUM_BACKENDS; i++)<br />
{<br />
my_backend_status[i] = &(BACKEND_INFO(i).backend_status);<br />
}<br />
<br />
As you can see, my_backend_status actually stores a pointer to<br />
BACKEND_INFO(i).backend_status. So *(my_backend_status[backend_id])<br />
should be the same as BACKEND_INFO(backend_id).backend_status.<br /></blockquote>
</blockquote>
<br />
Actually, my_backend_status is not initialized like this; the health check process calls this:<br />
/* Initialize my backend status */<br />
pool_initialize_private_backend_status();<br />
<br />
It copies the backend status in the shared memory area to process-local<br />
memory. This is necessary for the pgpool child processes which deal<br />
with client sessions, so that they are not affected by status<br />
changes in the shared memory area in the middle of a session. BUT this<br />
is not necessary for the health check process. I am going to fix this.<br />
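<br />
In effect, pool_initialize_private_backend_status() does roughly this (a simplified sketch from me, not the literal source; the real function does a bit more):<br />
<br />
for (i = 0; i < MAX_NUM_BACKENDS; i++)<br />
{<br />
/* take a snapshot of the shared memory status into this process's private copy */<br />
private_backend_status[i] = BACKEND_INFO(i).backend_status;<br />
/* and make VALID_BACKEND look at the private copy instead of shared memory */<br />
my_backend_status[i] = &private_backend_status[i];<br />
}<br />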
<br />
However, I don't think this can explain the issue. As long as there is<br />
no difference between the local copy and the shared memory over the<br />
process's lifetime, the bug does not do anything bad, and I cannot think of<br />
any scenario that could cause the two variables to differ.<br />
<br />
<blockquote type="cite">As for the reproducibility, you are right, in the past few months of development and testing, I have only seen this issue a couple times. Earlier, I thought it was something that I missed in the config but now that my setup is almost production ready, I opened this ticket for help. It is really difficult to reproduce but I currently have a setup running where this problem is persistent. I have tried by turning that replica off and on several times and it connects back to the cluster when it's up but I see the same error when it’s down.<br />
<br />
So, if you prefer I can help you investigate more on this by sharing some detailed logs and config options from my setup.<br />
<br />
Even though this issue comes up rarely, it is important for me to resolve because my setup uses synchronous replication, and if both of my replicas are down, I convert the replication mode from sync to async in the failover script. But if this bug shows up, the failover script is not executed when the last replica goes down, and my queries hang forever because they do not have a replica committing transactions in synchronous mode.<br />
<br />
So, if I can help you debug this by providing any logs or config, or direct access to my setup over a Zoom call, that would be really great for me. Please let me know if this works for you.<br /></blockquote>
<br />
Thanks. Is it possible for you to apply the attached patches so that<br />
pgpool produces additional logging? (This will emit a log entry whenever failover<br />
happens.) The log basically emits the information which could be obtained with the<br />
debugger as explained above.<br />
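<br />
Conceptually the extra log looks something like this, placed where the degenerate request is evaluated (just a sketch from me; the attached patches are authoritative and may differ):<br />
<br />
/* dump both views of the backend status plus the session context pointer */<br />
ereport(LOG,<br />
(errmsg("degenerate request for node %d: shared status=%d, local status=%d, session context=%p",<br />
node_id_set[i],<br />
BACKEND_INFO(node_id_set[i]).backend_status,<br />
*my_backend_status[node_id_set[i]],<br />
pool_get_session_context(true))));<br />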
<br />
<blockquote type="cite">Thank you<br />
<br />
Anirudh<br />
On 19 Feb 2022, 2:33 AM +0100, Tatsuo Ishii &lt;ishii@sraoss.co.jp&gt;, wrote:<br />
<blockquote type="cite">
<blockquote type="cite">Hello<br />
<br />
I had the following setup:<br />
1 pgpool: 4.2.2<br />
3 postgres nodes (all 12.5) - 1 primary, 2 replicas<br />
<br />
I turned off both replicas one after the other, a couple of seconds apart. Failover was performed for one of the replicas but not for the other. For the second replica, the health check keeps failing, but pgpool just keeps retrying the health check in an infinite loop. When pgpool attempts the failover of the second replica, I see this error in the logs:<br />
<br />
invalid degenerate backend request, node id : 2 status: [2] is not valid for failover<br />
<br />
As far as I understand, status [2] is up, so the failover should be performed. Here's a snippet of the logs:<br /></blockquote>
<br />
You are right. The problem is that two different internal APIs, which both extract the status, do not agree.<br />
The error message comes from here:<br />
src/main/pool_internal_comms.c: degenerate_backend_set_ex()<br />
<br />
for (i = 0; i < count; i++)<br />
{<br />
if (node_id_set[i] < 0 || node_id_set[i] >= MAX_NUM_BACKENDS ||<br />
(!VALID_BACKEND(node_id_set[i]) && BACKEND_INFO(node_id_set[i]).quarantine == false))<br />
{<br />
if (node_id_set[i] < 0 || node_id_set[i] >= MAX_NUM_BACKENDS)<br />
ereport(elevel,<br />
(errmsg("invalid degenerate backend request, node id: %d is out of range. node id must be between [0 and %d]"<br />
,node_id_set[i], MAX_NUM_BACKENDS)));<br />
else<br />
ereport(elevel,<br />
(errmsg("invalid degenerate backend request, node id : %d status: [%d] is not valid for failover"<br />
,node_id_set[i], BACKEND_INFO(node_id_set[i]).backend_status)));<br />
<br />
I think VALID_BACKEND(node_id_set[i]) returned false<br />
here. VALID_BACKEND returns false if the backend status in the shared<br />
memory area is neither 1 (waiting for connection) nor 2 (up and<br />
running). However BACKEND_INFO(node_id_set[i]).backend_status says<br />
backend_status is actually 2 (up and running). The strange thing is,<br />
both VALID_BACKEND and BACKEND_INFO(node_id_set[i]).backend_status<br />
look into the same shared memory area. Let me explain.<br />
<br />
VALID_BACKEND is a macro that expands to:<br />
<br />
#define VALID_BACKEND(backend_id) \<br />
((RAW_MODE && (backend_id) == REAL_MAIN_NODE_ID) || \<br />
(pool_is_node_to_be_sent_in_current_query((backend_id)) && \<br />
((*(my_backend_status[(backend_id)]) == CON_UP) || \<br />
(*(my_backend_status[(backend_id)]) == CON_CONNECT_WAIT))))<br />
<br />
Since you are running pgpool in a mode other than RAW_MODE, the macro first<br />
checks whether pool_is_node_to_be_sent_in_current_query(backend_id)<br />
returns true. Here is the relevant portion of the code (defined in<br />
src/context/pool_query_context.c).<br />
<br />
bool<br />
pool_is_node_to_be_sent_in_current_query(int node_id)<br />
{<br />
POOL_SESSION_CONTEXT *sc;<br />
<br />
if (RAW_MODE)<br />
return node_id == REAL_MAIN_NODE_ID;<br />
<br />
As I said earlier, RAW_MODE is false, so it executes the next lines:<br />
<br />
sc = pool_get_session_context(true);<br />
if (!sc)<br />
return true;<br />
<br />
pool_get_session_context() returns NULL and this function returns<br />
true, because the health check process does not use a "session context". The<br />
session context only exists in the pgpool child processes which deal<br />
with user connections.<br />
<br />
Then the macro checks the local variable<br />
my_backend_status[backend_id]. The local variable is initialized in<br />
the pgpool main process (src/main/pgpool_main.c) and inherited via<br />
fork() by the health check process.<br />
<br />
/*<br />
* Initialize backend status area. From now on, VALID_BACKEND macro can be<br />
* used. (get_next_main_node() uses VALID_BACKEND)<br />
*/<br />
<br />
for (i = 0; i < MAX_NUM_BACKENDS; i++)<br />
{<br />
my_backend_status[i] = &(BACKEND_INFO(i).backend_status);<br />
}<br />
<br />
As you can see, my_backend_status actually stores a pointer to<br />
BACKEND_INFO(i).backend_status. So *(my_backend_status[backend_id])<br />
should be the same as BACKEND_INFO(backend_id).backend_status.<br />
<br />
However, as I said earlier, they seem to disagree. At this<br />
point I can't think of any explanation for this.<br />
<br />
How often do you see this problem? Is it reliably reproducible?<br />
I have not been able to reproduce it so far.<br />
<br />
Best regards,<br />
--<br />
Tatsuo Ishii<br />
SRA OSS, Inc. Japan<br />
English: http://www.sraoss.co.jp/index_en.php<br />
Japanese: http://www.sraoss.co.jp<br /></blockquote>
</blockquote>
<img src="cid:c44928b6d4a4cacfabd880e8ccd436e4" width="auto" style="max-width:100%" /></blockquote>
</div>
</body>
</html>