[pgpool-hackers: 2688] Re: [pgpool-general: 5885] Re: Pgpool-3.7.1 segmentation fault
Tatsuo Ishii
ishii at sraoss.co.jp
Wed Jan 24 13:57:58 JST 2018
Proposed patch attached.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
> Hi Pgpool-II developers,
>
> It has been reported that a Pgpool-II child process can crash with a segmentation fault.
>
> #0 0x000000000042478d in select_load_balancing_node () at protocol/child.c:1680
> 1680 char *database = MASTER_CONNECTION(ses->backend)->sp->database;
> (gdb) backtrace
> #0 0x000000000042478d in select_load_balancing_node () at protocol/child.c:1680
>
> To reproduce the problem, all of the following conditions must be met:
>
> 1) Streaming replication mode.
>
> 2) fail_over_on_backend_error is off.
>
> 3) The ALWAYS_MASTER flag is set on the master (writer) node.
>
> 4) The pgpool_status file indicates that the node mentioned in #3 is in
> down status.
>
> What happens is as follows:
>
> 1) find_primary_node() returns node id 0 without checking the status
> of node 0, because ALWAYS_MASTER is set. This id is remembered as the
> primary node id and stored in Req_info->primary_node_id.
>
> 2) The connection to backend 0 is not created, because pgpool_status
> says it is in down status.
>
> 3) Upon session start, select_load_balancing_node() is called, and it
> tries to determine the database name from the client's startup
> packet.
>
> 4) Since the MASTER_CONNECTION macro points to the PRIMARY_NODE, whose
> connection was never created, MASTER_CONNECTION(ses->backend) is NULL,
> and dereferencing it (->sp->database) results in a segfault.
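
The four steps above can be modelled in a few lines of C. This is a minimal sketch, not Pgpool-II code: `Cluster`, `find_primary_node_sketch()`, `create_connections_sketch()` and `master_connection_sketch()` are hypothetical, simplified stand-ins for Req_info, find_primary_node(), backend connection setup and the MASTER_CONNECTION macro, and the `ALWAYS_MASTER` bit value is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for Pgpool-II internals;
 * the real types, macros and flag values in the source tree differ. */
#define NUM_BACKENDS 2
#define ALWAYS_MASTER 0x0001   /* hypothetical flag bit */

typedef enum { CON_UNUSED, CON_CONNECT_WAIT, CON_UP, CON_DOWN } BackendStatus;

typedef struct { const char *database; } StartupPacket;
typedef struct { StartupPacket *sp; } BackendConnection;

typedef struct {
    BackendStatus status[NUM_BACKENDS];     /* as loaded from pgpool_status */
    unsigned flag[NUM_BACKENDS];            /* like backend_flagN */
    int primary_node_id;                    /* like Req_info->primary_node_id */
    BackendConnection *conn[NUM_BACKENDS];  /* like the ses->backend slots */
} Cluster;

/* Step 1: a node flagged ALWAYS_MASTER is returned without
 * looking at its status at all. */
int find_primary_node_sketch(const Cluster *c)
{
    for (int i = 0; i < NUM_BACKENDS; i++)
        if (c->flag[i] & ALWAYS_MASTER)
            return i;
    return -1;
}

/* Step 2: connections are created only for nodes that are not down;
 * a down node's slot is left NULL ("skipping backend slot i"). */
void create_connections_sketch(Cluster *c, BackendConnection storage[],
                               StartupPacket *sp)
{
    for (int i = 0; i < NUM_BACKENDS; i++) {
        if (c->status[i] == CON_DOWN) {
            c->conn[i] = NULL;
            continue;
        }
        storage[i].sp = sp;
        c->conn[i] = &storage[i];
    }
}

/* Step 4: MASTER_CONNECTION follows the primary node id, so it yields
 * NULL when the primary's slot was skipped (no-primary fallback omitted). */
BackendConnection *master_connection_sketch(const Cluster *c)
{
    if (c->primary_node_id < 0)
        return NULL;
    return c->conn[c->primary_node_id];
}
```

With node 0 flagged ALWAYS_MASTER and marked CON_DOWN, the model ends up with primary_node_id == 0 while slot 0 is NULL, so master_connection_sketch() returns NULL; in the real code, ->sp->database is then evaluated on that pointer at child.c:1680.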
>
> The fix I propose is to change the PRIMARY_NODE_ID macro so that it
> returns REAL_MASTER_NODE_ID (that is, the youngest node id which is
> alive) if the node id in Req_info->primary_node_id is in down status.
> As a result, Req_info->primary_node_id holds the "true" primary node
> id, while the PRIMARY_NODE_ID macro returns a "fake" one.
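
A sketch of the proposed PRIMARY_NODE_ID behaviour, written as functions rather than macros so that it compiles on its own. The type and function names are hypothetical, and treating CON_UP and CON_CONNECT_WAIT as "alive" is an assumption modelled on how Pgpool-II validates backends; the attached always_master.diff remains the authoritative change.

```c
#include <stdbool.h>

/* Hypothetical simplified model; not the actual Pgpool-II macros. */
#define NUM_BACKENDS 3

typedef enum { CON_UNUSED, CON_CONNECT_WAIT, CON_UP, CON_DOWN } BackendStatus;

typedef struct {
    BackendStatus status[NUM_BACKENDS];
    int primary_node_id;   /* the "true" primary, as stored in Req_info */
} RequestInfo;

/* Assumption: a node counts as alive if it is up or waiting to connect. */
static bool node_is_alive(const RequestInfo *r, int id)
{
    return r->status[id] == CON_UP || r->status[id] == CON_CONNECT_WAIT;
}

/* Stand-in for REAL_MASTER_NODE_ID: the youngest (lowest) live node id,
 * or -1 if no node is alive. */
int real_master_node_id_sketch(const RequestInfo *r)
{
    for (int i = 0; i < NUM_BACKENDS; i++)
        if (node_is_alive(r, i))
            return i;
    return -1;
}

/* Stand-in for the proposed PRIMARY_NODE_ID: fall back to the real master
 * both when no primary was found (existing behaviour) and when the stored
 * primary is down (proposed change). The stored id itself is untouched. */
int primary_node_id_sketch(const RequestInfo *r)
{
    int id = r->primary_node_id;
    if (id < 0 || !node_is_alive(r, id))
        return real_master_node_id_sketch(r);
    return id;
}
```

In this model the stored primary_node_id still holds 0 (the "true" primary), while primary_node_id_sketch() reports node 1 as long as node 0 is down, and returns 0 again once node 0 is back up.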
>
> I am afraid this is confusing and may have unintended side effects
> elsewhere in Pgpool-II. Note, however, that we already let
> PRIMARY_NODE_ID return REAL_MASTER_NODE_ID when find_primary_node()
> cannot find a primary node. So maybe I am worrying too much... but I
> am not sure.
>
> So I would like to hear opinions from Pgpool-II developers.
>
> Best regards,
> --
> Tatsuo Ishii
>
> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> Subject: [pgpool-general: 5885] Re: Pgpool-3.7.1 segmentation fault
> Date: Wed, 24 Jan 2018 12:00:40 +0900 (JST)
> Message-ID: <20180124.120040.507189908198617602.t-ishii at sraoss.co.jp>
>
>>> Thanks for the quick reply! I realized that I ended up in this state
>>> because I was using indexed health checks and the primary's health checks
>>> had been disabled. I've gone back to a single health_check configuration to
>>> avoid this issue.
>>
>> Do you have an issue with "indexed health checks"? I thought it was
>> fixed in 3.7.1.
>>
>>> I've also added an extra pre-start step, which removes the
>>> pgpool_status file.
>>
>> That might be a solution, but I would like to add a guard against the
>> segfault to Pgpool-II. The segfault occurs when all of the conditions
>> below are met:
>>
>> 1) fail_over_on_backend_error is off.
>> 2) The ALWAYS_MASTER flag is set on the master (writer) node.
>>
>> The attached patch implements a guard against the segfault. Developers
>> will discuss the patch on pgpool-hackers.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>>
>>> On Tue, Jan 23, 2018 at 5:49 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
>>>
>>>> Hi Philip,
>>>>
>>>> > Hello poolers,
>>>> >
>>>> > I've compiled pgpool-3.7.1 (./configure --with-openssl; libpq.5.9) for
>>>> > Ubuntu 14.04 to connect to RDS Aurora Postgres (9.6.3). When I try to
>>>> > authenticate, the pgpool child process segfaults. My config file follows
>>>> > the Aurora instructions
>>>> > <http://www.pgpool.net/docs/latest/en/html/example-aurora.html>, I think?
>>>> > Have I misconfigured something to cause this segfault?
>>>> >
>>>> > Any guidance would be appreciated!
>>>> >
>>>> > Thanks,
>>>> > Philip
>>>> >
>>>> >
>>>> > $ psql -h localhost -U user staging
>>>> > Password for user user:
>>>> > psql: server closed the connection unexpectedly
>>>> > This probably means the server terminated abnormally
>>>> > before or while processing the request.
>>>>
>>>> It seems your status file (/var/log/pgpool/pgpool_status) is out of
>>>> date.
>>>>
>>>> > 2018-01-23 19:23:42: pid 19872: DEBUG: creating new connection to backend
>>>> > 2018-01-23 19:23:42: pid 19872: DETAIL: skipping backend slot 0 because backend_status = 3
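
For reference, the value in the DETAIL line can be decoded against the backend status enum. The sketch below assumes the CON_UNUSED, CON_CONNECT_WAIT, CON_UP, CON_DOWN ordering used in Pgpool-II's headers (verify against your version); the helper functions are hypothetical. Under that assumption, backend_status = 3 means slot 0 was loaded from the status file as down, which is why no connection was created for it.

```c
#include <stdbool.h>

/* Assumed to mirror the order of Pgpool-II's backend status enum;
 * check the headers of your source tree before relying on it. */
typedef enum {
    CON_UNUSED,        /* 0: unused slot */
    CON_CONNECT_WAIT,  /* 1: waiting for connection to start */
    CON_UP,            /* 2: up and running */
    CON_DOWN           /* 3: down, disconnected */
} BackendStatus;

/* Hypothetical helper: printable name for a raw backend_status value
 * taken from a log line. */
const char *backend_status_name(int status)
{
    switch (status) {
    case CON_UNUSED:       return "CON_UNUSED";
    case CON_CONNECT_WAIT: return "CON_CONNECT_WAIT";
    case CON_UP:           return "CON_UP";
    case CON_DOWN:         return "CON_DOWN";
    default:               return "unknown";
    }
}

/* Hypothetical helper: a slot whose status is CON_DOWN is skipped when
 * backend connections are created. */
bool slot_is_skipped(int status)
{
    return status == CON_DOWN;
}
```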
>>>>
>>>> So Pgpool-II fails to create a connection to backend 0, which causes
>>>> the segfault later on. Pgpool-II certainly needs a guard for this
>>>> situation, but for now you can work around it by shutting down
>>>> pgpool, removing /var/log/pgpool/pgpool_status, and restarting pgpool.
>>>>
>>>> Once a proper pgpool_status file has been created, you don't need to
>>>> repeat the operation above, i.e. you can skip removing it.
>>>>
>>>> Best regards,
>>>> --
>>>> Tatsuo Ishii
>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: always_master.diff
Type: text/x-patch
Size: 554 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20180124/8f56cf9d/attachment.bin>