[pgpool-general: 8951] Re: pgpool 4.4.4: reading status file: 1 th backend is set to down status
Camarena Daniel
Daniel.Camarena at azo.com
Tue Oct 17 23:21:23 JST 2023
Hi Tatsuo,
to be honest, I chose the Bitnami solution because they offer Docker containers and an up-to-date example for configuring the dockerized cluster. As I mentioned, I'm just the user in the middle. I agree that Bitnami should take care of that behaviour, and I will let them know.
Now, knowing pgpool-II and its capabilities better, I'll give it a try and set up a Postgres cluster with just pgpool-II. I'm using docker-compose for orchestration. Can I still use the Pgpool-II Docker images for Kubernetes <https://hub.docker.com/r/pgpool/pgpool>, or do I need to create my own Docker image?
Thanks, regards, Daniel
AZO GmbH & Co. KG
Rosenberger Str. 28
D-74706
Osterburken
Tel.: +49 6291 92-6449
Mob.: +49 162 9919448
Fax: +49 6291 9290449
Mail: Daniel.Camarena at azo.com
Web: www.azo.com
AZO. We Love Ingredients.
KG: Sitz Osterburken, Register-Gericht Mannheim HRA 450086, Persönlich haftende Gesellschafterin: AZO Beteiligungs GmbH, Sitz Osterburken, Register-Gericht Mannheim HRB 450261
Geschäftsführer: Rainer Zimmermann | Daniel Auerhammer | Dr. Matthias Fechner | Jan-Wilko Helms | Dennis Künkel
Diese E-Mail einschließlich ihrer Anhänge ist vertraulich. Wir bitten Sie, eine fehlgeleitete E-Mail zu löschen und uns eine Nachricht zukommen zu lassen. Wir haben die E-Mail vor dem Versenden auf Virenfreiheit geprüft. Eine Haftung für Virenfreiheit schließen wir jedoch aus.
This e-mail and its attachments are confidential. If you are not the intended recipient of this e-mail message, please delete it and inform us accordingly. This e-mail was checked for viruses when sent, however we are not liable for any virus contamination.
-----Original Message-----
From: Tatsuo Ishii <ishii at sraoss.co.jp>
Sent: Tuesday, 17 October 2023 11:27
To: Camarena Daniel <Daniel.Camarena at azo.com>
Cc: pgpool-general at pgpool.net
Subject: Re: AW: AW: [pgpool-general: 8942] pgpool 4.4.4: reading status file: 1 th backend is set to down status
> Hi Tatsuo,
>
>
>
> thank you so much for the detailed explanation. That all makes sense to me - but one detail in how the pieces of software interact does not match, from my point of view: why is pgpool trying to prevent a split-brain scenario? I'll explain my thoughts, and maybe you can point out where pgpool is misconfigured or where my misunderstanding lies.
>
>
>
> As I mentioned, I'm using containers packed by Bitnami, which use pgpool as a proxy for clients to always find the primary instance <https://hub.docker.com/r/bitnami/pgpool>, and repmgr and Postgres to provide an HA solution for Postgres <https://hub.docker.com/r/bitnami/postgresql-repmgr>.
> What is Pgpool-II?
> Pgpool-II is the PostgreSQL proxy. It stands between PostgreSQL servers and their clients providing connection pooling, load balancing, automated failover, and replication.
>
> What is PostgreSQL HA?
> This PostgreSQL cluster solution includes the PostgreSQL replication manager, an open-source tool for managing replication and failover on PostgreSQL clusters.
>
> As far as I understand, Bitnami designed it so that Postgres in combination with repmgr <https://www.repmgr.org/> does all the high-availability work. Pgpool should "just" forward connections from the clients to the primary instance of Postgres.
Pgpool-II has evolved into cluster software. The proxy is just one of Pgpool-II's features.
> In the error case I'm describing here, only pgpool has a problem with the situation. The cluster itself is working, and the connection to every pg and pgpool instance works. It is just that one pgpool remains unhealthy because it can't determine a primary, since it has marked the server which is in fact the primary as down.
Anybody can combine Pgpool-II with other software, but then it is his/her responsibility to understand the functionality/interface of Pgpool-II and make the whole system work; it is not Pgpool-II's responsibility. To be honest, I don't understand why you need repmgr in the first place, because Pgpool-II already has HA features too.
> From my point of view (and I'm just the user in the middle of the software giants Postgres, repmgr, pgpool and Bitnami), I think there is a gap in pgpool's configuration options for acting just as a proxy - nothing to do during failover, nothing extra. That is why I decided to start my investigation here on the pgpool mailing list. As far as I understand it, if pgpool ignored/disabled the pgpool_status file, my scenario would be fine. Maybe there is another configuration option which does exactly what Bitnami expected pgpool to do - I think you are the right person to know. Hence the questions: What do I need to configure so that pgpool acts just as a proxy (which was Bitnami's intention)? Or: how can I disable the pgpool_status file (I was already considering deleting it in the container's entry point)?
You can make Pgpool-II ignore the pgpool_status file by starting it with the -D option. See the manual for more details. Note, however, that in this case the split-brain problem has to be prevented by other software.
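For a containerized setup, a minimal sketch of both approaches (the pgpool_status path and the pgpool.conf path follow the Bitnami layout and are assumptions; verify them in your image):

```shell
# pgpool_status is a plain-text file: one status word per line, in backend
# node order (node 0, node 1, node 2). Reproduce the observed state:
STATUS_FILE=./pgpool_status   # assumed real path: /opt/bitnami/pgpool/logs/pgpool_status
printf 'down\nup\nup\n' > "$STATUS_FILE"
cat "$STATUS_FILE"

# Option 1: delete the file in the container entrypoint before pgpool starts.
rm -f "$STATUS_FILE"

# Option 2: start pgpool with -D (--discard-status) so the saved status is ignored:
#   pgpool -n -f /opt/bitnami/pgpool/conf/pgpool.conf -D
```

With -D, every backend starts in "up" status; pgpool then has to discover the real primary itself, which is why split-brain prevention must live elsewhere (here: repmgr).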
> Thank you so much, I appreciate your expertise,
You are welcome!
> best regards, Daniel
>
> -----Original Message-----
> From: Tatsuo Ishii <ishii at sraoss.co.jp>
> Sent: Tuesday, 17 October 2023 02:49
> To: Camarena Daniel <Daniel.Camarena at azo.com>
> Cc: pgpool-general at pgpool.net
> Subject: Re: AW: [pgpool-general: 8942] pgpool 4.4.4: reading status file: 1 th backend is set to down status
>
>> Hi Tatsuo,
>>
>> thanks for your reply and the explanation.
>>
>> To comment on your answers:
>>
>>> > 1. Is there a file which buffers pg states?
>>>
>>> If you mean the "pg_status" column in the show pool_nodes command, no. It is obtained from PostgreSQL on the fly when show pool_nodes is executed.
>>
>> Yes. But it seems that out of pg_status a resulting state is formed, which appears in the "status" column of show pool_nodes (see the results below), and this indicates that the service is down - and pgpool acts accordingly. See the pgpool log below: it indicates that it marks the 0th node as down because of the "status file".
>
> I think you are talking about the "status" column rather than the "pg_status" column of show pool_nodes. The "status" column is set to down by pgpool, according to the content of the pgpool_status file when pgpool starts up. Once it is set to "down", it is not set back to "up" until pcp_attach_node or pcp_recovery_node is executed. The reason is explained below.
>
>
>
>>> > 2. How did the system get into this state?
>>>
>>> I am not familiar with the Bitnami pgpool nor with repmgr, so I can only answer from pgpool's point of view. It was caused either by a failover triggered by the health check (pgpool detected an error / shutdown of PostgreSQL), or by pcp_detach_node being executed. I cannot tell which without looking into the pgpool log and pgpool.conf.
>>
>
>> Pg0 had tons of these messages:
>>
>> 2023-10-11 11:19:03.522 GMT [956538] FATAL: remaining connection slots are reserved for non-replication superuser connections
>> 2023-10-11 11:19:03.525 GMT [956537] FATAL: remaining connection slots are reserved for non-replication superuser connections
>> 2023-10-11 11:19:03.542 GMT [956539] FATAL: remaining connection slots are reserved for non-replication superuser connections
>> 2023-10-11 11:19:03.545 GMT [956540] FATAL: remaining connection slots are reserved for non-replication superuser connections
>>
>> Pg1 is showing the same messages right now, as I examine the system. They appear from time to time, and it seems that because of them a failover occurs - as you described before.
>>
>> Should I just increase max_connections (default 100) to 200 to prevent the problem?
>
>
>
> Yes, I think so. The above FATAL errors could cause the health check to trigger a failover, depending on the settings in pgpool.conf.
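For reference, a sketch of the relevant settings (values here are illustrative, not from this thread's pgpool.conf):

```
# postgresql.conf on every backend (requires a PostgreSQL restart):
max_connections = 200
superuser_reserved_connections = 3   # default; slots held back for superusers

# pgpool.conf sizing rule of thumb: pgpool can open up to
#   num_init_children * max_pool
# backend connections, which should stay below
#   max_connections - superuser_reserved_connections
# on each backend, or exactly this FATAL error will occur under load.
```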
>
>
>
>> In the meantime I have found a file in pgpool's logs folder. It has the following content:
>>
>> root at c8bdc87693d4:/opt/bitnami/pgpool/logs# cat pgpool_status
>> down
>> up
>> up
>>
>> As pgpool prints the following line during startup
>>
>> 2023-10-16 05:28:21.670: main pid 1: LOG: reading status file: 0 th backend is set to down status
>>
>> I thought this file is read and the status of pg0 is overridden by it.
>
>
>
> Yes, your guess is correct. This is necessary to prevent the "split brain" problem. Suppose you have PostgreSQL node 0 (primary) and node 1 (standby).
>
> 1) node 0 is brought down by the admin.
> 2) node 1 becomes the new primary node.
> 3) the whole system (pgpool, node 0 and node 1) restarts.
> 4) node 0 starts as primary, and node 1 also starts as primary.
> 5) now you have two primary nodes; this is the split brain.
>
> The pgpool_status file prevents this situation: in #3, pgpool sets the node 0 status to down by reading the pgpool_status file, and therefore prevents situation #5.
>
>
>
>> show pool_nodes; returns the following:
>>
>> node_id | hostname  | port | status | pg_status | lb_weight | role    | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>> ---------+-----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>> 0       | 10.0.10.7 | 5432 | down   | up        | 0.333333  | standby | primary | 0          | false             | 0                 |                   |                        | 2023-10-16 05:29:17
>> 1       | 10.0.10.8 | 5432 | up     | up        | 0.333333  | standby | standby | 5          | false             | 0                 |                   |                        | 2023-10-16 05:29:17
>> 2       | 10.0.10.9 | 5432 | up     | up        | 0.333333  | standby | standby | 11         | true              | 0                 |                   |                        | 2023-10-16 05:29:17
>> (3 rows)
>>
>> This indicates that pg_role of pg0 is primary, but the resulting role is standby because the resulting status is down, even though pg_status is up.
>
>
>
> That's expected behavior. Since pgpool recognizes node 0 as "down" by reading the pgpool_status file, it does not check node 0 to see whether it is primary. Pgpool only checks nodes 1 and 2, and concludes that there is no primary.
>
> I think you can use pcp_attach_node on node 0.
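For reference, a pcp_attach_node invocation typically looks like the following; host, port and user here are placeholders (port 9898 is the pcp default), to be adjusted to the actual pcp.conf, and it needs a running cluster:

```shell
# Re-attach backend node 0 after confirming it really is a healthy primary:
pcp_attach_node -h localhost -p 9898 -U pgpool -w -n 0

# Afterwards, verify via pgpool that status is back to "up":
#   psql -h <pgpool_host> -p 5432 -c 'show pool_nodes'
```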
>
>
>
>> As the orchestration always starts pgpool fresh, I post the startup sequence of the container:
>
> [snip]
>
>> 2023-10-16 05:28:21.757: main pid 1: LOG: find_primary_node_repeatedly: waiting for finding a primary node
>> 2023-10-16 05:28:21.793: main pid 1: LOG: find_primary_node: standby node is 1
>> 2023-10-16 05:28:21.793: main pid 1: LOG: find_primary_node: standby node is 2
>
> Here, pgpool looked for the primary node only among nodes 1 and 2, for the reason above.
>
>
>
>> Last but not least, the pgpool.conf you requested. I left the comments in the file:
>
> Thanks. I noticed that "failover_command = 'echo...'". This means that when the primary goes down, pgpool does not elect a new primary. I am not sure if this is the intended design. Maybe the orchestration tool does something which is out of scope of pgpool?
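To illustrate the two styles of failover_command being contrasted here (the echo text and the script path are made-up examples, not the actual Bitnami values):

```
# pgpool.conf -- a no-op failover_command: pgpool logs the event but
# promotes nothing, leaving failover entirely to repmgr:
failover_command = 'echo "failover handled externally by repmgr"'

# If pgpool itself were to drive failover, a script would be wired in,
# using pgpool's substitution placeholders, e.g.:
# failover_command = '/opt/pgpool/failover.sh %d %h %p %D %m %H %M %P %r %R'
#   %d = failed node id, %h/%p = failed host/port, %m/%H/%r = new main
#   node id/host/port, %M = old main node id, %P = old primary node id
```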
>
>
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese: http://www.sraoss.co.jp/