Thread: Patroni vs pgpool II
Hi Guys,

Hope you are doing well. Can someone please suggest which one (Patroni vs Pgpool-II) is better for achieving HA/auto failover and load balancing for DB servers? Along with this, can you please share the names of companies/clients using these tools for large PG databases?

Thanks.

Regards,

Inzamam Shafiq
We're satisfied with PgPool for HA. I can't give names, and ours is only a few hundred GB, though.
Born in Arizona, moved to Babylonia.
On Mon, 3 Apr 2023 06:33:46 +0000 Inzamam Shafiq <inzamam.shafiq@hotmail.com> wrote:

[...]
> Can someone please suggest what is one (Patroni vs PGPool II) is best for
> achieving HA/Auto failover, Load balancing for DB servers. Along with this,
> can you please share the company/client names using these tools for large PG
> databases?

Load balancing is best achieved from the application side. The most popular auto failover solution is Patroni. Other solutions involve Pacemaker to either:

* build a shared-storage cluster with a standalone instance moving from node to node (but this can include standbys)
* build a cluster with a promotable resource using e.g. the PAF resource agent, which will decide where to start the standbys and which one to promote.

No matter the solution you pick, be prepared to learn and train. A lot.
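As an aside on "load balancing is best achieved from the application side": libpq-based drivers can already do primary/standby routing with multi-host connection strings, so many applications need nothing more than two URIs. A sketch (host names are hypothetical; `target_session_attrs=prefer-standby` requires a PostgreSQL 14 or newer libpq):

```
# writes: connect to whichever listed host is the current primary
postgresql://pg0.example,pg1.example:5432/app?target_session_attrs=read-write

# reads: prefer a standby, fall back to the primary if none is reachable
postgresql://pg0.example,pg1.example:5432/app?target_session_attrs=prefer-standby
```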
> Can someone please suggest what is one (Patroni vs PGPool II) is best for achieving HA/Auto failover, Load balancing for DB servers. Along with this, can you please share the company/client names using these tools for large PG databases?
Having used pgpool in multiple production deployments I swore to never use it again, ever.
The first reason is that you need a doctorate degree to try to understand how it actually works, what the pcp commands do in each scenario and how to correctly write the failover scripts.
It is basically a daemon glued together with scripts for which you are entirely responsible. Any small mistake in the failover scripts and the cluster enters a broken state.
Even once you have it set up as it should be, yes, it will fail over correctly, but it won't auto-heal without manual intervention.
You also often end up in weird situations where the backends are up but pgpool reports them as down, and similar scenarios, and then you need to run a precise sequence of pcp commands to recover,
or destroy your whole cluster in the process if you mistype.
I haven't used patroni yet but it surely can't be worse.
Best regards, cen
Hi,

> Hi Guys,
>
> Hope you are doing well.
>
> Can someone please suggest what is one (Patroni vs PGPool II) is best for achieving HA/Auto failover, Load balancing for DB servers.

I am not sure if Patroni provides a load balancing feature.

> Along with this, can you please share the company/client names using these tools for large PG databases?

I can't give you names, but we (SRA OSS) have many customers using PostgreSQL and some of them are using Pgpool-II.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
> BUT, even if there is a solution that parses queries to make a decision it > I would not recommend anyone to use it unless all consequences are > understood. > Specifically, not every read-only query could be salefy sent to a replica, > because they could be lagging behind the primary. > Only application (developers) could decide whether for a specific query > they could afford slightly outdated results. Most of the popular > application frameworks support configuring two connection strings for this > purpose. I think Pgpool-II users well understand the effect of replication lagging because I've never heard complains like "hey, why my query result is sometimes outdated?" Moreover Pgpool-II provides many load balancing features depending on user's needs. For example users can: - just turn off load balancing - turn off load balancing only for specific application name - turn off load balancing only for specific database - turn off load balancing if current transaction includes write query Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Sent: Wednesday, April 5, 2023 12:38 PM
To: cyberdemn@gmail.com <cyberdemn@gmail.com>
Cc: inzamam.shafiq@hotmail.com <inzamam.shafiq@hotmail.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Patroni vs pgpool II
> I would not recommend anyone to use it unless all consequences are
> understood.
> Specifically, not every read-only query could be safely sent to a replica,
> because they could be lagging behind the primary.
> Only application (developers) could decide whether for a specific query
> they could afford slightly outdated results. Most of the popular
> application frameworks support configuring two connection strings for this
> purpose.
I think Pgpool-II users understand the effect of replication lag well,
because I've never heard complaints like "hey, why is my query result
sometimes outdated?"
Moreover, Pgpool-II provides many load balancing features depending on
users' needs. For example, users can:
- just turn off load balancing
- turn off load balancing only for a specific application name
- turn off load balancing only for a specific database
- turn off load balancing if the current transaction includes a write query
Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
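For concreteness, load-balancing controls of this kind map onto pgpool.conf parameters roughly like this (a hedged sketch against the Pgpool-II 4.x documentation; the database and application names are hypothetical):

```ini
# Distribute read-only queries across backends
load_balance_mode = on

# Pin specific databases / application_names to the primary,
# effectively disabling load balancing for them
database_redirect_preference_list = 'reportdb:primary'
app_name_redirect_preference_list = 'batchjob:primary'

# Once a transaction has issued a write, stop load balancing
# the rest of that transaction
disable_load_balance_on_write = 'transaction'
```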
> But, I heard PgPool is still affected by Split brain syndrome.

Can you elaborate more? If 3 or more pgpool watchdog nodes (the number of nodes must be odd) are configured, a split brain can be avoided.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp

> Regards,
>
> Inzamam Shafiq
> Sr. DBA
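The "3 or more watchdog nodes" setup being discussed is driven by pgpool.conf; a hedged sketch (parameter names follow the Pgpool-II 4.x documentation; the VIP address is hypothetical):

```ini
use_watchdog = on

# Only fail over when a majority of watchdog nodes is alive and agrees
failover_when_quorum_exists = on
failover_require_consensus = on

# Virtual IP held by the current watchdog leader
delegate_ip = '192.0.2.10'
```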
On Wed, 05 Apr 2023 16:50:15 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> > But, I heard PgPool is still affected by Split brain syndrome.
>
> Can you elaborate more? If more than 3 pgpool watchdog nodes (the
> number of nodes must be odd) are configured, a split brain can be
> avoided.

Split brain is a hard situation to avoid. I suppose the OP is talking about a PostgreSQL split brain situation. I'm not sure how PgPool's watchdog would avoid that.

To avoid split brain, you need to implement a combination of quorum and (self-)fencing.

Patroni's quorum is in the DCS's hands. Patroni's self-fencing can be achieved with the (hardware) watchdog. You can also implement node fencing through the "pre_promote" script to fence the old primary node before promoting the new one.

If you need HA with a high level of anti-split-brain security, you'll not be able to avoid some sort of fencing, no matter what.

Good luck.
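The Patroni mechanisms mentioned here (hardware-watchdog self-fencing and the pre_promote hook) are configured in patroni.yml; a minimal hedged sketch, with a hypothetical script path:

```yaml
# Self-fencing: Patroni keeps the kernel watchdog device open; if the
# leader can no longer guarantee its leader lock in time, the host
# resets itself instead of keeping a rogue primary alive.
watchdog:
  mode: required          # refuse to run as leader without a watchdog
  device: /dev/watchdog
  safety_margin: 5

postgresql:
  # Node fencing hook: runs just before this node is promoted; a
  # non-zero exit status aborts the promotion. Script path is hypothetical.
  pre_promote: /etc/patroni/fence_old_primary.sh
```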
>>> But, I heard PgPool is still affected by Split brain syndrome.
>>
>> Can you elaborate more? If more than 3 pgpool watchdog nodes (the
>> number of nodes must be odd) are configured, a split brain can be
>> avoided.
>
> Split brain is a hard situation to avoid. I suppose the OP is talking about
> a PostgreSQL split brain situation. I'm not sure how PgPool's watchdog would
> avoid that.

Ok, "split brain" here means that two or more PostgreSQL primary servers exist.

Pgpool-II's watchdog has a feature called "quorum failover" to avoid the situation. To make this work, you need to configure 3 or more Pgpool-II nodes. Suppose they are w0, w1 and w2. Also suppose there are two PostgreSQL servers, pg0 (primary) and pg1 (standby). The goal is to avoid both pg0 and pg1 becoming primary servers.

Pgpool-II periodically monitors PostgreSQL healthiness by checking whether it can reach the PostgreSQL servers. Suppose w0 and w1 detect that pg0 is healthy but pg1 is not, while w2 thinks the opposite, i.e. pg0 is unhealthy but pg1 is healthy (this could happen if w0, w1 and pg0 are in network A, but w2 and pg1 are in a different network B, and A and B cannot reach each other).

In this situation, if w2 promotes pg1 because pg0 seems to be down, then the system ends up with two primary servers: split brain.

With quorum failover enabled, w0, w1 and w2 communicate with each other to vote on who is correct (if a node cannot communicate with another, it regards that watchdog as down). In the case above, w0 and w1 are the majority and will win. Thus w0 and w1 just detach pg1 and keep on using pg0 as the primary. On the other hand, since w2 loses, it gives up promoting pg1, and thus the split brain is avoided.

Note that in the configuration above, clients access the cluster via a VIP. The VIP is always controlled by the majority side's watchdog, so clients will not access pg1 because it is set to down status by w0 and w1.

> To avoid split brain, you need to implement a combination of quorum and
> (self-)fencing.
>
> Patroni's quorum is in the DCS's hands. Patroni's self-fencing can be achieved
> with the (hardware) watchdog. You can also implement node fencing through the
> "pre_promote" script to fence the old primary node before promoting the new one.
>
> If you need HA with a high level of anti-split-brain security, you'll not be
> able to avoid some sort of fencing, no matter what.
>
> Good luck.

Well, if you define fencing as STONITH (Shoot The Other Node In The Head), Pgpool-II does not have that feature. However, I am not sure STONITH is always mandatory. I think that depends on what you want to avoid by using fencing. If the purpose is to avoid having two primary servers at the same time, Pgpool-II achieves that as described above.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
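The quorum-failover vote described above can be modelled in a few lines, just to make the majority rule concrete (a toy sketch, not Pgpool-II code; node names follow the example in the mail):

```python
from collections import Counter

def quorum_failover(votes):
    """votes: dict mapping watchdog name -> the backend that watchdog
    believes should be primary. Returns the backend chosen by a strict
    majority of ALL watchdogs, or None when no majority exists."""
    tally = Counter(votes.values())
    winner, count = tally.most_common(1)[0]
    # A strict majority is required, so a minority partition
    # (like w2 alone) can never promote its own candidate.
    if count * 2 > len(votes):
        return winner
    return None

# The scenario from the mail: w0 and w1 still see pg0 as primary,
# the partitioned w2 wants pg1.
print(quorum_failover({"w0": "pg0", "w1": "pg0", "w2": "pg1"}))
```

With an even two-node split (one vote each) no majority exists and the function returns None, which is why an odd number of watchdog nodes is recommended.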
On 4/6/23 23:16, Tatsuo Ishii wrote:
> [...]
> With quorum failover is enabled, w0, w1, and w2 communicate each other
> to vote who is correct (if it cannot communicate, it regards other
> watchdog is down). In the case above w0 and w1 are majority and will
> win. [...]

And this concept is quite old. (It's also what Windows clustering uses.)
On Thu, Apr 6, 2023 at 9:17 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> With quorum failover is enabled, w0, w1, and w2 communicate each other
> to vote who is correct (if it cannot communicate, it regards other
> watchdog is down). In the case above w0 and w1 are majority and will
> win.

Communication takes time – network latencies. What if, during this communication, the situation changes? What if some of them cannot communicate with each other due to network issues?

What if pg1 is currently primary, pg0 is standby, both are healthy, but due to network issues both pg1 and w2 are unreachable from the other nodes? Will pg1 remain primary while w0 and w1 decide to promote pg0?
> Communication takes time – network latencies. What if during this
> communication, the situation becomes different?

We have to accept it (and do our best to mitigate any consequences of the problem). I think there's no system that presupposes zero communication latency.

> What if some of them cannot communicate with each other due to network issues?

Can you elaborate more? There are many scenarios for communication breakdown. I hesitate to discuss all of them on this forum since it is for discussions on PostgreSQL, not Pgpool-II. You are welcome to join and continue the discussion on the pgpool mailing list.

> What if pg1 is currently primary, pg0 is standby, both are healthy, but
> due to network issues, both pg1 and w2 are not reachable to other
> nodes? Will pg1 remain primary, and w0 and w1 decide to promote pg0?

pg1 will remain primary, but it is set to "quarantine" state from pgpool's point of view, which means clients cannot access pg1 via pgpool. w0 and w1 will decide to promote pg0.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
On Thu, Apr 6, 2023 at 11:13 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> You are welcome to join and continue the discussion on the pgpool
> mailing list.

I truly believe that this problem – HA – is PostgreSQL's, not a 3rd party's. And it's a shame that Postgres itself doesn't solve this. So we're discussing it here.

> > What if pg1 is currently primary, pg0 is standby, both are healthy, but
> > due to network issues, both pg1 and w2 are not reachable to other
> > nodes? Will pg1 remain primary, and w0 and w1 decide to promote pg0?
>
> pg1 will remain primary but it is set to "quarantine" state from
> pgpool's point of view, which means clients cannot access pg1 via
> pgpool.

So we have a split brain here – two primaries. Especially if some clients communicate with PG directly. And even if there are no such clients, archive_command is going to run on both nodes; monitoring will show two primaries, confusing humans (e.g., SREs) and various systems; and if we have many standby nodes, some of them might continue replicating from the old primary if they happen to be in the same network partition, and so on. I don't see how all these things can be solved with this approach.
> I truly believe that this problem – HA – is PostgreSQL's, not 3rd
> party's. And it's a shame that Postgres itself doesn't solve this. So
> we're discussing it here.

Let's see what other subscribers on this forum say.

>> > What if pg1 is currently primary, pg0 is standby, both are healthy, but
>> > due to network issues, both pg1 and w2 are not reachable to other
>> > nodes? Will pg1 remain primary, and w0 and w1 decide to promote pg0?
>>
>> pg1 will remain primary but it is set to "quarantine" state from
>> pgpool's point of view, which means clients cannot access pg1 via
>> pgpool.
>
> So we have a split brain here – two primaries. Especially if some
> clients communicate with PG directly.

Clients are not allowed to communicate with PostgreSQL directly. That's the prerequisite of using Pgpool-II.

> And even if there are no such clients, archive_command is going to
> work on both nodes,

What's the problem with this? Moreover, you can write logic to disable this in the failover command.

> monitoring will show two primaries confusing
> humans (e.g, SREs) and various systems,

That's why pgpool provides its own monitoring tools. A clustering system is different from standalone PostgreSQL. Existing PostgreSQL tools usually only take account of standalone PostgreSQL. Users have to realize the difference.

> if we have many standby nodes,
> some of them might continue replicating from the old primary if they
> happen to be in the same network partition, and so on.

As for existing standbys in the same network as pg0, you can either manually or automatically make them follow pg0.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
On Fri, 07 Apr 2023 13:16:59 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> [...]
>
> Note that in the configuration above, clients access the cluster via
> VIP. VIP is always controlled by majority watchdog, clients will not
> access pg1 because it is set to down status by w0 and w1.
>
> [...]
>
> Well, if you define fencing as STONITH (Shoot The Other Node in the
> Head), Pgpool-II does not have the feature.

And I believe that's part of what Cen was complaining about:

«
It is basically a daemon glued together with scripts for which you are
entirely responsible. Any small mistake in failover scripts and the
cluster enters a broken state.
»

If you want to build something clean, including fencing, you'll have to handle/develop it yourself in scripts.

> However I am not sure STONITH is always mandatory.

Sure, it really depends on how risky you can go and how much complexity you can afford. Some clusters can live with a 10-minute split brain while some others cannot survive a 5-second split brain.

> I think that depends what you want to avoid using fencing. If the purpose is
> to avoid having two primary servers at the same time, Pgpool-II achieves that
> as described above.

How can you be so sure?

See https://www.alteeve.com/w/The_2-Node_Myth

«
* Quorum is a tool for when things are working predictably
* Fencing is a tool for when things go wrong
»

Regards,
> And I believe that's part of what Cen was complaining about:
>
> «
> It is basically a daemon glued together with scripts for which you are
> entirely responsible for. Any small mistake in failover scripts and
> cluster enters a broken state.
> »
>
> If you want to build something clean, including fencing, you'll have to
> handle/dev it by yourself in scripts

That's a design decision. This gives maximum flexibility to users. Please note that we provide step-by-step installation/configuration documents which have been used by production systems.

https://www.pgpool.net/docs/44/en/html/example-cluster.html

>> However I am not sure STONITH is always mandatory.
>
> Sure, it really depends on how risky you can go and how much complexity you
> can afford. Some clusters can live with a 10 minute split brain while some
> others cannot survive a 5s split brain.
>
>> I think that depends what you want to avoid using fencing. If the purpose is
>> to avoid having two primary servers at the same time, Pgpool-II achieves that
>> as described above.
>
> How could you be so sure?
>
> See https://www.alteeve.com/w/The_2-Node_Myth
>
> «
> * Quorum is a tool for when things are working predictably
> * Fencing is a tool for when things go wrong
> »

I think the article does not apply to Pgpool-II.

-------------------------------------------------------------------
3-Node

When node 1 stops responding, node 2 declares it lost, reforms a cluster with the quorum node, node 3, and is quorate. It begins recovery by mounting the filesystem under NFS, which replays journals and cleans up, then starts NFS and takes the virtual IP address. Later, node 1 recovers from its hang. At the moment of recovery, it has no concept that time has passed and so has no reason to check whether it is still quorate or whether its locks are still valid. It just finishes doing whatever it was doing at the moment it hung. In the best case scenario, you now have two machines claiming the same IP address. At worst, you have uncoordinated writes to storage and you corrupt your data.
-------------------------------------------------------------------

> Later, node 1 recovers from its hang.

Pgpool-II does not allow an automatic recovery. If node 1 hangs, once it is recognized as "down" by the other nodes it will not be used without manual intervention. Thus the disaster described above will not happen in pgpool.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
On Fri, 07 Apr 2023 18:04:05 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> > If you want to build something clean, including fencing, you'll have to
> > handle/dev it by yourself in scripts
>
> That's a design decision. This gives maximum flexibility to users.

Sure, no problem with that. But people have to realize that the downside is that it leaves the whole complexity and reliability of the cluster in the hands of the administrator. And these things are much more complicated and racy than simply promoting a node. Even dealing with a simple vIP can become a nightmare if not done correctly.

> Please note that we provide step-by-step installation/configuration
> documents which have been used by production systems.
>
> https://www.pgpool.net/docs/44/en/html/example-cluster.html

These scripts rely on SSH, which is really bad. What if you have an SSH failure in the mix? Moreover, even if SSH weren't a weakness by itself, the script doesn't even try to shut down the old node or stop the old primary.

You can add to the mix that both Pgpool and SSH rely on TCP for availability checks and actions. You'd better have very low TCP timeouts/retries... When a service loses quorum on a resource, it is supposed to shut down as fast as possible... or even self-fence using a watchdog device if the shutdown action doesn't return fast enough.

> >> However I am not sure STONITH is always mandatory.
> >
> > Sure, it really depends on how risky you can go and how much complexity
> > you can afford. Some clusters can live with a 10 minute split brain while
> > some others cannot survive a 5s split brain.
> >
> >> I think that depends what you want to avoid using fencing. If the purpose
> >> is to avoid having two primary servers at the same time, Pgpool-II achieves
> >> that as described above.
> >
> > How could you be so sure?
> >
> > See https://www.alteeve.com/w/The_2-Node_Myth
>
> I think the article does not apply to Pgpool-II.

It is a simple example using NFS. The point here is that when things are getting unpredictable, quorum is just not enough. So yes, it does apply to Pgpool. Quorum is nice when nodes can communicate with each other, when they have enough time and/or minimal load to complete actions correctly. My point is that a proper anti-split-brain solution requires both quorum and fencing.

> [...]
> > Later, node 1 recovers from its hang.
>
> Pgpool-II does not allow an automatic recover.

Neither does this example. There's no automatic recovery. It just states that node 1 was unable to answer in a timely fashion, just long enough for a new quorum to be formed and a new primary elected. But node 1 was not dead, and when node 1 is able to answer again, boom. A service being mute for some period of time is really common. There are various articles and conference talks about clusters failing over wrongly because of e.g. a high load on the primary... The last one was during FOSDEM, IIRC.

> If node 1 hangs and once it is recognized as "down" by other nodes, it will
> not be used without manual intervention. Thus the disaster described above
> will not happen in pgpool.

Ok, so I suppose **all** connections, scripts, software, backups, maintenance and admins must go through Pgpool to be sure to hit the correct primary.

This might be acceptable in some situations, but I wouldn't call that an anti-split-brain solution. It's some kind of «software hiding the rogue node behind a curtain and pretending it doesn't exist anymore».

Regards,
> [...] These scripts rely on SSH, which is really bad. What if you have a SSH
> failure in the mix? Moreover, even if SSH wouldn't be a weakness by itself,
> the script doesn't even try to shut down the old node or stop the old primary.
That does not matter, when only PgPool does the writing to the database.
> You can add to the mix that both Pgpool and SSH rely on TCP for availability
> checks and actions. You better have very low TCP timeout/retry... When a
> service loses quorum on a resource, it is supposed to shut down as fast as
> possible... or even self-fence itself using a watchdog device if the shutdown
> action doesn't return fast enough.
Scenario:
S0 - Running Postgresql as primary, and also PgPool.
S1 - Running Postgresql as secondary, and also PgPool.
S2 - Running only PgPool. Has the VIP.
There's no need for Postgresql or PgPool on S0 to shut down if it loses contact with S1 and S2, since those two will also notice that S0 has disappeared. In that case, they'll vote S0 into degraded state and promote S1 to be the Postgresql primary.
A good question is what happens when S0 and S1 lose connection to S2 (meaning that S2 loses connection to them, too). S0 and S1 then "should" vote that S0 take over the VIP. But, if S2 is still up and can connect to "the world", does it voluntarily decide to give up the VIP since it's all alone?
> Scenario:
> S0 - Running Postgresql as primary, and also PgPool.
> S1 - Running Postgresql as secondary, and also PgPool.
> S2 - Running only PgPool. Has the VIP.
>
> [...]
>
> A good question is what happens when S0 and S1 lose connection to S2
> (meaning that S2 loses connection to them, too). S0 and S1 then
> "should" vote that S0 take over the VIP. But, if S2 is still up and
> can connect to "the world", does it voluntarily decide to give up the
> VIP since it's all alone?

Yes, because S2's pgpool is not the leader anymore. In this case S2 voluntarily gives up the VIP.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
>> If node 1 hangs and once it is recognized as "down" by other nodes, it will
>> not be used without manual intervention. Thus the disaster described above
>> will not happen in pgpool.
>
> Ok, so I suppose **all** connections, scripts, softwares, backups, maintenances
> and admins must go through Pgpool to be sure to hit the correct primary.
>
> This might be acceptable in some situation, but I wouldn't call that an
> anti-split-brain solution. It's some kind of «software hiding the rogue node
> behind a curtain and pretend it doesn't exist anymore»

You can call Pgpool-II whatever you like. The important thing for me (and probably for users) is whether it can solve users' problems or not.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
On Fri, 07 Apr 2023 21:16:04 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> >> If node 1 hangs and once it is recognized as "down" by other nodes, it will
> >> not be used without manual intervention. Thus the disaster described above
> >> will not happen in pgpool.
> >
> > Ok, so I suppose **all** connections, scripts, softwares, backups,
> > maintenances and admins must go through Pgpool to be sure to hit the
> > correct primary.
> >
> > This might be acceptable in some situation, but I wouldn't call that an
> > anti-split-brain solution. It's some kind of «software hiding the rogue node
> > behind a curtain and pretend it doesn't exist anymore»
>
> You can call Pgpool-II whatever you like.

I didn't mean to be rude here. Please accept my apologies if my words offended you.

I consider "proxy-based" fencing architecture fragile because you just don't know what is happening on your rogue node until a meatware comes along to deal with it. Moreover, you must trust your scripts, configurations, procedures, admins, applications, users, replication, network, Pgpool, etc. not to fail on you in the meantime...

In the Pacemaker world, where everything MUST be **predictable**, the only way to predict the state of a rogue node is to fence it from the cluster. Either cut it from the network, shut it down, or set up the watchdog so it resets itself if needed. At the end, you know your old primary is off, or idle, or screaming into the void with no one to hear it. It can't harm your other nodes, data or apps anymore, no matter what.

> Important thing for me (and probably for users) is, if it can solve user's
> problem or not.

In my humble (and biased) opinion, Patroni, PAF or shared-storage clusters are solving users' problems with regard to HA. All with PROs and CONs. All rely on strong, safe, well-known and well-developed clustering concepts. Some consider them complex pieces of software to deploy and maintain, but this is because HA is complex. No miracle here.

Solutions like Pgpool or Repmgr try hard to re-implement HA concepts but leave most of this complexity and safety to the user's discretion. Unfortunately, it is not the role of the user to deal with such things. This kind of architecture probably answers a need, a gray zone, where it is good enough. I've seen a similar approach in the past with pgbouncer + bash scripting calling itself a "fencing" solution [1]. I'm fine with it as far as people are clear about the limitations.

Kind regards,

[1] e.g. https://www.postgresql.eu/events/pgconfeu2016/sessions/session/1348-ha-with-repmgr-barman-and-pgbouncer/