Re: Postgresql HA cluster - Mailing list pgsql-general

From Jehan-Guillaume de Rorthais
Subject Re: Postgresql HA cluster
Date
Msg-id 20231018094453.019d972d@karst
Whole thread Raw
In response to RE: Postgresql HA cluster  (Jason Grammenos <jason.grammenos@agilitypr.com>)
List pgsql-general
Hi Jason,

On Tue, 17 Oct 2023 19:59:00 +0000
Jason Grammenos <jason.grammenos@agilitypr.com> wrote:

[...]
> If you have 2 PostgreSQL nodes hooked up to a Load balancer (haproxy), and
> you move take node1 out of load balancing, you now have connections on node1
> and connections on node2, as the Load balancer drains the connections off
> node1 and over to node2. How does PostgreSQL handle this scenario when there
> are writes happening on both nodes?

As you write about "writes happening on both nodes", you describe a
Multi-primary architecture that PostgreSQL does not support.

A standby is in "recovery mode", receiving and applying WALs directly from
the primary (or archives).

A standby will only accept read-only (RO) queries (providing you set
"hot_standby=on") as it can not produce WAL itself. Any write attempt raises
an ERROR to the client. This is a really strict state, not an advisory one. You
just can't write on a standby, even as a superuser.

So, in the situation you describe, as long as the standby is not promoted to
primary, any write (RW) query on it will just fails hard.

Obviously, you should definitely NOT promote a standby while another primary is
still alive in the cluster. You would end up with a bad split brain scenario.

About haproxy, you should use tcp-check (eg. on a Systemd socket) or http-check
(eg. on Patroni API) to decide on which server the primary is hosted. It
avoids this transition period where both nodes have RW connections.

> If instead you have 2 PostgreSQL nodes behind pacemaker (controlling a
> floating ip), what happens when you initiate a failover and move the floating
> ip? You want the connections to drain off node1 and move to node2. Again in
> this scenario, both nodes would be sustaining writes at some point in time.

No, both nodes would not be sustaining writes in the same time. Two primaries
can not leave in the same cluster in the same time, this would be a split brain
scenario. If this happen, you have to pick one and rewind or restore it from
PITR, losing the transactions it wrote in the meantime. Of course, you can also
compare both nodes and tries to recover missing transaction before restoring
one of them, but it takes a lot of time...That's why clusters (not just for
PostgreSQL) tries hard to avoid split brain scenario using fencing, quorum,
watchdog, etc.

Before moving the IP address, you have to demote your primary as standby.
Unfortunately, this can not be achieved without restarting the primary
PostgreSQL as a standby one on its node. But because demote == restart as
standby, you effectively end all clients sessions. At this point, all your
nodes will be in standby mode, without primary to feed them.

So, during a switchover Pacemaker will:

* demote your current primary (ending all sessions)
* promote the other standby as primary
* move the IP address to the new primary (colocation rule in Pacemaker between
  the IP and the promoted role).

As soon as a primary appears in the cluster and the IP address (or haproxy)
points on it, all other standbies nodes will start replicating from it.

> How does the write only replica get changed out of write mode during this
> failover?

"write only"? Do you speak about the primary in RW mode or about a standby in
RO mode? 

For a primary: a PostgreSQL restart.
For a standby: promotion happen live, all existing sessions are kept and can
then issue write queries.

> How does the primary node get switched to read only after the
> connection drain is complete?

A PostgreSQL restart.

> Overall, I am trying to understand what it looks like operationally to run a
> 2 node postgresql "cluster" and how patching both nodes would work (and
> mentally comparing it to what I currently do with mysql).
> 
> You recommended that primary-standby could be sufficient and is much simpler,
> so I am simply trying to wrap my head around what exactly running it would
> look like. If primary standby is simple enough to failover, patch , reboot,
> maintain, etc. Then you could be correct that master->master may not be
> needed.

And if you have a strong, safe and high available storage, a shared storage
cluster for the primary is even simpler IMHO (because you move all the
complexity off your hands to the storage guys ones).

For details, I recommend reading this chapter in documentation:
https://www.postgresql.org/docs/current/high-availability.html

Regards,



pgsql-general by date:

Previous
From: Saravanan P K
Date:
Subject: Partition Lock Issue
Next
From: Marcin Borkowski
Date:
Subject: A few questions about foreign tables