Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Msg-id CALj2ACUO6oz-43ryqfMOVZ_Q-N10C5tkzKku12+QV02NnXsDrw@mail.gmail.com
In response to Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
List pgsql-hackers
On Tue, Aug 9, 2022 at 2:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I've explained the problem with the current HA setup with synchronous
> replication upthread at [1]. Let me reiterate it here once again.
>
> When a query is cancelled (a simple CTRL+C or a pg_cancel_backend()
> call) while the txn is waiting for an ack in SyncRepWaitForLSN(), the
> txn is, from the client's point of view, committed like any other txn
> (it is locally-committed-but-not-yet-replicated to all of the sync
> standbys). A warning is emitted into the server logs, but it is of no
> use to the client (think of the client as an application). There can
> be many such txns waiting for an ack in SyncRepWaitForLSN(), and a
> query cancel can be issued on all of those sessions. The problem is
> that subsequent reads will then be able to see the data of all those
> locally-committed-but-not-yet-replicated txns. This is what I call an
> inconsistency (can we call this a read-after-write inconsistency?),
> and it is caused by the lack of proper query cancel handling. If the
> sync standbys are down or unable to come up for some reason, the
> primary will keep serving clients this inconsistent data until they
> are back. BTW, I found a report of this problem here [2].
>
> The solution proposed for the above problem is to just 'not honor
> query cancels at all while waiting for ack in SyncRepWaitForLSN()'.
>
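To make the proposed behaviour concrete, here is a minimal sketch in plain C. It is not taken from the patch and does not use PostgreSQL's internal APIs: the Lsn typedef, the WaitState struct and sync_rep_wait_step() are hypothetical names that only model the decision made on each iteration of the wait loop. Whether the pending cancel flag is cleared or just left alone is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (64-bit, like XLogRecPtr). */
typedef uint64_t Lsn;

/* Hypothetical model of the state one waiting backend looks at. */
typedef struct WaitState
{
    Lsn  commit_lsn;            /* LSN of this txn's commit record */
    Lsn  standby_ack_lsn;       /* highest LSN acked by the sync standbys */
    bool query_cancel_pending;  /* CTRL+C or pg_cancel_backend() arrived */
    bool proc_die_pending;      /* backend termination was requested */
} WaitState;

typedef enum WaitAction
{
    WAIT_DONE,       /* commit is replicated; report success to the client */
    WAIT_CONTINUE,   /* keep waiting for the ack */
    WAIT_TERMINATE   /* backend is going away; stop waiting */
} WaitAction;

/*
 * One iteration of the wait loop under the proposal: a pending query
 * cancel is ignored (assumption: the flag is simply cleared), while a
 * pending proc die is processed immediately.
 */
WaitAction
sync_rep_wait_step(WaitState *st)
{
    assert(st != NULL);

    if (st->standby_ack_lsn >= st->commit_lsn)
        return WAIT_DONE;

    if (st->proc_die_pending)
        return WAIT_TERMINATE;

    if (st->query_cancel_pending)
        st->query_cancel_pending = false;   /* do not honor the cancel */

    return WAIT_CONTINUE;
}
```

With this model, a cancelled session keeps returning WAIT_CONTINUE until standby_ack_lsn reaches commit_lsn, so the client never sees a successful commit that the sync standbys have not acknowledged.
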
> When a proc die is pending, there can also be
> locally-committed-but-not-yet-replicated to all of sync standbys txns.
> Typically, the clients have two choices: 1) reuse the primary
> instance after restart, or 2) fail over to one of the sync standbys.
> For case (1), there might be a read-after-write inconsistency as
> explained above. For case (2), those txns might be lost completely if
> the failover target (the sync standby that becomes the new primary)
> didn't receive them; the other sync standbys that did receive them are
> then ahead and need special treatment (running pg_rewind) before they
> can connect to the new primary.
>
> The solution proposed for case (1) of the above problem is to 'process
> the ProcDiePending immediately and upon restart the first backend can
> wait until the sync standbys are caught up to ensure no inconsistent
> reads'.
> The solution proposed for case (2) of the above problem is to 'either
> run pg_rewind for the sync standbys that are ahead or use the idea
> proposed at [3]'.
>
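For case (2), the check for "sync standbys that are ahead" can be sketched as a plain LSN comparison. This is a simplified model and not how pg_rewind itself decides anything (the real tool works from the timeline history and the point of divergence). Lsn, standby_needs_rewind() and count_standbys_needing_rewind() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (64-bit, like XLogRecPtr). */
typedef uint64_t Lsn;

/*
 * A standby is "ahead" if it received WAL beyond the position at which
 * the failover target was promoted. In this simplified model such a
 * standby needs pg_rewind before it can follow the new primary.
 */
bool
standby_needs_rewind(Lsn new_primary_lsn, Lsn standby_received_lsn)
{
    return standby_received_lsn > new_primary_lsn;
}

/* Count the standbys in the array that are ahead of the new primary. */
size_t
count_standbys_needing_rewind(Lsn new_primary_lsn,
                              const Lsn *standby_lsns, size_t n)
{
    size_t count = 0;

    assert(standby_lsns != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (standby_needs_rewind(new_primary_lsn, standby_lsns[i]))
            count++;
    }
    return count;
}
```

A standby whose received position equals the promotion point is not counted, since it holds no WAL that the new primary lacks.
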
> I hope the above explanation helps.
>
> [1] https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%40mail.gmail.com
> [2] https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous-replication
> [3] https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gmail.com

I'm attaching the v2 patch, rebased on the latest HEAD. Please note
that there are still some open points that I haven't yet found time to
think through. Meanwhile, I'm posting the v2 patch to keep cfbot
happy. Any further thoughts on the overall design of the patch are
most welcome. Thanks.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
