On Tue, Aug 9, 2022 at 12:42 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> > Can you please explain more about your idea, I may be missing something?
>
> (I'm not sure I understand the requirements here..)

I've explained the problem with the current HA setup with synchronous
replication upthread at [1]. Let me reiterate it here.

When a query is cancelled (a simple CTRL+C or a pg_cancel_backend()
call) while the txn is waiting for an ack in SyncRepWaitForLSN(), the
txn is, from the client's point of view, committed like any other txn,
even though it is only committed locally and not yet replicated to all
of the sync standbys. A warning is emitted into the server logs, but it
is of no use to the client (think of clients as applications). There
can be many such txns waiting for an ack in SyncRepWaitForLSN(), and a
query cancel can be issued on all of those sessions.

The problem is that subsequent reads will then be able to see the data
of all those locally-committed-but-not-yet-replicated txns. This is
what I call an inconsistency (can we call this a read-after-write
inconsistency?), and it comes from the lack of proper query cancel
handling. If the sync standbys are down or unable to come up for some
reason, the primary will keep serving clients with the inconsistent
data until they are back. BTW, I found a report of this problem at
[2].
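
To make the sequence concrete, here is a toy model in Python. It is
not PostgreSQL code: ToyPrimary, its fields and the returned strings
are made up for illustration, and it assumes a single sync standby. It
only shows that a cancelled wait leaves the txn committed locally and
visible to readers while the standby never received it.

```python
class ToyPrimary:
    """Toy stand-in for a primary with one sync standby (hypothetical)."""

    def __init__(self):
        self.local_wal = []     # commit records flushed locally
        self.standby_wal = []   # commit records acknowledged by the standby
        self.visible = []       # txns whose wait has ended; readers see these
        self.standby_up = True

    def commit(self, txn, cancel_requested=False):
        # The local commit happens before the wait for the standby's ack.
        self.local_wal.append(txn)
        if self.standby_up:
            # The ack arrives and the wait ends normally.
            self.standby_wal.append(txn)
            self.visible.append(txn)
            return "COMMIT"
        if cancel_requested:
            # The wait is abandoned, but the txn stays committed locally.
            self.visible.append(txn)
            return "WARNING"
        # A real backend would block here; the toy just reports it.
        return "WAITING"

    def read(self):
        return list(self.visible)

    def unreplicated(self):
        return [t for t in self.visible if t not in self.standby_wal]


primary = ToyPrimary()
primary.commit("txn-1")
primary.standby_up = False
status = primary.commit("txn-2", cancel_requested=True)
rows_seen_by_readers = primary.read()
missing_on_standby = primary.unreplicated()
```

After the cancel, readers see both txn-1 and txn-2, while txn-2 is
missing on the standby.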
The solution proposed for the above problem is to just 'not honor
query cancels at all while waiting for an ack in SyncRepWaitForLSN()'.
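
A rough sketch of the difference, again as a Python toy rather than
the actual wait loop: wait_for_ack() and the event names "ack" and
"cancel" are hypothetical, and the real SyncRepWaitForLSN() logic is
more involved. With honor_query_cancel=False the cancel is simply
ignored and the wait continues until the ack arrives.

```python
def wait_for_ack(events, honor_query_cancel):
    """Consume events until the toy wait ends and return the reason."""
    for event in events:
        if event == "ack":
            # The sync standby confirmed the commit record.
            return "replicated"
        if event == "cancel" and honor_query_cancel:
            # Current behavior as described: the wait ends with a warning.
            return "cancelled-locally-committed"
        # An ignored cancel (or any other event) keeps the backend waiting.
    return "still-waiting"


current = wait_for_ack(["cancel", "ack"], honor_query_cancel=True)
proposed = wait_for_ack(["cancel", "cancel", "ack"], honor_query_cancel=False)
standby_down = wait_for_ack(["cancel"], honor_query_cancel=False)
```

The last call shows the trade-off: with the standby down and cancels
ignored, the session keeps waiting instead of exposing the txn.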

When a proc die is pending, there can also be
locally-committed-but-not-yet-replicated txns. Typically, the clients
have two choices: 1) reuse the primary instance after restart, or
2) fail over to one of the sync standbys. In case (1), there might be
a read-after-write inconsistency as explained above. In case (2),
those txns might be lost completely if the failover target (the sync
standby that becomes the new primary) didn't receive them. The other
sync standbys that did receive them are then ahead of the new primary
and need special treatment (running pg_rewind) before they can connect
to it.

The solution proposed for case (1) of the above problem is to 'process
the ProcDiePending immediately and, upon restart, have the first
backend wait until the sync standbys have caught up, to ensure there
are no inconsistent reads'.
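
The post-restart wait could look roughly like the following Python
toy. Everything here is an assumption for illustration: LSNs are plain
integers, the function names are hypothetical, and it requires every
listed standby to catch up, whereas a real setup would follow whatever
synchronous_standby_names asks for.

```python
def standbys_caught_up(primary_flush_lsn, standby_flush_lsns):
    """True once every listed standby has flushed up to the primary's LSN."""
    return all(lsn >= primary_flush_lsn for lsn in standby_flush_lsns.values())


def wait_after_restart(primary_flush_lsn, status_reports):
    """Return how many status reports were consumed before reads may start.

    Returns None if the standbys never caught up within the given reports.
    """
    for count, standby_flush_lsns in enumerate(status_reports, start=1):
        if standbys_caught_up(primary_flush_lsn, standby_flush_lsns):
            return count
    return None


# Hypothetical status reports from two sync standbys after a restart.
reports = [
    {"standby1": 90, "standby2": 100},   # standby1 is still behind
    {"standby1": 100, "standby2": 100},  # both have caught up
]
reports_needed = wait_after_restart(100, reports)
never_caught_up = wait_after_restart(100, [{"standby1": 90}])
```

Until the check passes, the first backend would hold off, so no reader
sees txns that the standbys don't have yet.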
The solution proposed for case (2) of the above problem is to 'either
run pg_rewind for the sync standbys that are ahead or use the idea
proposed at [3]'.
I hope the above explanation helps.

[1] https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%40mail.gmail.com
[2] https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous-replication
[3] https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gmail.com

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/