An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Date
Msg-id CALj2ACUrOB59QaE6=jF2cFAyv1MR7fzD8tr4YM5+OwEYG1SNzA@mail.gmail.com
Whole thread Raw
Responses Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication  (Nathan Bossart <nathandbossart@gmail.com>)
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-hackers
Hi,

With synchronous replication typically all the transactions (txns)
first locally get committed, then streamed to the sync standbys and
the backend that generated the transaction will wait for ack from sync
standbys. While waiting for ack, it may happen that the query or the
txn gets canceled (QueryCancelPending is true) or the waiting backend
is asked to exit (ProcDiePending is true). In either of these cases,
the wait for ack gets canceled and leaves the txn in an inconsistent
state (as in the client thinks that the txn would have replicated to
sync standbys) - "The transaction has already committed locally, but
might not have been replicated to the standby.". Upon restart after
the crash or in the next txn after the old locally committed txn was
canceled, the client will be able to see the txns that weren't
actually streamed to sync standbys. Also, if the client fails over to
one of the sync standbys after the crash (either by choice or because
of automatic failover management after crash), the locally committed
txns on the crashed primary would be lost which isn't good in a true
HA solution.

Here's a proposal (mentioned previously by Satya [1]) to avoid the
above problems:
1) Wait a configurable amount of time before canceling the sync
replication by the backends i.e. delay processing of
QueryCancelPending and ProcDiePending in Introduced a new timeout GUC
synchronous_replication_naptime_before_cancel, when set, it will let
the backends wait for the ack before canceling the synchronous
replication so that the transaction can be available in sync standbys
as well. If the ack isn't received even within this time frame, the
backend cancels the wait and goes ahead as it does today. In
production HA environments, the GUC can be set to a reasonable value
to avoid missing transactions during failovers.
2) Wait for sync standbys to catch up upon restart after the crash or
in the next txn after the old locally committed txn was canceled. One
way to achieve this is to let the backend, that's making the first
connection, wait for sync standbys to catch up in ClientAuthentication
right after successful authentication. However, I'm not sure this is
the best way to do it at this point.

Thoughts?

Here's a WIP patch implementing the (1), I'm yet to code for (2). I
haven't added tests, I'm yet to figure out how to add one as there's
no way we can delay the WAL sender so that we can reliably hit this
code. I will think more about this.

[1] https://www.postgresql.org/message-id/CAHg%2BQDdTdPsqtu0QLG8rMg3Xo%3D6Xo23TwHPYsUgGNEK13wTY5g%40mail.gmail.com

Regards,
Bharath Rupireddy.

Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: json_object returning jsonb reuslt different from returning json, returning text
Next
From: Magnus Hagander
Date:
Subject: Re: Estimating HugePages Requirements?