Avoiding data loss with synchronous replication - Mailing list pgsql-hackers

Hi hackers,

As previously discussed [0], canceling synchronous replication waits
can have the unfortunate side effect of making transactions visible on
a primary server before they are replicated.  A failover at this time
would cause such transactions to be lost.  The proposed solution in
the previous thread [0] involved blocking such cancellations, but many
had concerns about that approach (e.g., backends could be
unresponsive, server restarts were still affected by this problem).  I
would like to propose something more like what Fujii-san suggested [1]
that would avoid blocking cancellations while still preventing data
loss.  I believe this is a key missing piece of the synchronous
replication functionality in PostgreSQL.

AFAICT there are a variety of ways that the aforementioned problem may
occur:
  1. Server restarts: As noted in the docs [2], "waiting transactions
     will be marked fully committed once the primary database
     recovers."  I think there are a few options for handling this,
     but the simplest would be to fail over whenever the primary
     server shuts down.  My proposal may offer other ways of helping
     with this.
  2. Backend crashes: If a backend crashes, the postmaster process
     will restart everything, leading to the same problem described in
     item 1.  However, this behavior can be prevented by setting the
     restart_after_crash parameter [3] to off.
  3. Client disconnections: During waits for synchronous replication,
     interrupt processing is turned off, so disconnected clients
     actually don't seem to cause a problem.  The server will still
     wait for synchronous replication to complete prior to making the
     transaction visible on the primary.
  4. Query cancellations and backend terminations: This appears to be
     the only gap where there is no way to avoid potential data loss,
     and it is the main target of my proposal.

Instead of blocking query cancellations and backend terminations, I
think we should allow them to proceed, but we should keep the
transactions marked in-progress so they do not yet become visible to
sessions on the primary.  Once replication has caught up to the
necessary point, the transactions can be marked completed, and
they would finally become visible.
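To make the proposed bookkeeping concrete, here is a minimal toy model.
It assumes that each canceled wait is tracked by transaction ID and
commit LSN, and that the synchronous standby reports a monotonically
advancing flush LSN.  The names (PendingCommits, cancel_wait,
advance_standby_flush) are hypothetical.  They do not correspond to
anything in PostgreSQL, and this is not how PostgreSQL tracks
transaction state internally.

```python
from dataclasses import dataclass, field


@dataclass
class PendingCommits:
    """Toy model (hypothetical names, not PostgreSQL's API): transactions
    whose synchronous-replication wait was canceled stay in-progress
    (invisible) until the synchronous standby has flushed WAL up to at
    least their commit LSN."""

    standby_flush_lsn: int = 0
    pending: dict[int, int] = field(default_factory=dict)  # xid -> commit LSN

    def cancel_wait(self, xid: int, commit_lsn: int) -> None:
        """Wait canceled: defer visibility instead of marking committed."""
        if commit_lsn > self.standby_flush_lsn:
            self.pending[xid] = commit_lsn

    def is_visible(self, xid: int) -> bool:
        """In this model, any xid that is not still pending counts as committed."""
        return xid not in self.pending

    def advance_standby_flush(self, flush_lsn: int) -> list[int]:
        """Standby reports a new flush position: release every pending
        commit at or below it and return the xids that became visible."""
        self.standby_flush_lsn = max(self.standby_flush_lsn, flush_lsn)
        released = [xid for xid, lsn in self.pending.items()
                    if lsn <= self.standby_flush_lsn]
        for xid in released:
            del self.pending[xid]
        return released


state = PendingCommits()
state.cancel_wait(xid=100, commit_lsn=0x1000)
state.cancel_wait(xid=101, commit_lsn=0x2000)
print(state.is_visible(100))                         # False: standby not there yet
print(state.advance_standby_flush(0x1800))           # [100]
print(state.is_visible(100), state.is_visible(101))  # True False
```

Only the standby's reported flush position releases a pending commit.
As a result, a canceled transaction never becomes visible on the
primary ahead of the standby, which is the property the proposal is
after.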

The main advantages of this approach are 1) it still allows for
canceling waits for synchronous replication and 2) it provides an
opportunity to view and manage waits for synchronous replication
outside of the standard cancellation/termination functionality.  The
tooling for 2 could even allow a session to begin waiting for
synchronous replication again if it "inadvertently interrupted a
replication wait..." [4].  I think the main disadvantage of this
approach is that transactions committed by a session may not be
immediately visible to the session when the command returns after
canceling the wait for synchronous replication.  Instead, the
transactions would become visible in the future once the change is
replicated.  This may cause problems for an application if it doesn't
handle this scenario carefully.
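One way an application might handle the visibility caveat is sketched
below.  It assumes the application has some means of checking whether
its own commit is visible yet.  The check_visible callable is a
hypothetical placeholder, e.g. a query for a row the transaction wrote.
After a canceled wait, the application polls with a bounded timeout.
On expiry, it treats the outcome as unknown rather than failed,
because under this proposal the transaction may still become visible
later, or be lost if a failover happens first.

```python
import time
from typing import Callable


def wait_until_visible(check_visible: Callable[[], bool],
                       timeout_s: float = 5.0,
                       poll_interval_s: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until a commit whose sync-rep wait was canceled becomes visible.

    Returns True as soon as check_visible() succeeds, or False once the
    timeout expires; False means "outcome unknown", not "rolled back".
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check_visible():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


# Example: a (hypothetical) visibility check that succeeds on its third poll.
attempts = iter([False, False, True])
print(wait_until_visible(lambda: next(attempts), poll_interval_s=0.01))  # True
```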

What are folks' opinions on this idea?  Is this something that is
worth prototyping?

Nathan

[0] https://www.postgresql.org/message-id/flat/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C@yandex-team.ru
[1] https://www.postgresql.org/message-id/4f8d54c9-6f18-23d5-c4de-9d6656d3a408%40oss.nttdata.com
[2] https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
[3] https://www.postgresql.org/docs/devel/runtime-config-error-handling.html#GUC-RESTART-AFTER-CRASH
[4] https://www.postgresql.org/message-id/CA%2BTgmoZpwBEyPDZixeHN9ZeNJJjd3EBEQ8nJPaRAsVexhssfNg%40mail.gmail.com


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Next
From: Mark Dilger
Date:
Subject: Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)