Re: Avoiding data loss with synchronous replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Avoiding data loss with synchronous replication |
Msg-id | CAA4eK1L2p4NLyhidETqOphcZMv14mTqs6NCO2YpAk470zkFfwQ@mail.gmail.com |
In response to | Avoiding data loss with synchronous replication ("Bossart, Nathan" <bossartn@amazon.com>) |
Responses | Re: Avoiding data loss with synchronous replication ("Bossart, Nathan" <bossartn@amazon.com>) |
List | pgsql-hackers |
On Fri, Jul 23, 2021 at 2:48 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> Hi hackers,
>
> As previously discussed [0], canceling synchronous replication waits
> can have the unfortunate side effect of making transactions visible on
> a primary server before they are replicated. A failover at this time
> would cause such transactions to be lost. The proposed solution in
> the previous thread [0] involved blocking such cancellations, but many
> had concerns about that approach (e.g., backends could be
> unresponsive, server restarts were still affected by this problem). I
> would like to propose something more like what Fujii-san suggested [1]
> that would avoid blocking cancellations while still preventing data
> loss. I believe this is a key missing piece of the synchronous
> replication functionality in PostgreSQL.
>
> AFAICT there are a variety of ways that the aforementioned problem may
> occur:
> 1. Server restarts: As noted in the docs [2], "waiting transactions
>    will be marked fully committed once the primary database
>    recovers." I think there are a few options for handling this,
>    but the simplest would be to simply failover anytime the primary
>    server shut down. My proposal may offer other ways of helping
>    with this.
> 2. Backend crashes: If a backend crashes, the postmaster process
>    will restart everything, leading to the same problem described
>    in 1. However, this behavior can be prevented with the
>    restart_after_crash parameter [3].
> 3. Client disconnections: During waits for synchronous replication,
>    interrupt processing is turned off, so disconnected clients
>    actually don't seem to cause a problem. The server will still
>    wait for synchronous replication to complete prior to making the
>    transaction visible on the primary.
> 4. Query cancellations and backend terminations: This appears to be
>    the only gap where there is no way to avoid potential data loss,
>    and it is the main target of my proposal.
>
> Instead of blocking query cancellations and backend terminations, I
> think we should allow them to proceed, but we should keep the
> transactions marked in-progress so they do not yet become visible to
> sessions on the primary.
>

One naive question: what if the primary gets an error while changing
the status from in-progress to committed? In such a case, won't the
transaction be visible on the standby but not on the primary?

> Once replication has caught up to the
> necessary point, the transactions can be marked completed, and
> they would finally become visible.
>

If the session that issued the commit is terminated, will this work be
done by some background process?

--
With Regards,
Amit Kapila.
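A toy model may help pin down the visibility rule being discussed. The following is a minimal Python sketch under stated assumptions, not PostgreSQL code: a single synchronous standby that reports one flush LSN, integer LSNs, and "visible" meaning visible to other sessions on the primary. The names `ToyPrimary`, `standby_flushed`, and `lost_on_failover` are hypothetical. The sketch contrasts the behavior described as the problem (a canceled wait makes the commit visible at once) with the proposal (the transaction stays in progress until the standby's flush LSN covers its commit LSN).

```python
# Toy model (hypothetical, not PostgreSQL code) of the proposal quoted above.
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyPrimary:
    # True models the proposal: a canceled sync-rep wait leaves the
    # transaction in progress. False models the behavior described as
    # the problem: cancellation makes it visible right away.
    keep_in_progress_on_cancel: bool
    standby_flush_lsn: int = 0
    commit_lsn: Dict[int, int] = field(default_factory=dict)
    visible: Set[int] = field(default_factory=set)
    pending: Set[int] = field(default_factory=set)

    def commit(self, xid: int, lsn: int, canceled: bool = False) -> None:
        """Record xid's commit at lsn. A commit the standby has not yet
        flushed stays pending unless it was canceled under the old rule."""
        self.commit_lsn[xid] = lsn
        if lsn <= self.standby_flush_lsn:
            self.visible.add(xid)      # already on the standby: safe to show
        elif canceled and not self.keep_in_progress_on_cancel:
            self.visible.add(xid)      # visible before it is replicated
        else:
            self.pending.add(xid)      # stays in progress on the primary

    def standby_flushed(self, lsn: int) -> None:
        """Standby reports WAL flushed up to lsn. Pending transactions whose
        commit LSN is now covered become visible; in this model the step
        does not need the committing session to still exist."""
        self.standby_flush_lsn = max(self.standby_flush_lsn, lsn)
        caught_up = {x for x in self.pending
                     if self.commit_lsn[x] <= self.standby_flush_lsn}
        self.pending -= caught_up
        self.visible |= caught_up

    def lost_on_failover(self) -> Set[int]:
        """Transactions visible on the primary that the standby does not
        have yet; a failover at this moment would lose them."""
        return {x for x in self.visible
                if self.commit_lsn[x] > self.standby_flush_lsn}


for proposal in (False, True):
    p = ToyPrimary(keep_in_progress_on_cancel=proposal)
    p.commit(xid=1, lsn=100, canceled=True)   # the wait is canceled early
    print(proposal, sorted(p.visible), sorted(p.lost_on_failover()))
    p.standby_flushed(100)                    # replication catches up
    print(proposal, sorted(p.visible), sorted(p.lost_on_failover()))
```

With `keep_in_progress_on_cancel=False` the first line printed is `False [1] [1]`: transaction 1 is visible while a failover would lose it. With `True` it prints `True [] []` and only becomes visible after the standby reports LSN 100. The model says nothing about the error case raised above, where changing the status itself fails.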