Avoiding data loss with synchronous replication - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | Avoiding data loss with synchronous replication |
Date | |
Msg-id | FDE157D7-3F35-450D-B927-7EC2F82DB1D6@amazon.com |
List | pgsql-hackers |
Hi hackers,

As previously discussed [0], canceling synchronous replication waits can have the unfortunate side effect of making transactions visible on a primary server before they are replicated. A failover at that point would cause such transactions to be lost. The solution proposed in the previous thread [0] involved blocking such cancellations, but many had concerns about that approach (e.g., backends could become unresponsive, and server restarts were still affected by this problem). I would like to propose something closer to what Fujii-san suggested [1], which would avoid blocking cancellations while still preventing data loss. I believe this is a key missing piece of the synchronous replication functionality in PostgreSQL.

AFAICT, there are a variety of ways that the aforementioned problem may occur:

1. Server restarts: As noted in the docs [2], "waiting transactions will be marked fully committed once the primary database recovers." I think there are a few options for handling this, but the simplest would be to fail over whenever the primary server shuts down. My proposal may offer other ways of helping with this.

2. Backend crashes: If a backend crashes, the postmaster process will restart everything, leading to the same problem described in 1. However, this behavior can be prevented with the restart_after_crash parameter [3].

3. Client disconnections: During waits for synchronous replication, interrupt processing is turned off, so disconnected clients don't actually seem to cause a problem. The server will still wait for synchronous replication to complete before making the transaction visible on the primary.

4. Query cancellations and backend terminations: This appears to be the only gap where there is no way to avoid potential data loss, and it is the main target of my proposal.
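To make case 4 concrete, here is a minimal toy model in Python. It is a simulation under simplifying assumptions, not PostgreSQL code: LSNs are plain integers, the standby flushes WAL only when the commit path actually waits for it, and every class and method name (`Standby`, `Primary`, `commit`, `survivors_after_failover`) is hypothetical. It models the current behavior, where the transaction becomes visible on the primary once the wait ends for any reason, so a canceled wait can expose a commit that a promoted standby never received.

```python
# Toy model of case 4 (canceling a synchronous replication wait) under the
# current behavior. All names are hypothetical; this is not PostgreSQL code.


class Standby:
    def __init__(self) -> None:
        self.flushed_lsn = 0  # highest WAL position confirmed by the standby

    def flush_up_to(self, lsn: int) -> None:
        # WAL is streamed in order, so flushing lsn covers every earlier record.
        self.flushed_lsn = max(self.flushed_lsn, lsn)


class Primary:
    def __init__(self, standby: Standby) -> None:
        self.standby = standby
        self.last_lsn = 0
        self.commit_lsn: dict[int, int] = {}  # xid -> LSN of its commit record
        self.visible: set[int] = set()        # xids other sessions can see

    def commit(self, xid: int, wait_canceled: bool = False) -> None:
        self.last_lsn += 1
        self.commit_lsn[xid] = self.last_lsn
        if not wait_canceled:
            # Normal path: the wait ends only after the standby confirms.
            self.standby.flush_up_to(self.last_lsn)
        # Current behavior: however the wait ended, the transaction is now
        # marked committed and becomes visible on the primary.
        self.visible.add(xid)

    def survivors_after_failover(self) -> set[int]:
        """Transactions that would still exist if the standby were promoted now."""
        return {xid for xid, lsn in self.commit_lsn.items()
                if lsn <= self.standby.flushed_lsn}


primary = Primary(Standby())
primary.commit(1)                      # waits for the standby as usual
primary.commit(2, wait_canceled=True)  # wait canceled before replication
lost = primary.visible - primary.survivors_after_failover()
print(sorted(lost))  # [2]: visible on the old primary, gone after failover
```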
Instead of blocking query cancellations and backend terminations, I think we should allow them to proceed, but keep the transactions marked in-progress so that they do not yet become visible to sessions on the primary. Once replication has caught up to the necessary point, the transactions can be marked completed, and they would finally become visible.

The main advantages of this approach are 1) it still allows for canceling waits for synchronous replication and 2) it provides an opportunity to view and manage waits for synchronous replication outside of the standard cancellation/termination functionality. The tooling for 2 could even allow a session to begin waiting for synchronous replication again if it "inadvertently interrupted a replication wait..." [4].

I think the main disadvantage of this approach is that transactions committed by a session may not be immediately visible to that session when the command returns after canceling the wait for synchronous replication. Instead, the transactions would become visible later, once the change is replicated. This may cause problems for an application that doesn't handle this scenario carefully.

What are folks' opinions on this idea? Is this something worth prototyping?

Nathan

[0] https://www.postgresql.org/message-id/flat/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C@yandex-team.ru
[1] https://www.postgresql.org/message-id/4f8d54c9-6f18-23d5-c4de-9d6656d3a408%40oss.nttdata.com
[2] https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
[3] https://www.postgresql.org/docs/devel/runtime-config-error-handling.html#GUC-RESTART-AFTER-CRASH
[4] https://www.postgresql.org/message-id/CA%2BTgmoZpwBEyPDZixeHN9ZeNJJjd3EBEQ8nJPaRAsVexhssfNg%40mail.gmail.com
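The proposal in this message (let the cancellation proceed, but keep the transaction in-progress until the synchronous standby catches up) can be sketched as a toy Python model. This is a hypothetical simulation under simplifying assumptions, not a design for PostgreSQL internals: LSNs are integers, standby progress arrives through a made-up `standby_acknowledged()` call, and "in-progress but committed locally" is modeled as a separate `pending` set. The invariant it illustrates is that nothing visible on the primary is ever missing from the standby.

```python
# Toy model of the proposed behavior: a canceled synchronous replication wait
# returns control to the client, but the transaction stays in-progress until
# the standby confirms its commit LSN. Hypothetical names; not PostgreSQL code.


class Primary:
    def __init__(self) -> None:
        self.last_lsn = 0
        self.standby_flushed_lsn = 0
        self.commit_lsn: dict[int, int] = {}  # xid -> LSN of its commit record
        self.pending: set[int] = set()        # committed locally, not yet visible
        self.visible: set[int] = set()        # visible to every session

    def commit(self, xid: int, wait_canceled: bool = False) -> None:
        self.last_lsn += 1
        self.commit_lsn[xid] = self.last_lsn
        if wait_canceled:
            # The command returns, but no session (not even this one) can
            # see the transaction until the standby catches up.
            self.pending.add(xid)
        else:
            # Normal path: model the standby confirming this commit, then
            # mark the transaction visible.
            self.standby_acknowledged(self.last_lsn)
            self.visible.add(xid)

    def standby_acknowledged(self, flushed_lsn: int) -> None:
        """Record standby progress and release pending commits it now covers."""
        self.standby_flushed_lsn = max(self.standby_flushed_lsn, flushed_lsn)
        caught_up = {xid for xid in self.pending
                     if self.commit_lsn[xid] <= self.standby_flushed_lsn}
        self.pending -= caught_up
        self.visible |= caught_up

    def lost_on_failover(self) -> set[int]:
        """Visible transactions the standby does not have (should stay empty)."""
        return {xid for xid in self.visible
                if self.commit_lsn[xid] > self.standby_flushed_lsn}


p = Primary()
p.commit(1)                                  # normal synchronous commit
p.commit(2, wait_canceled=True)              # client gets control back early
print(sorted(p.visible), sorted(p.pending))  # [1] [2]
print(p.lost_on_failover())                  # set()
p.standby_acknowledged(2)                    # standby catches up
print(sorted(p.visible), sorted(p.pending))  # [1, 2] []
```

The window between the cancellation and the acknowledgment corresponds to the disadvantage described in the message: transaction 2 has already returned to its client, yet it stays invisible, even to its own session, until the standby confirms its commit LSN.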