Thread: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
Hi,

With synchronous replication, typically all transactions (txns) first get committed locally, then streamed to the sync standbys, and the backend that generated the transaction waits for an ack from the sync standbys. While waiting for the ack, it may happen that the query or the txn gets canceled (QueryCancelPending is true) or the waiting backend is asked to exit (ProcDiePending is true). In either of these cases, the wait for the ack gets canceled, leaving the txn in an inconsistent state (the client thinks the txn has been replicated to the sync standbys) - "The transaction has already committed locally, but might not have been replicated to the standby.". Upon restart after a crash, or in the next txn after the old locally committed txn was canceled, the client will be able to see txns that weren't actually streamed to the sync standbys. Also, if the client fails over to one of the sync standbys after the crash (either by choice or because of automatic failover management), the locally committed txns on the crashed primary would be lost, which isn't good in a true HA solution.

Here's a proposal (mentioned previously by Satya [1]) to avoid the above problems:

1) Wait a configurable amount of time before the backends cancel synchronous replication, i.e. delay processing of the QueryCancelPending and ProcDiePending interrupts. Introduce a new timeout GUC synchronous_replication_naptime_before_cancel; when set, it lets the backends wait for the ack before canceling synchronous replication, so that the transaction can be available on the sync standbys as well. If the ack isn't received even within this time frame, the backend cancels the wait and goes ahead as it does today. In production HA environments, the GUC can be set to a reasonable value to avoid missing transactions during failovers.

2) Wait for the sync standbys to catch up upon restart after a crash, or in the next txn after the old locally committed txn was canceled.
One way to achieve this is to let the backend that's making the first connection wait for the sync standbys to catch up in ClientAuthentication, right after successful authentication. However, I'm not sure this is the best way to do it at this point.

Thoughts?

Here's a WIP patch implementing (1); I'm yet to code (2). I haven't added tests, and I'm yet to figure out how to add one, as there's no way we can delay the WAL sender so that we can reliably hit this code. I will think more about this.

[1] https://www.postgresql.org/message-id/CAHg%2BQDdTdPsqtu0QLG8rMg3Xo%3D6Xo23TwHPYsUgGNEK13wTY5g%40mail.gmail.com

Regards,
Bharath Rupireddy.
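For what it's worth, the delayed-cancel behaviour in (1) can be modelled as a small stand-alone C sketch of the wait loop. All names here (wait_for_ack, sync_rep_naptime_before_cancel_ms) are hypothetical and only illustrate the intended semantics under an assumed 100 ms polling interval - this is not the patch's actual code:

```c
#include <stdbool.h>

/* Hypothetical GUC, in milliseconds: how long to keep waiting for the
 * standby ack after a cancel request arrives (0 would model today's
 * behaviour of canceling the wait immediately). */
static int sync_rep_naptime_before_cancel_ms = 5000;

typedef enum { WAIT_ACKED, WAIT_CANCELED } WaitResult;

/* Simulated wait loop: ack_at_ms is when the standby ack would arrive,
 * cancel_pending models QueryCancelPending already being set. */
static WaitResult
wait_for_ack(int ack_at_ms, bool cancel_pending)
{
    int elapsed_ms = 0;

    for (;;)
    {
        if (elapsed_ms >= ack_at_ms)
            return WAIT_ACKED;      /* txn is now replicated; safe to return */

        /* Instead of honouring the cancel at once, keep napping until the
         * configured grace period is exhausted, then give up as today. */
        if (cancel_pending && elapsed_ms >= sync_rep_naptime_before_cancel_ms)
            return WAIT_CANCELED;

        elapsed_ms += 100;          /* stand-in for a 100 ms latch wait */
    }
}
```

With a naptime of 5000 ms, an ack arriving within the grace period wins over a pending cancel; a cancel only takes effect once the grace period runs out.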
Attachment
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Nathan Bossart
Date:
On Mon, Apr 25, 2022 at 07:51:03PM +0530, Bharath Rupireddy wrote:
> With synchronous replication typically all the transactions (txns) first locally get committed, then streamed to the sync standbys and the backend that generated the transaction will wait for ack from sync standbys. While waiting for ack, it may happen that the query or the txn gets canceled (QueryCancelPending is true) or the waiting backend is asked to exit (ProcDiePending is true). In either of these cases, the wait for ack gets canceled and leaves the txn in an inconsistent state [...] Also, if the client fails over to one of the sync standbys after the crash (either by choice or because of automatic failover management after crash), the locally committed txns on the crashed primary would be lost which isn't good in a true HA solution.

This topic has come up a few times recently [0] [1] [2].

> Thoughts?

I think this will require a fair amount of discussion. I'm personally in favor of just adding a GUC that can be enabled to block canceling synchronous replication waits, but I know folks have concerns with that approach. When I looked at this stuff previously [2], it seemed possible to handle the other data loss scenarios (e.g., forcing failover whenever the primary shut down, turning off restart_after_crash). However, I'm not wedded to this approach.
[0] https://postgr.es/m/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru
[1] https://postgr.es/m/cac4b9df-92c6-77aa-687b-18b86cb13728%40stratox.cz
[2] https://postgr.es/m/FDE157D7-3F35-450D-B927-7EC2F82DB1D6%40amazon.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Laurenz Albe
Date:
On Mon, 2022-04-25 at 19:51 +0530, Bharath Rupireddy wrote:
> With synchronous replication typically all the transactions (txns) first locally get committed, then streamed to the sync standbys and the backend that generated the transaction will wait for ack from sync standbys. While waiting for ack, it may happen that the query or the txn gets canceled (QueryCancelPending is true) or the waiting backend is asked to exit (ProcDiePending is true). In either of these cases, the wait for ack gets canceled and leaves the txn in an inconsistent state [...]
>
> Here's a proposal (mentioned previously by Satya [1]) to avoid the above problems:
> 1) Wait a configurable amount of time before canceling the sync replication by the backends, i.e. delay processing of the QueryCancelPending and ProcDiePending interrupts. Introduce a new timeout GUC synchronous_replication_naptime_before_cancel; when set, it will let the backends wait for the ack before canceling the synchronous replication so that the transaction can be available in sync standbys as well.
> 2) Wait for sync standbys to catch up upon restart after the crash or in the next txn after the old locally committed txn was canceled.

While this may mitigate the problem, I don't think it will deal with all the cases which could cause a transaction to end up committed locally, but not on the synchronous standby. I think that only using the full power of two-phase commit can make this bulletproof.

Is it worth adding additional complexity that is not a complete solution?

Yours,
Laurenz Albe
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Andrey Borodin
Date:
> On 25 Apr 2022, at 21:48, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> I'm personally in favor of just adding a GUC that can be enabled to block canceling synchronous replication waits

+1. I think it's the only option to provide quorum commit guarantees.

Best regards, Andrey Borodin.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Tue, Apr 26, 2022 at 11:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Mon, 2022-04-25 at 19:51 +0530, Bharath Rupireddy wrote:
> > With synchronous replication typically all the transactions (txns) first locally get committed, then streamed to the sync standbys [...]
> >
> > Here's a proposal (mentioned previously by Satya [1]) to avoid the above problems: [...]
>
> While this may mitigate the problem, I don't think it will deal with all the cases which could cause a transaction to end up committed locally, but not on the synchronous standby. I think that only using the full power of two-phase commit can make this bulletproof.

Not sure if it's recommended to use 2PC in postgres HA with sync replication, where the documentation says that "PREPARE TRANSACTION" and other 2PC commands are "intended for use by external transaction management systems" and with explicit transactions. Whereas the txns within a postgres HA setup with sync replication don't always have to be explicit txns. Am I missing something here?
> Is it worth adding additional complexity that is not a complete solution?

The proposed approach helps to avoid some common problems that arise in simple scenarios (like canceling a long-running query while in SyncRepWaitForLSN) within sync replication.

[1] https://www.postgresql.org/docs/devel/sql-prepare-transaction.html

Regards,
Bharath Rupireddy.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Dilip Kumar
Date:
On Mon, May 9, 2022 at 2:50 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Tue, Apr 26, 2022 at 11:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > While this may mitigate the problem, I don't think it will deal with all the cases which could cause a transaction to end up committed locally, but not on the synchronous standby. I think that only using the full power of two-phase commit can make this bulletproof.
>
> Not sure if it's recommended to use 2PC in postgres HA with sync replication where the documentation says that "PREPARE TRANSACTION" and other 2PC commands are "intended for use by external transaction management systems" and with explicit transactions. Whereas the txns within a postgres HA setup with sync replication don't always have to be explicit txns. Am I missing something here?
>
> > Is it worth adding additional complexity that is not a complete solution?
>
> The proposed approach helps to avoid some common problems that arise with simple scenarios (like cancelling a long running query while in SyncRepWaitForLSN) within sync replication.

IMHO, making it wait for some amount of time based on a GUC is not a complete solution. It is just a hack to avoid the problem in some cases.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Andrey Borodin
Date:
> On 9 May 2022, at 14:20, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Tue, Apr 26, 2022 at 11:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>> While this may mitigate the problem, I don't think it will deal with all the cases which could cause a transaction to end up committed locally, but not on the synchronous standby. I think that only using the full power of two-phase commit can make this bulletproof.
>
> Not sure if it's recommended to use 2PC in postgres HA with sync replication where the documentation says that "PREPARE TRANSACTION" and other 2PC commands are "intended for use by external transaction management systems" and with explicit transactions. Whereas, the txns within a postgres HA with sync replication always don't have to be explicit txns. Am I missing something here?

COMMIT PREPARED needs to be replicated as well, thus encountering the very same problem as a usual COMMIT: if done during failover, it can be canceled and committed data can be wrongfully reported as durably written. 2PC is not a remedy to the fact that PG silently cancels waiting for sync replication. The problem arises in the presence of any "commit". And "commit" is there if transactions are there.

> On 9 May 2022, at 14:44, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> IMHO, making it wait for some amount of time, based on GUC is not a complete solution. It is just a hack to avoid the problem in some cases.

Disallowing cancelation of locally committed transactions is not a hack. It's removing a hack that was erroneously installed to make the backend responsive to Ctrl+C (or a client-side statement timeout).

> On 26 Apr 2022, at 11:26, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> Is it worth adding additional complexity that is not a complete solution?

It's not additional complexity. It is removing additional complexity that made sync rep interruptible. (But I'm surely not talking about GUCs like synchronous_replication_naptime_before_cancel - the wait for sync rep must be indefinite until synchronous_commit\synchronous_standby_names are satisfied.)

And yes, we need additional complexity - but in some other place. A transaction can also be locally committed in the presence of a server crash. But this is another difficult problem. A crashed server must not allow data queries until the LSN of the timeline end is successfully replicated to synchronous_standby_names.

Best regards, Andrey Borodin.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Dilip Kumar
Date:
On Mon, May 9, 2022 at 4:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > On 9 May 2022, at 14:44, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > IMHO, making it wait for some amount of time, based on GUC is not a complete solution. It is just a hack to avoid the problem in some cases.
>
> Disallowing cancelation of locally committed transactions is not a hack. It's removing a hack that was erroneously installed to make the backend responsive to Ctrl+C (or a client-side statement timeout).

I might be missing something, but based on my understanding the approach is not disallowing the query cancellation; it is just adding configuration for how long to delay before canceling the query. That's the reason I mentioned that this is not a guaranteed solution. I mean, even with this configuration value you cannot avoid problems in all the cases, right?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Tue, May 10, 2022 at 1:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I might be missing something but based on my understanding the approach is not disallowing the query cancellation but it is just adding the configuration for how much to delay before canceling the query. That's the reason I mentioned that this is not a guaranteed solution. I mean with this configuration value also you can not avoid problems in all the cases, right?

Yes Dilip, the proposed GUC in the v1 patch doesn't allow waiting forever for the sync repl ack; in other words, it doesn't allow blocking the pending query cancels or proc die interrupts forever. Otherwise, the backends might linger in case the repl ack isn't received or the sync replicas aren't reachable. Users would have to set the GUC to a 'reasonable value'.

If okay, I can make the GUC behave this way - value 0: existing behaviour, i.e. no wait for the sync repl ack, just process query cancels and proc die interrupts immediately; value -1: wait unboundedly for the ack; value > 0: wait the specified number of milliseconds for the ack.

Regards,
Bharath Rupireddy.
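To make the proposed three-way semantics concrete, here's a tiny hedged sketch (hypothetical names, not patch code) of how the GUC value could map to a wait policy:

```c
/* Hypothetical mapping of the proposed GUC value to a wait policy:
 *   0  -> existing behaviour, process cancel/die interrupts immediately
 *  -1  -> wait unboundedly for the standby ack
 *  >0  -> wait that many milliseconds before giving up
 * Purely illustrative; the real patch may differ. */
typedef enum
{
    CANCEL_IMMEDIATELY,     /* GUC = 0 */
    WAIT_FOREVER,           /* GUC < 0, i.e. -1 */
    WAIT_BOUNDED            /* GUC > 0, bounded by GUC milliseconds */
} SyncRepCancelPolicy;

static SyncRepCancelPolicy
sync_rep_cancel_policy(int guc_ms)
{
    if (guc_ms == 0)
        return CANCEL_IMMEDIATELY;
    if (guc_ms < 0)
        return WAIT_FOREVER;
    return WAIT_BOUNDED;
}
```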
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Andrey Borodin
Date:
> On 10 May 2022, at 12:59, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> If okay, I can make the GUC behave this way - value 0 existing behaviour i.e. no wait for sync repl ack, just process query cancels and proc die interrupts immediately; value -1 wait unboundedly for the ack; value > 0 wait for specified milliseconds for the ack.

+1 if we make -1 and 0 the only valid values.

> query cancels or proc die interrupts

Please note that a typical HA tool would need to handle query cancels and proc die interrupts differently. When the network is partitioned and a standby is promoted somewhere, you definitely want an infinite wait for cancels. Yet once in a while you want to shut down postgres without a coredump - thus proc die needs to be processed.

Thanks!

Best regards, Andrey Borodin.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Tue, May 10, 2022 at 5:55 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Please note, that typical HA tool would need to handle query cancels and proc die interrupts differently.

Agree.

> When the network is partitioned and somewhere standby is promoted you definitely want infinite wait for cancels.

When a standby is promoted, there's no point in the old primary waiting forever for the ack, assuming we are going to discard it.

> Yet once upon a time you want to shutdown postgres without coredump - thus proc die needs to be processed.

I think users can still have the flexibility to set the right amounts of time to process cancel and proc die interrupts. IMHO, the GUC can still have 0, -1 and value > 0 in milliseconds; let the users decide on what they want. Do you see any problems with this? Thoughts?

Regards,
Bharath Rupireddy.
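The asymmetry Andrey points out - swallow query cancels while still honouring proc die so the server can shut down - could look roughly like this stand-alone sketch (these names do not exist in PostgreSQL; it's only a model of the discussed policy):

```c
#include <stdbool.h>

/* What the backend should do while blocked in the sync-rep wait. */
typedef enum { KEEP_WAITING, EXIT_BACKEND } SyncRepWaitAction;

/* Query cancels are swallowed (the txn is already committed locally, so
 * returning control to the client would misreport durability), but a
 * proc-die request still terminates the backend so the server can be
 * shut down cleanly rather than via a crash/coredump. */
static SyncRepWaitAction
handle_pending_interrupts(bool query_cancel_pending, bool proc_die_pending)
{
    if (proc_die_pending)
        return EXIT_BACKEND;        /* shutdown requested: must honour it */

    (void) query_cancel_pending;    /* deliberately ignored during the wait */
    return KEEP_WAITING;
}
```

Note that proc die wins even when both interrupts are pending, which matches the "shutdown must always be possible" requirement.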
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Tue, May 10, 2022 at 5:55 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > On 10 May 2022, at 12:59, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > If okay, I can make the GUC behave this way - value 0 existing behaviour i.e. no wait for sync repl ack, just process query cancels and proc die interrupts immediately; value -1 wait unboundedly for the ack; value > 0 wait for specified milliseconds for the ack.
> +1 if we make -1 and 0 only valid values.
>
> > query cancels or proc die interrupts
>
> Please note, that typical HA tool would need to handle query cancels and proc die interrupts differently.

Hm, after thinking for a while, I tend to agree with the above approach - meaning, query cancel interrupt processing can be completely disabled in SyncRepWaitForLSN() and the proc die interrupt processed immediately. This approach requires no GUC, as opposed to the proposed v1 patch upthread. However, it's good to see what other hackers think about this.

> When the network is partitioned and somewhere standby is promoted you definitely want infinite wait for cancels. Yet once upon a time you want to shutdown postgres without coredump - thus proc die needs to be processed.

Agree.

On Mon, May 9, 2022 at 4:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> And yes, we need additional complexity - but in some other place. A transaction can also be locally committed in the presence of a server crash. But this is another difficult problem. A crashed server must not allow data queries until the LSN of the timeline end is successfully replicated to synchronous_standby_names.

Hm, that needs to be done anyway. How about doing it as proposed initially upthread [1]? Also, quoting the idea here [2]. Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACUrOB59QaE6=jF2cFAyv1MR7fzD8tr4YM5+OwEYG1SNzA@mail.gmail.com
[2] 2) Wait for sync standbys to catch up upon restart after the crash or in the next txn after the old locally committed txn was canceled. One way to achieve this is to let the backend that's making the first connection wait for sync standbys to catch up in ClientAuthentication, right after successful authentication. However, I'm not sure this is the best way to do it at this point.

Regards,
Bharath Rupireddy.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Andrey Borodin
Date:
> On 25 Jul 2022, at 14:29, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hm, after thinking for a while, I tend to agree with the above approach - meaning, query cancel interrupt processing can completely be disabled in SyncRepWaitForLSN() and process proc die interrupt immediately, this approach requires no GUC as opposed to the proposed v1 patch upthread.

The GUC was proposed here [0] to maintain compatibility with the previous behaviour. But I think that having no GUC here is fine too - if we do not allow cancelation of unreplicated backends, of course.

> > And yes, we need additional complexity - but in some other place. Transaction can also be locally committed in presence of a server crash. But this another difficult problem. Crashed server must not allow data queries until LSN of timeline end is successfully replicated to synchronous_standby_names.
>
> Hm, that needs to be done anyways. How about doing as proposed initially upthread [1]? Also, quoting the idea here [2]. Thoughts?
>
> [1] https://www.postgresql.org/message-id/CALj2ACUrOB59QaE6=jF2cFAyv1MR7fzD8tr4YM5+OwEYG1SNzA@mail.gmail.com
> [2] 2) Wait for sync standbys to catch up upon restart after the crash or in the next txn after the old locally committed txn was canceled. [...]

I think ideally the startup process should not allow read-only connections in CheckRecoveryConsistency() until the WAL is replicated to the quorum, at least up to the new timeline LSN.

Thanks!

[0] https://commitfest.postgresql.org/34/2402/
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Mon, Jul 25, 2022 at 4:20 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I think ideally startup process should not allow read only connections in CheckRecoveryConsistency() until WAL is replicated to quorum at least up to the new timeline LSN.

We can't do it in CheckRecoveryConsistency() unless I'm missing something. That's because the walsenders (required for sending the remaining WAL to the sync standbys to achieve quorum) can only be started after the server reaches a consistent state; after all, walsenders are specialized backends.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Thu, Aug 4, 2022 at 1:42 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Mon, Jul 25, 2022 at 4:20 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > I think ideally startup process should not allow read only connections in CheckRecoveryConsistency() until WAL is replicated to quorum at least up to the new timeline LSN.
>
> We can't do it in CheckRecoveryConsistency() unless I'm missing something. Because the walsenders (required for sending the remaining WAL to sync standbys to achieve quorum) can only be started after the server reaches a consistent state; after all, walsenders are specialized backends.

Continuing on the above thought (I inadvertently clicked the send button previously):

A simple approach would be to check for quorum in PostgresMain() before entering the query loop for (;;), for the non-walsender cases. A disadvantage of this would be that all the backends will be waiting here in the worst case if it takes time to achieve the sync quorum after restart. Roughly, we can do the following in PostgresMain(); of course we'd need a locking mechanism so that all the backends that reach here wait for the same LSN:

if (sync_replication_defined == true &&
    shmem->wait_for_sync_repl_upon_restart == true)
{
    SyncRepWaitForLSN(pg_current_wal_flush_lsn(), false);
    shmem->wait_for_sync_repl_upon_restart = false;
}

Thoughts?

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
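The gate sketched in the preceding message can be modelled as a toy, self-contained C program; the shared structure and field names are assumptions mirroring the pseudocode, not real PostgreSQL shmem, and the locking is omitted for brevity:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Toy stand-in for shared memory, as it would look right after a crash:
 * the wait flag is set and the quorum has not yet acked the pre-crash
 * flush LSN. Values are arbitrary for illustration. */
static struct
{
    bool       wait_for_sync_repl_upon_restart; /* true during/after recovery */
    XLogRecPtr local_flush_lsn;                 /* locally flushed before crash */
    XLogRecPtr quorum_ack_lsn;                  /* highest LSN acked by quorum */
} shmem = { true, 5000, 0 };

/* Gate evaluated before the query loop: the first backend to observe the
 * quorum catching up clears the flag, so later backends skip the wait.
 * A real implementation would block in SyncRepWaitForLSN() under a lock. */
static bool
backend_may_run_queries(void)
{
    if (!shmem.wait_for_sync_repl_upon_restart)
        return true;                            /* quorum already confirmed */

    if (shmem.quorum_ack_lsn >= shmem.local_flush_lsn)
    {
        shmem.wait_for_sync_repl_upon_restart = false;  /* one-time gate */
        return true;
    }
    return false;       /* quorum not caught up: keep the backend waiting */
}
```

Once the ack LSN catches up with the local flush LSN, the flag flips and every subsequent backend proceeds without checking the quorum again.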
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Kyotaro Horiguchi
Date:
At Tue, 26 Apr 2022 08:26:59 +0200, Laurenz Albe <laurenz.albe@cybertec.at> wrote in
> While this may mitigate the problem, I don't think it will deal with all the cases which could cause a transaction to end up committed locally, but not on the synchronous standby. I think that only using the full power of two-phase commit can make this bulletproof.
>
> Is it worth adding additional complexity that is not a complete solution?

I would agree to this. As with 2PC, whatever we do to make it perfect, we're left with unresolvable problems, at least for now.

Doesn't it meet your requirements if we have a means to know whether the last transaction on the current session was locally committed or aborted?

We are already internally managing the last committed LSN. I think we can do the same thing for transaction aborts and the last inserted LSN, and we can expose them in some way. This is far simpler than the (maybe) uncompletable attempt to fill up the deep gap.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Fri, Aug 5, 2022 at 8:19 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Tue, 26 Apr 2022 08:26:59 +0200, Laurenz Albe <laurenz.albe@cybertec.at> wrote in > > While this may mitigate the problem, I don't think it will deal with > > all the cases which could cause a transaction to end up committed locally, > > but not on the synchronous standby. I think that only using the full > > power of two-phase commit can make this bulletproof. > > > > Is it worth adding additional complexity that is not a complete solution? > > I would agree to this. Likewise 2PC, whatever we do to make it > perfect, we're left with unresolvable problems at least for now. > > Doesn't it meet your requirements if we have a means to know the last > transaction on the current session is locally committed or aborted? > > We are already internally managing last committed LSN. I think we can > do the same thing about transaction abort and last inserted LSN and we > can expose them any way. This is way simpler than the (maybe) > uncompletable attempt to fill up the deep gap. There can be more txns that are locally-committed-but-not-yet-replicated. Even if we have that information stored somewhere, what do we do with it? Those txns are committed from the client perspective but not committed from the server's perspective. Can you please explain more about your idea, I may be missing something? -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Kyotaro Horiguchi
Date:
At Mon, 8 Aug 2022 19:13:25 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Fri, Aug 5, 2022 at 8:19 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > At Tue, 26 Apr 2022 08:26:59 +0200, Laurenz Albe <laurenz.albe@cybertec.at> wrote in > > > While this may mitigate the problem, I don't think it will deal with > > > all the cases which could cause a transaction to end up committed locally, > > > but not on the synchronous standby. I think that only using the full > > > power of two-phase commit can make this bulletproof. > > > > > > Is it worth adding additional complexity that is not a complete solution? > > > > I would agree to this. Likewise 2PC, whatever we do to make it > > perfect, we're left with unresolvable problems at least for now. > > > > Doesn't it meet your requirements if we have a means to know the last > > transaction on the current session is locally committed or aborted? > > > > We are already internally managing last committed LSN. I think we can > > do the same thing about transaction abort and last inserted LSN and we > > can expose them any way. This is way simpler than the (maybe) > > uncompletable attempt to fill up the deep gap. > > There can be more txns that are > locally-committed-but-not-yet-replicated. Even if we have that > information stored somewhere, what do we do with it? Those txns are > committed from the client perspective but not committed from the > server's perspective. > > Can you please explain more about your idea, I may be missing something? (I'm not sure I understand the requirements here..) I understand that it is about query cancellation. In the case of primary crash/termination, client cannot even know whether the commit of the ongoing transaction, if any, has been recorded. Anyway no way other than to somehow confirm that the change by the transaction has been actually made after restart. I believe it is the standard practice of the applications that work on HA clusters. 
The same is true in the case of query cancellation, since the commit response doesn't reach the client there either. But even in this case, if we had functions/views that tell us the last-committed/last-aborted/last-inserted LSN on a session, we could know whether the last transaction has been committed, along with its commit LSN, perhaps more easily.

# In fact, I see those functions rather as a means to know whether a
# change by the last transaction on a session is available on some
# replica.

For example, the heavily simplified pseudocode below might show how the functions (if they were functions) would work:

    try {
        s.execute("INSERT ..");
        c.commit();
    } catch (Exception x) {
        c.commit();
        if (s.execute("SELECT pg_session_last_committed_lsn() = "
                      "pg_session_last_inserted_lsn()")) {
            /* the transaction has been locally committed */
            if (s.execute("SELECT replay_lsn >= pg_session_last_committed_lsn() "
                          "FROM pg_stat_replication where xxxx"))
                /* the commit has been replicated to xxx, LSN is known */
        } else {
            /* the transaction has not been locally committed */
            <retry?>
        }
    }

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
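The decision flow behind that pseudocode can be captured as one small function. Note the session-level LSN reports it mirrors (pg_session_last_committed_lsn() and friends) are only proposed in this thread and do not exist in PostgreSQL; this Python sketch just shows how a client would interpret such values:

```python
def classify_after_error(last_inserted_lsn, last_committed_lsn, replay_lsn):
    """Interpret hypothetical per-session LSN reports after a commit whose
    response never reached the client (cancel, crash, lost connection)."""
    if last_committed_lsn is None or last_committed_lsn < last_inserted_lsn:
        # WAL was inserted but no commit record followed: the transaction
        # did not commit locally, so the client can safely retry it.
        return "retry"
    if replay_lsn is not None and replay_lsn >= last_committed_lsn:
        # The standby has replayed past the commit record.
        return "committed-and-replicated"
    # Committed locally, but not (yet) known to be on the standby.
    return "committed-locally-only"
```

The three outcomes correspond exactly to the three branches in the pseudocode above.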
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Tue, Aug 9, 2022 at 12:42 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > Can you please explain more about your idea, I may be missing something? > > (I'm not sure I understand the requirements here..) I've explained the problem with the current HA setup with synchronous replication upthread at [1]. Let me reiterate it here once again. When a query is cancelled (a simple stroke of CTRL+C or pg_cancel_backend() call) while the txn is waiting for ack in SyncRepWaitForLSN(); for the client, the txn is actually committed (locally-committed-but-not-yet-replicated to all of sync standbys) like any other txn, a warning is emitted into server logs but it is of no use for the client (think of client as applications). There can be many such txns waiting for ack in SyncRepWaitForLSN() and query cancel can be issued on all of those sessions. The problem is that the subsequent reads will then be able to read all of those locally-committed-but-not-yet-replicated to all of sync standbys txns data - this is what I call an inconsistency (can we call this a read-after-write inconsistency?) because of lack of proper query cancel handling. And if the sync standbys are down or unable to come up for some reason, until then, the primary will be serving clients with the inconsistent data. BTW, I found a report of this problem here [2]. The solution proposed for the above problem is to just 'not honor query cancels at all while waiting for ack in SyncRepWaitForLSN()'. When a proc die is pending, then also, there can be locally-committed-but-not-yet-replicated to all of sync standbys txns. Typically, there are two choices for the clients 1) reuse the primary instance after restart 2) failover to one of sync standbys. For case (1), there might be read-after-write inconsistency as explained above. 
For case (2), those txns might get lost completely if the failover target sync standby or the new primary didn't receive them and the other sync standbys that have received them are now ahead and need a special treatment (run pg_rewind) for them to be able to connect to new primary. The solution proposed for case (1) of the above problem is to 'process the ProcDiePending immediately and upon restart the first backend can wait until the sync standbys are caught up to ensure no inconsistent reads'. The solution proposed for case (2) of the above problem is to 'either run pg_rewind for the sync standbys that are ahead or use the idea proposed at [3]'. I hope the above explanation helps. [1] https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%40mail.gmail.com [2] https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous-replication [3] https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gmail.com -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
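The ordering that produces the inconsistency described above (local commit first, wait for the standby's ack second, cancel in between) can be reduced to a toy model. This is not PostgreSQL code, just a sketch of why a cancelled SyncRepWaitForLSN() wait yields reads on the primary that a failover target has never seen:

```python
class ToySyncRepCluster:
    """Toy model of synchronous replication's commit path: the commit is
    durable and visible on the primary before the backend ever waits for
    the standby's acknowledgement."""

    def __init__(self):
        self.primary = []   # rows visible to reads on the primary
        self.standby = []   # rows the sync standby has received

    def commit(self, row, standby_acks):
        self.primary.append(row)            # step 1: local commit
        if standby_acks:                    # step 2: wait for the ack
            self.standby.append(row)
            return "committed"
        return "canceled-while-waiting"     # query cancel cut the wait short

cluster = ToySyncRepCluster()
status = cluster.commit("order-42", standby_acks=False)  # cancel during wait
# The primary now serves a row that would vanish on failover to the standby.
```

Subsequent reads on the primary see the row, so a later failover silently loses data the client already observed.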
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Tue, Aug 9, 2022 at 2:16 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > I've explained the problem with the current HA setup with synchronous > replication upthread at [1]. Let me reiterate it here once again. > > When a query is cancelled (a simple stroke of CTRL+C or > pg_cancel_backend() call) while the txn is waiting for ack in > SyncRepWaitForLSN(); for the client, the txn is actually committed > (locally-committed-but-not-yet-replicated to all of sync standbys) > like any other txn, a warning is emitted into server logs but it is of > no use for the client (think of client as applications). There can be > many such txns waiting for ack in SyncRepWaitForLSN() and query cancel > can be issued on all of those sessions. The problem is that the > subsequent reads will then be able to read all of those > locally-committed-but-not-yet-replicated to all of sync standbys txns > data - this is what I call an inconsistency (can we call this a > read-after-write inconsistency?) because of lack of proper query > cancel handling. And if the sync standbys are down or unable to come > up for some reason, until then, the primary will be serving clients > with the inconsistent data. BTW, I found a report of this problem here > [2]. > > The solution proposed for the above problem is to just 'not honor > query cancels at all while waiting for ack in SyncRepWaitForLSN()'. > > When a proc die is pending, then also, there can be > locally-committed-but-not-yet-replicated to all of sync standbys txns. > Typically, there are two choices for the clients 1) reuse the primary > instance after restart 2) failover to one of sync standbys. For case > (1), there might be read-after-write inconsistency as explained above. 
> For case (2), those txns might get lost completely if the failover > target sync standby or the new primary didn't receive them and the > other sync standbys that have received them are now ahead and need a > special treatment (run pg_rewind) for them to be able to connect to > new primary. > > The solution proposed for case (1) of the above problem is to 'process > the ProcDiePending immediately and upon restart the first backend can > wait until the sync standbys are caught up to ensure no inconsistent > reads'. > The solution proposed for case (2) of the above problem is to 'either > run pg_rewind for the sync standbys that are ahead or use the idea > proposed at [3]'. > > I hope the above explanation helps. > > [1] https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%40mail.gmail.com > [2] https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous-replication > [3] https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gmail.com I'm attaching the v2 patch rebased on the latest HEAD. Please note that there are still some open points, I'm yet to find time to think more about them. Meanwhile, I'm posting the v2 patch for making cfbot happy. Any further thoughts on the overall design of the patch are most welcome. Thanks. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bruce Momjian
Date:
On Tue, Sep 27, 2022 at 06:52:21PM +0530, Bharath Rupireddy wrote: > On Tue, Aug 9, 2022 at 2:16 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > I've explained the problem with the current HA setup with synchronous > > replication upthread at [1]. Let me reiterate it here once again. > > > > When a query is cancelled (a simple stroke of CTRL+C or > > pg_cancel_backend() call) while the txn is waiting for ack in > > SyncRepWaitForLSN(); for the client, the txn is actually committed > > (locally-committed-but-not-yet-replicated to all of sync standbys) > > like any other txn, a warning is emitted into server logs but it is of > > no use for the client (think of client as applications). There can be > > many such txns waiting for ack in SyncRepWaitForLSN() and query cancel > > can be issued on all of those sessions. The problem is that the > > subsequent reads will then be able to read all of those > > locally-committed-but-not-yet-replicated to all of sync standbys txns > > data - this is what I call an inconsistency (can we call this a > > read-after-write inconsistency?) because of lack of proper query > > cancel handling. And if the sync standbys are down or unable to come > > up for some reason, until then, the primary will be serving clients > > with the inconsistent data. BTW, I found a report of this problem here > > [2]. > > > > The solution proposed for the above problem is to just 'not honor > > query cancels at all while waiting for ack in SyncRepWaitForLSN()'. > > > > When a proc die is pending, then also, there can be > > locally-committed-but-not-yet-replicated to all of sync standbys txns. > > Typically, there are two choices for the clients 1) reuse the primary > > instance after restart 2) failover to one of sync standbys. For case > > (1), there might be read-after-write inconsistency as explained above. 
> > For case (2), those txns might get lost completely if the failover > > target sync standby or the new primary didn't receive them and the > > other sync standbys that have received them are now ahead and need a > > special treatment (run pg_rewind) for them to be able to connect to > > new primary. > > > > The solution proposed for case (1) of the above problem is to 'process > > the ProcDiePending immediately and upon restart the first backend can > > wait until the sync standbys are caught up to ensure no inconsistent > > reads'. > > The solution proposed for case (2) of the above problem is to 'either > > run pg_rewind for the sync standbys that are ahead or use the idea > > proposed at [3]'. > > > > I hope the above explanation helps. > > > > [1] https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%40mail.gmail.com > > [2] https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous-replication > > [3] https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gmail.com > > I'm attaching the v2 patch rebased on the latest HEAD. Please note > that there are still some open points, I'm yet to find time to think > more about them. Meanwhile, I'm posting the v2 patch for making cfbot > happy. Any further thoughts on the overall design of the patch are > most welcome. Thanks. ... > In PostgreSQL high availability setup with synchronous replication, > typically all the transactions first locally get committed, then > streamed to the synchronous standbys and the backends that generated > the transaction will wait for acknowledgement from synchronous > standbys. While waiting for acknowledgement, it may happen that the > query or the transaction gets canceled or the backend that's waiting > for acknowledgement is asked to exit. 
> In either of these cases, the wait for acknowledgement gets canceled
> and leaves the transaction in an inconsistent state, as it got
> committed locally but not on the standbys. When the GUC
> synchronous_replication_naptime_before_cancel introduced in this
> patch is set, it will let the backends wait for the acknowledgement
> before canceling the wait so that the transaction can be available
> in synchronous standbys as well.

I don't think this patch is going in the right direction, and I think we need to step back to see why.

First, let's see how synchronous replication works. Each backend waits for one or more synchronous replicas to acknowledge the WAL related to its commit and then it marks the commit as done in PGPROC and then to the client; I wrote a blog about it:

https://momjian.us/main/blogs/pgblog/2020.html#June_3_2020

So, what happens when an insufficient number of synchronous replicas reply? Sessions hang because the synchronous behavior cannot be guaranteed. We then _allow_ query cancel so the user or administrator can get out of the hung sessions and perhaps modify synchronous_standby_names.

What the proposed patch effectively does is to prevent the ability to recover the hung sessions or auto-continue the sessions if an insufficient number of synchronous replicas respond. So, in a way, it is allowing for more strict and less strict behavior of synchronous_standby_names.

However, I think trying to solve this at the session level is the wrong approach. If you set a timeout to continue stuck sessions, odds are the timeout will be too long for each session and performance will be unacceptable anyway, so you haven't gained much. If you prevent cancels, you effectively lock up the system with fewer options of recovery.
I have always felt this has to be done at the server level, meaning when a synchronous_standby_names replica is not responding after a certain timeout, the administrator must be notified by calling a shell command defined in a GUC and all sessions will ignore the replica. This gives a much more predictable and useful behavior than the one in the patch --- we have discussed this approach many times on the email lists.

Once we have that, we can consider removing the cancel ability while waiting for synchronous replicas (since we have the timeout) or make it optional. We can also consider how to notify the administrator during query cancel (if we allow it), backend abrupt exit/crash, and if we should allow users to specify a retry interval to resynchronize the synchronous replicas.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
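As a thought experiment, the server-level behavior sketched above (timeout, shell-command notification, ignore, re-add) could look roughly like this Python model. Every name here is invented — no such GUC or tracker exists in PostgreSQL — and the shell hook is only invoked when one is configured:

```python
import subprocess

class SyncStandbyTracker:
    """Hypothetical model of the proposal: a standby silent past the
    timeout triggers an admin notification and stops counting toward the
    sync quorum; when it replies again, the admin is notified and it is
    counted again."""

    def __init__(self, timeout_s, notify_cmd=None):
        self.timeout_s = timeout_s
        self.notify_cmd = notify_cmd   # would be a GUC, like archive_command
        self.last_reply = {}           # standby name -> last ack time
        self.ignored = set()

    def record_reply(self, name, now):
        self.last_reply[name] = now
        if name in self.ignored:       # responsive again: notify, re-add
            self.ignored.discard(name)
            self._notify(f"standby {name} responsive again, re-added as sync")

    def is_synchronous(self, name, now):
        silent_for = now - self.last_reply.get(name, now)
        if silent_for > self.timeout_s and name not in self.ignored:
            self.ignored.add(name)
            self._notify(f"standby {name} silent for {silent_for}s, ignoring")
        return name not in self.ignored

    def _notify(self, message):
        if self.notify_cmd:  # deliver before proceeding, per the discussion
            subprocess.run([self.notify_cmd, message], check=True)

tracker = SyncStandbyTracker(timeout_s=30)
tracker.record_reply("standby1", now=0)
```

A real design would have to settle how this timeout interacts with wal_sender_timeout, which the thread leaves open.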
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Fri, Sep 30, 2022 at 4:23 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> I don't think this patch is going in the right direction, and I think we
> need to step back to see why.

Thanks a lot, Bruce, for providing insights.

> First, let's see how synchronous replication works. Each backend waits
> for one or more synchronous replicas to acknowledge the WAL related to
> its commit and then it marks the commit as done in PGPROC and then to
> the client; I wrote a blog about it:
>
> https://momjian.us/main/blogs/pgblog/2020.html#June_3_2020

Great!

> So, what happens when an insufficient number of synchronous replicas
> reply? Sessions hang because the synchronous behavior cannot be
> guaranteed. We then _allow_ query cancel so the user or administrator
> can get out of the hung sessions and perhaps modify
> synchronous_standby_names.

Right.

> What the proposed patch effectively does is to prevent the ability to
> recover the hung sessions or auto-continue the sessions if an
> insufficient number of synchronous replicas respond. So, in a way, it
> is allowing for more strict and less strict behavior of
> synchronous_standby_names.

Yes. I do agree that it makes the sessions wait further and closes the quick exit path that admins/users have when the problem occurs. But it disallows users from cancelling queries or terminating backends only to end up with locally-committed-but-not-replicated transactions on a server setup where the sync standbys are all working fine. I agree that we don't want to wait further on cancels or proc dies as it makes the system more unresponsive [1].

> However, I think trying to solve this at the session level is the wrong
> approach. If you set a timeout to continue stuck sessions, odds are the
> timeout will be too long for each session and performance will be
> unacceptable anyway, so you haven't gained much. If you prevent
> cancels, you effectively lock up the system with fewer options of
> recovery.

Yes.
> I have always felt this has to be done at the server level, meaning when > a synchronous_standby_names replica is not responding after a certain > timeout, the administrator must be notified by calling a shell command > defined in a GUC and all sessions will ignore the replica. This gives a > much more predictable and useful behavior than the one in the patch --- > we have discussed this approach many times on the email lists. IIUC, each walsender serving a sync standby will determine that the sync standby isn't responding for a configurable amount of time (less than wal_sender_timeout) and calls shell command to notify the admin if there are any backends waiting for sync replication in SyncRepWaitForLSN(). The shell command then provides the unresponsive sync standby name at the bare minimum for the admin to ignore it as sync standby/remove it from synchronous_standby_names to continue further. This still requires manual intervention which is a problem if running postgres server instances at scale. Also, having a new shell command in any form may pose security risks. I'm not sure at this point how this new timeout is going to work alongside wal_sender_timeout. I'm thinking about the possible options that an admin has to get out of this situation: 1) Removing the standby from synchronous_standby_names. 2) Fixing the sync standby, by restarting or restoring the lost part (such as network or some other). (1) is something that postgres can help admins get out of the problem easily and automatically without any intervention. (2) is something postgres can't do much about. How about we let postgres automatically remove an unresponsive (for a pre-configured time) sync standby from synchronous_standby_names and inform the user (via log message and via new walsender property and pg_stat_replication for monitoring purposes)? The users can then detect such standbys and later try to bring them back to the sync standbys group or do other things. 
I believe that a production level postgres HA with sync standbys will have monitoring to detect the replication lag, failover decision etc via monitoring pg_stat_replication. With this approach, a bit more monitoring is needed. This solution requires little or no manual intervention and scales well. Please note that I haven't studied the possibilities of implementing it yet.

Thoughts?

> Once we have that, we can consider removing the cancel ability while
> waiting for synchronous replicas (since we have the timeout) or make it
> optional. We can also consider how to notify the administrator during
> query cancel (if we allow it), backend abrupt exit/crash, and

Yeah. If we have the timeout-and-auto-removal-of-standby-from-sync-standbys-list solution, the users can then choose to disable processing query cancels/proc dies while waiting for sync replication in SyncRepWaitForLSN().

> if we
> should allow users to specify a retry interval to resynchronize the
> synchronous replicas.

This is another interesting thing to consider if we were to make the auto-removed (by the above approach) standby a sync standby again without manual intervention.

Thoughts?

[1] https://www.postgresql.org/message-id/CA%2BTgmoaCBwgMDkeBDOgtPgHcbfSYq%2BzORjL5DoU3pJyjALxtoQ%40mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
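Mechanically, the auto-removal step proposed above amounts to rewriting the synchronous_standby_names value. The helper below is a hypothetical sketch of only that string surgery; it handles just the 'ANY/FIRST n (names)' form, not the GUC's full grammar (bare lists, quoted names, '*'), and nothing like it exists in PostgreSQL today:

```python
import re

def drop_standby(sync_names, victim):
    """Hypothetical helper: remove one standby from a quorum-style
    synchronous_standby_names value such as 'FIRST 2 (s1, s2, s3)'."""
    m = re.fullmatch(r"\s*(ANY|FIRST)\s+(\d+)\s*\(([^)]*)\)\s*",
                     sync_names, re.IGNORECASE)
    if not m:
        raise ValueError("unsupported synchronous_standby_names syntax")
    method, num = m.group(1), int(m.group(2))
    names = [n.strip() for n in m.group(3).split(",") if n.strip() != victim]
    num = min(num, len(names))   # can't require more acks than standbys left
    return f"{method} {num} ({', '.join(names)})"
```

Note the quorum count has to be clamped when the dropped standby was needed to reach it, which is exactly the kind of policy decision the thread says still requires design work.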
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bruce Momjian
Date:
On Sat, Oct 1, 2022 at 06:59:26AM +0530, Bharath Rupireddy wrote:
> > I have always felt this has to be done at the server level, meaning when
> > a synchronous_standby_names replica is not responding after a certain
> > timeout, the administrator must be notified by calling a shell command
> > defined in a GUC and all sessions will ignore the replica. This gives a
     ------------------------------------
> > much more predictable and useful behavior than the one in the patch ---
> > we have discussed this approach many times on the email lists.
>
> IIUC, each walsender serving a sync standby will determine that the
> sync standby isn't responding for a configurable amount of time (less
> than wal_sender_timeout) and calls shell command to notify the admin
> if there are any backends waiting for sync replication in
> SyncRepWaitForLSN(). The shell command then provides the unresponsive
> sync standby name at the bare minimum for the admin to ignore it as
> sync standby/remove it from synchronous_standby_names to continue
> further. This still requires manual intervention which is a problem if
> running postgres server instances at scale. Also, having a new shell

As I highlighted above, by default you notify the administrator that a synchronous replica is not responding and then ignore it. If it becomes responsive again, you notify the administrator again and add it back as a synchronous replica.

> command in any form may pose security risks. I'm not sure at this
> point how this new timeout is going to work alongside
> wal_sender_timeout.

We have archive_command, so I don't see a problem with another shell command.

> I'm thinking about the possible options that an admin has to get out
> of this situation:
> 1) Removing the standby from synchronous_standby_names.

Yes, see above. We might need a read-only GUC that reports which synchronous replicas are active. As you can see, there is a lot of API design required here, but this is the most effective approach.
> 2) Fixing the sync standby, by restarting or restoring the lost part
> (such as network or some other).
>
> (1) is something that postgres can help admins get out of the problem
> easily and automatically without any intervention. (2) is something
> postgres can't do much about.
>
> How about we let postgres automatically remove an unresponsive (for a
> pre-configured time) sync standby from synchronous_standby_names and
> inform the user (via log message and via new walsender property and
> pg_stat_replication for monitoring purposes)? The users can then
> detect such standbys and later try to bring them back to the sync
> standbys group or do other things. I believe that a production level
> postgres HA with sync standbys will have monitoring to detect the
> replication lag, failover decision etc via monitoring
> pg_stat_replication. With this approach, a bit more monitoring is
> needed. This solution requires little or no manual intervention and
> scales well. Please note that I haven't studied the possibilities of
> implementing it yet.
>
> Thoughts?

Yes, see above.

> > Once we have that, we can consider removing the cancel ability while
> > waiting for synchronous replicas (since we have the timeout) or make it
> > optional. We can also consider how to notify the administrator during
> > query cancel (if we allow it), backend abrupt exit/crash, and
>
> Yeah. If we have the
> timeout-and-auto-removal-of-standby-from-sync-standbys-list solution,
> the users can then choose to disable processing query cancels/proc
> dies while waiting for sync replication in SyncRepWaitForLSN().

Yes. We might also change things so a query cancel that happens during synchronous replica waiting can only be done by an administrator, not the session owner. Again, lots of design needed here.
> > This is another interesting thing to consider if we were to make the > auto-removed (by the above approach) standby a sync standby again > without manual intervention. Yes, see above. You are addressing the right questions here. :-) -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Thu, Oct 6, 2022 at 2:30 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> As I highlighted above, by default you notify the administrator that a
> synchronous replica is not responding and then ignore it. If it becomes
> responsive again, you notify the administrator again and add it back as
> a synchronous replica.
>
> > command in any form may pose security risks. I'm not sure at this
> > point how this new timeout is going to work alongside
> > wal_sender_timeout.
>
> We have archive_command, so I don't see a problem with another shell
> command.

Why do we need a new command to inform the admin/user about a sync replication being ignored (from sync quorum) for not responding or acknowledging for a certain amount of time in SyncRepWaitForLSN()? Can't we just add an extra column or use existing sync_state in pg_stat_replication()? We can either introduce a new state such as temporary_async or just use the existing state 'potential' [1]. A problem is that the server has to be monitored for this extra, new state. If we do this, we don't need another command to report.

> > I'm thinking about the possible options that an admin has to get out
> > of this situation:
> > 1) Removing the standby from synchronous_standby_names.
>
> Yes, see above. We might need a read-only GUC that reports which
> synchronous replicas are active. As you can see, there is a lot of API
> design required here, but this is the most effective approach.

If we use the above approach to report via pg_stat_replication(), we don't need this.

> > > Once we have that, we can consider removing the cancel ability while
> > > waiting for synchronous replicas (since we have the timeout) or make it
> > > optional. We can also consider how to notify the administrator during
> > > query cancel (if we allow it), backend abrupt exit/crash, and
> >
> > Yeah.
> > If we have the
> > timeout-and-auto-removal-of-standby-from-sync-standbys-list solution,
> > the users can then choose to disable processing query cancels/proc
> > dies while waiting for sync replication in SyncRepWaitForLSN().
>
> Yes. We might also change things so a query cancel that happens during
> synchronous replica waiting can only be done by an administrator, not the
> session owner. Again, lots of design needed here.

Yes, we need infrastructure to track who issued the query cancel or proc die and so on. IMO, it's not a good idea to allow/disallow query cancels or CTRL+C based on role types - superusers, users with replication roles, or members of any of the predefined roles.

In general, it is the walsender serving sync standby that has to mark itself as async standby by removing itself from synchronous_standby_names, reloading config variables and waking up the backends that are waiting in syncrep wait queue for it to update LSN.

And, the new auto removal timeout should always be set to less than wal_sender_timeout.

All that said, imagine we have the timeout-and-auto-removal-of-standby-from-sync-standbys-list solution in one form or the other, with the auto removal timeout set to 5 minutes. Any of the following can happen:

1) A query is stuck waiting for the sync standby ack in SyncRepWaitForLSN(), no query cancel or proc die interrupt arrives, and the sync standby is made an async standby after the timeout, i.e., 5 minutes.

2) A query is stuck waiting for the sync standby ack in SyncRepWaitForLSN(), say for about 3 minutes, and then a query cancel or proc die interrupt arrives. Should we process it immediately, or wait for the timeout to happen (2 more minutes) and then process the interrupt? If we process the interrupts immediately, then the locally-committed-but-not-replicated-to-sync-standby problems described upthread [2] are left unresolved.
[1] https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-VIEW

    sync_state text
    Synchronous state of this standby server. Possible values are:
      async: This standby server is asynchronous.
      potential: This standby server is now asynchronous, but can
        potentially become synchronous if one of current synchronous
        ones fails.
      sync: This standby server is synchronous.
      quorum: This standby server is considered as a candidate for
        quorum standbys.

[2] https://www.postgresql.org/message-id/CALj2ACXmMWtpmuT-%3Dv8F%2BLk4QCbdkeN%2ByHKXeRGKFfjG96YbKA%40mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bruce Momjian
Date:
On Thu, Oct 6, 2022 at 01:33:33PM +0530, Bharath Rupireddy wrote:
> On Thu, Oct 6, 2022 at 2:30 AM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > As I highlighted above, by default you notify the administrator that a
> > synchronous replica is not responding and then ignore it. If it becomes
> > responsive again, you notify the administrator again and add it back as
> > a synchronous replica.
> >
> > > command in any form may pose security risks. I'm not sure at this
> > > point how this new timeout is going to work alongside
> > > wal_sender_timeout.
> >
> > We have archive_command, so I don't see a problem with another shell
> > command.
>
> Why do we need a new command to inform the admin/user about a sync
> replication being ignored (from the sync quorum) for not responding or
> acknowledging for a certain amount of time in SyncRepWaitForLSN()?
> Can't we just add an extra column or use the existing sync_state in
> pg_stat_replication? We can either introduce a new state such as
> temporary_async or just use the existing state 'potential' [1]. A
> problem is that the server has to be monitored for this extra, new
> state. If we do this, we don't need another command to report.

Yes, that is a good point. I assumed people would want notification immediately rather than waiting for monitoring to notice it. Consider if you monitor every five seconds but the primary loses sync and goes down during that five-second interval --- there would be no way to know whether sync stopped and committed transactions were reported to the client before the primary went down. I would love to just rely on monitoring, but I am not sure that is sufficient for this use case. Of course, if email is being sent it might still be in the email queue when the primary goes down, but if I were doing it I would make sure the email was delivered _before_ returning.
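The notify-before-demote ordering above could be sketched roughly as follows. The function name and the notification-command GUC it implies are hypothetical; only the ordering guarantee matters: the standby is demoted and waiting commits acknowledged only after the shell command itself reports success.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: run an administrator-notification shell command
 * (imagine a GUC along the lines of archive_command) and report success
 * only when the command exits 0.  A caller would demote the unresponsive
 * standby and release the waiting backends only after this returns true,
 * so the notification is delivered _before_ the commit is acknowledged.
 */
static bool
notify_admin_before_demotion(const char *command)
{
    if (command == NULL || command[0] == '\0')
        return false;               /* no command configured: do not demote */

    /* system() blocks until the shell command finishes */
    return system(command) == 0;    /* demote only on confirmed delivery */
}
```

A monitoring-based approach cannot provide this ordering, since the primary may go down inside the polling interval.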
The point is that we would not disable the sync and acknowledge the commit to the client until the notification command returns success --- that kind of guarantee is hard to achieve with monitoring. These are good discussions to have --- maybe I am wrong.

> > > > Once we have that, we can consider removing the cancel ability while
> > > > waiting for synchronous replicas (since we have the timeout) or make it
> > > > optional. We can also consider how to notify the administrator during
> > > > query cancel (if we allow it), backend abrupt exit/crash, and
> > >
> > > Yeah. If we have the
> > > timeout-and-auto-removal-of-standby-from-sync-standbys-list solution,
> > > the users can then choose to disable processing query cancels/proc
> > > dies while waiting for sync replication in SyncRepWaitForLSN().
> >
> > Yes. We might also change things so a query cancel that happens during
> > synchronous replica waiting can only be done by an administrator, not the
> > session owner. Again, lots of design needed here.
>
> Yes, we need infrastructure to track who issued the query cancel or
> proc die and so on. IMO, it's not a good way to allow/disallow query
> cancels or CTRL+C based on role types - superusers or users with
> replication roles or users who are members of any of the predefined roles.
>
> In general, it is the walsender serving the sync standby that has to mark
> itself as an async standby by removing itself from
> synchronous_standby_names, reloading config variables and waking up
> the backends that are waiting in the syncrep wait queue for it to update
> the LSN.
>
> And, the new auto-removal timeout should always be set to less than
> wal_sender_timeout.
>
> All that said, imagine we have the
> timeout-and-auto-removal-of-standby-from-sync-standbys-list solution
> in one form or another, with the auto-removal timeout set to 5 minutes.
> Any of the following can happen:
>
> 1) A query is stuck waiting for the sync standby ack in SyncRepWaitForLSN(),
> no query cancel or proc die interrupt arrives, and the sync standby is
> demoted to an async standby after the timeout, i.e. 5 minutes.
> 2) A query is stuck waiting for the sync standby ack in SyncRepWaitForLSN()
> for, say, about 3 minutes, and then a query cancel or proc die interrupt
> arrives. Should we process it immediately, or wait for the timeout to
> elapse (2 more minutes) and then process the interrupt? If we
> process the interrupts immediately, then the
> locally-committed-but-not-replicated-to-sync-standby problems
> described upthread [2] are left unresolved.

I have a feeling that once we have the timeout, we would disable query cancel at this stage, since it would be canceling a committed query. The timeout would cancel the sync wait, but at least the administrator would know.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Andrey Borodin
Date:
On Thu, Sep 29, 2022 at 3:53 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> So, what happens when an insufficient number of synchronous replicas
> reply?

It's a failover.

> Sessions hang because the synchronous behavior cannot be
> guaranteed. We then _allow_ query cancel so the user or administrator
> can get out of the hung sessions and perhaps modify
> synchronous_standby_names.

Administrators should not modify synchronous_standby_names. The administrator must shoot this node in the head instead.

> I have always felt this has to be done at the server level, meaning when
> a synchronous_standby_names replica is not responding after a certain
> timeout, the administrator must be notified by calling a shell command
> defined in a GUC and all sessions will ignore the replica.

Standbys are expelled from the wait list according to quorum rules. I'd propose not to invent more quorum rules involving shell scripts. The administrator expressed how many standbys can be offline by setting synchronous_standby_names. They actively asked for hanging queries in case of insufficient standbys. We have reserved administrator connections for the case when all connection slots are used by hanging queries.

Best regards, Andrey Borodin.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Andrey Borodin
Date:
On Tue, Nov 8, 2022 at 9:06 PM Andrey Borodin <amborodin86@gmail.com> wrote:
>
> On Thu, Sep 29, 2022 at 3:53 PM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > So, what happens when an insufficient number of synchronous replicas
> > reply?
>
> It's a failover.
>
> > Sessions hang because the synchronous behavior cannot be
> > guaranteed. We then _allow_ query cancel so the user or administrator
> > can get out of the hung sessions and perhaps modify
> > synchronous_standby_names.
>
> Administrators should not modify synchronous_standby_names.
> The administrator must shoot this node in the head instead.

Some funny stuff. If a user tries to cancel a non-replicated transaction:
Azure Postgres will answer: "user requested cancel while waiting for synchronous replication ack. The COMMIT record has already flushed to WAL locally and might not have been replicatead to the standby. We must wait here."
AWS RDS will answer: "ignoring request to cancel wait for synchronous replication"
Yandex Managed Postgres will answer: "canceling wait for synchronous replication due requested, but cancelation is not allowed. The transaction has already committed locally and might not have been replicated to the standby. We must wait here."

So, for many services providing Postgres as a service it's only a matter of wording.

Best regards, Andrey Borodin.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Mon, Nov 28, 2022 at 12:57 AM Andrey Borodin <amborodin86@gmail.com> wrote:
>
> Some funny stuff. If a user tries to cancel a non-replicated transaction
> Azure Postgres will answer: "user requested cancel while waiting for
> synchronous replication ack. The COMMIT record has already flushed to
> WAL locally and might not have been replicatead to the standby. We
> must wait here."
> AWS RDS will answer: "ignoring request to cancel wait for synchronous
> replication"
> Yandex Managed Postgres will answer: "canceling wait for synchronous
> replication due requested, but cancelation is not allowed. The
> transaction has already committed locally and might not have been
> replicated to the standby. We must wait here."
>
> So, for many services providing Postgres as a service it's only a
> matter of wording.

Thanks for verifying the behaviour. And many thanks for an off-list chat.

FWIW, I'm planning to prepare a patch as per the below idea, which is similar to the initial proposal in this thread. Meanwhile, thoughts are welcome.

1. Disable query cancel/CTRL+C/SIGINT when a backend is waiting for sync replication acknowledgement.
2. Process proc die immediately when a backend is waiting for sync replication acknowledgement, as it does today; however, upon restart, don't open up for business (don't accept read-only connections) unless the sync standbys have caught up.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
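The two-point plan above amounts to a small decision table. An illustrative sketch (a hypothetical function, not the actual SyncRepWaitForLSN() code; the parameter names merely mirror QueryCancelPending/ProcDiePending):

```c
#include <stdbool.h>

typedef enum SyncAckWaitAction
{
    WAIT_CONTINUE,    /* keep waiting for the standby ack */
    WAIT_WARN_IGNORE, /* query cancel: warn the client, keep waiting */
    WAIT_PROC_DIE     /* proc die: exit now; recheck standbys at restart */
} SyncAckWaitAction;

/*
 * How the proposal would treat pending interrupts while a committed
 * transaction waits for the sync replication acknowledgement: cancels
 * are reduced to a warning (point 1), while proc die is still honored
 * immediately, with the catch-up check deferred to restart (point 2).
 */
static SyncAckWaitAction
handle_interrupt_during_sync_wait(bool query_cancel_pending,
                                  bool proc_die_pending)
{
    if (proc_die_pending)
        return WAIT_PROC_DIE;     /* point 2: processed immediately */
    if (query_cancel_pending)
        return WAIT_WARN_IGNORE;  /* point 1: disabled, only a warning */
    return WAIT_CONTINUE;
}
```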
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bruce Momjian
Date:
On Sun, Nov 27, 2022 at 11:26:50AM -0800, Andrey Borodin wrote: > Some funny stuff. If a user tries to cancel a non-replicated transaction > Azure Postgres will answer: "user requested cancel while waiting for > synchronous replication ack. The COMMIT record has already flushed to > WAL locally and might not have been replicatead to the standby. We > must wait here." > AWS RDS will answer: "ignoring request to cancel wait for synchronous > replication" > Yandex Managed Postgres will answer: "canceling wait for synchronous > replication due requested, but cancelation is not allowed. The > transaction has already committed locally and might not have been > replicated to the standby. We must wait here." > > So, for many services providing Postgres as a service it's only a > matter of wording. Wow, you are telling me all three cloud vendors changed how query cancel behaves on an unresponsive synchronous replica? That is certainly a testament that the community needs to change or at least review our behavior. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bruce Momjian
Date:
On Mon, Nov 28, 2022 at 12:03:06PM +0530, Bharath Rupireddy wrote:
> Thanks for verifying the behaviour. And many thanks for an off-list chat.
>
> FWIW, I'm planning to prepare a patch as per the below idea which is
> something similar to the initial proposal in this thread. Meanwhile,
> thoughts are welcome.
>
> 1. Disable query cancel/CTRL+C/SIGINT when a backend is waiting for
> sync replication acknowledgement.
> 2. Process proc die immediately when a backend is waiting for sync
> replication acknowledgement, as it does today, however, upon restart,
> don't open up for business (don't accept read-only connections)
> unless the sync standbys have caught up.

You can prepare a patch, but it is unlikely to get much interest until you get agreement on what the behavior should be. The optimal order of developer actions is:

Desirability -> Design -> Implement -> Test -> Review -> Commit

https://wiki.postgresql.org/wiki/Todo#Development_Process

Telling us what other cloud vendors do is not sufficient.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Andrey Borodin
Date:
On Mon, Nov 28, 2022 at 12:59 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> You can prepare a patch, but it is unlikely to get much interest until you
> get agreement on what the behavior should be.

We discussed the approach at the 2020 Unconference [0], and there kind of was an agreement. Then I made a presentation at FOSDEM with all the details [1]. The patch had been in the commitfest since 2019 [2]. There were reviewers in the CF entry, and we kind of had an agreement. Jeff Davis proposed a similar patch [3]. And we certainly agreed about cancels. And now Bharath is proposing the same.

We have the interest and agreement.

Best regards, Andrey Borodin.

[0] https://wiki.postgresql.org/wiki/PgCon_2020_Developer_Unconference/Edge_cases_of_synchronous_replication_in_HA_solutions
[1] https://archive.fosdem.org/2021/schedule/event/postgresql_caveats_of_replication/attachments/slides/4365/export/events/attachments/postgresql_caveats_of_replication/slides/4365/sides.pdf
[2] https://commitfest.postgresql.org/26/2402/
[3] https://www.postgresql.org/message-id/flat/6a052e81060824a8286148b1165bafedbd7c86cd.camel%40j-davis.com#415dc2f7d41b8a251b419256407bb64d
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bruce Momjian
Date:
On Mon, Nov 28, 2022 at 01:31:39PM -0800, Andrey Borodin wrote: > On Mon, Nov 28, 2022 at 12:59 PM Bruce Momjian <bruce@momjian.us> wrote: > > > > You can prepare a patch, but it unlikely to get much interest until you > > get agreement on what the behavior should be. > > We discussed the approach on 2020's Unconference [0]. And there kind > of was an agreement. > Then I made a presentation on FOSDEM with all the details [1]. > The patch had been on commitfest since 2019 [2]. There were reviewers > in the CF entry, and we kind of had an agreement. > Jeff Davis proposed a similar patch [3]. And we certainly agreed about cancels. > And now Bharath is proposing the same. > > We have the interest and agreement. Okay, I was not aware we had such broad agreement. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
SATYANARAYANA NARLAPURAM
Date:
On Sun, Nov 27, 2022 at 10:33 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Nov 28, 2022 at 12:57 AM Andrey Borodin <amborodin86@gmail.com> wrote:
>
> Some funny stuff. If a user tries to cancel a non-replicated transaction
> Azure Postgres will answer: "user requested cancel while waiting for
> synchronous replication ack. The COMMIT record has already flushed to
> WAL locally and might not have been replicatead to the standby. We
> must wait here."
> AWS RDS will answer: "ignoring request to cancel wait for synchronous
> replication"
> Yandex Managed Postgres will answer: "canceling wait for synchronous
> replication due requested, but cancelation is not allowed. The
> transaction has already committed locally and might not have been
> replicated to the standby. We must wait here."
>
> So, for many services providing Postgres as a service it's only a
> matter of wording.
Thanks for verifying the behaviour. And many thanks for an off-list chat.
FWIW, I'm planning to prepare a patch as per the below idea which is
something similar to the initial proposal in this thread. Meanwhile,
thoughts are welcome.
1. Disable query cancel/CTRL+C/SIGINT when a backend is waiting for
sync replication acknowledgement.
+1
2. Process proc die immediately when a backend is waiting for sync
replication acknowledgement, as it does today, however, upon restart,
don't open up for business (don't accept read-only connections)
unless the sync standbys have caught up.
Are you planning to block connections or queries to the database? It would be good to allow connections and let them query the monitoring views, but block the queries until the sync standbys have caught up. Otherwise, this leaves a monitoring hole. In the cloud, I presume superusers are allowed to connect and monitor (end customers are not members of the role and can't query the data). The same can't be true for all installations. Could you please add more details on your approach?
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bruce Momjian
Date:
On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> > 2. Process proc die immediately when a backend is waiting for sync
> > replication acknowledgement, as it does today, however, upon restart,
> > don't open up for business (don't accept read-only connections)
> > unless the sync standbys have caught up.
>
> Are you planning to block connections or queries to the database? It would be
> good to allow connections and let them query the monitoring views but block the
> queries until the sync standbys have caught up. Otherwise, this leaves a monitoring
> hole. In the cloud, I presume superusers are allowed to connect and monitor (end
> customers are not members of the role and can't query the data). The same can't be
> true for all installations. Could you please add more details on your
> approach?

I think ALTER SYSTEM should be allowed, particularly so you can modify synchronous_standby_names, no?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
SATYANARAYANA NARLAPURAM
Date:
On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> 2. Process proc die immediately when a backend is waiting for sync
> replication acknowledgement, as it does today, however, upon restart,
> don't open up for business (don't accept read-only connections)
> unless the sync standbys have caught up.
>
>
> Are you planning to block connections or queries to the database? It would be
> good to allow connections and let them query the monitoring views but block the
> queries until sync standby have caught up. Otherwise, this leaves a monitoring
> hole. In cloud, I presume superusers are allowed to connect and monitor (end
> customers are not the role members and can't query the data). The same can't be
> true for all the installations. Could you please add more details on your
> approach?
I think ALTER SYSTEM should be allowed, particularly so you can modify
synchronous_standby_names, no?
Yes, a change in synchronous_standby_names is expected in this situation. IMHO, blocking all connections is not a recommended approach.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
SATYANARAYANA NARLAPURAM
Date:
On Tue, Nov 29, 2022 at 8:42 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> 2. Process proc die immediately when a backend is waiting for sync
> replication acknowledgement, as it does today, however, upon restart,
> don't open up for business (don't accept read-only connections)
> unless the sync standbys have caught up.
>
>
> Are you planning to block connections or queries to the database? It would be
> good to allow connections and let them query the monitoring views but block the
> queries until sync standby have caught up. Otherwise, this leaves a monitoring
> hole. In cloud, I presume superusers are allowed to connect and monitor (end
> customers are not the role members and can't query the data). The same can't be
> true for all the installations. Could you please add more details on your
> approach?
I think ALTER SYSTEM should be allowed, particularly so you can modify
synchronous_standby_names, no?

Yes, a change in synchronous_standby_names is expected in this situation. IMHO, blocking all connections is not a recommended approach.
How about allowing superusers (they can still read locally committed data) and users that are part of the pg_monitor role?
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Andrey Borodin
Date:
On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> > 2. Process proc die immediately when a backend is waiting for sync
> > replication acknowledgement, as it does today, however, upon restart,
> > don't open up for business (don't accept read-only connections)
> > unless the sync standbys have caught up.
> >
> > Are you planning to block connections or queries to the database? It would be
> > good to allow connections and let them query the monitoring views but block the
> > queries until the sync standbys have caught up. Otherwise, this leaves a monitoring
> > hole. In the cloud, I presume superusers are allowed to connect and monitor (end
> > customers are not members of the role and can't query the data). The same can't be
> > true for all installations. Could you please add more details on your
> > approach?
>
> I think ALTER SYSTEM should be allowed, particularly so you can modify
> synchronous_standby_names, no?

We don't allow SQL access during crash recovery until it has caught up to the consistency point. And that's for a reason - the cluster may have an invalid system catalog. So no, after a crash without a quorum of standbys you can only change auto.conf and send SIGHUP. Accessing the system catalog during crash recovery is another, unrelated problem.

But I'd propose to treat these two points differently; they pose drastically different scales of danger. Query cancels are issued here and there during failovers/switchovers. A crash amid network partitioning is not that common.

Best regards, Andrey Borodin.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
SATYANARAYANA NARLAPURAM
Date:
On Tue, Nov 29, 2022 at 10:52 AM Andrey Borodin <amborodin86@gmail.com> wrote:
On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> > 2. Process proc die immediately when a backend is waiting for sync
> > replication acknowledgement, as it does today, however, upon restart,
> > don't open up for business (don't accept read-only connections)
> > unless the sync standbys have caught up.
> >
> >
> > Are you planning to block connections or queries to the database? It would be
> > good to allow connections and let them query the monitoring views but block the
> > queries until sync standby have caught up. Otherwise, this leaves a monitoring
> > hole. In cloud, I presume superusers are allowed to connect and monitor (end
> > customers are not the role members and can't query the data). The same can't be
> > true for all the installations. Could you please add more details on your
> > approach?
>
> I think ALTER SYSTEM should be allowed, particularly so you can modify
> synchronous_standby_names, no?
We don't allow SQL access during crash recovery until it's caught up
to consistency point. And that's for a reason - the cluster may have
invalid system catalog.
So no, after crash without a quorum of standbys you can only change
auto.conf and send SIGHUP. Accessing the system catalog during crash
recovery is another unrelated problem.
In the crash recovery case, the catalog is inconsistent, but in this case the cluster has remote-uncommitted changes and is consistent. Accepting a superuser connection does no harm. The auth checks performed are still valid after the standbys have fully caught up. I don't see a reason why superuser / pg_monitor connections need to be blocked.
But I'd propose to treat these two points differently, they possess
drastically different scales of danger. Query Cancels are issued here
and there during failovers\switchovers. Crash amidst network
partitioning is not that common.
Supportability and operability matter even more in corner cases, in order to quickly troubleshoot an issue.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
SATYANARAYANA NARLAPURAM
Date:
On Tue, Nov 29, 2022 at 11:20 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
On Tue, Nov 29, 2022 at 10:52 AM Andrey Borodin <amborodin86@gmail.com> wrote:
On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> > 2. Process proc die immediately when a backend is waiting for sync
> > replication acknowledgement, as it does today, however, upon restart,
> > don't open up for business (don't accept read-only connections)
> > unless the sync standbys have caught up.
> >
> >
> > Are you planning to block connections or queries to the database? It would be
> > good to allow connections and let them query the monitoring views but block the
> > queries until sync standby have caught up. Otherwise, this leaves a monitoring
> > hole. In cloud, I presume superusers are allowed to connect and monitor (end
> > customers are not the role members and can't query the data). The same can't be
> > true for all the installations. Could you please add more details on your
> > approach?
>
> I think ALTER SYSTEM should be allowed, particularly so you can modify
> synchronous_standby_names, no?
We don't allow SQL access during crash recovery until it's caught up
to consistency point. And that's for a reason - the cluster may have
invalid system catalog.
So no, after crash without a quorum of standbys you can only change
auto.conf and send SIGHUP. Accessing the system catalog during crash
recovery is another unrelated problem.

In the crash recovery case, the catalog is inconsistent, but in this case the cluster has remote-uncommitted changes and is consistent. Accepting a superuser connection does no harm. The auth checks performed are still valid after the standbys have fully caught up. I don't see a reason why superuser / pg_monitor connections need to be blocked.
If blocking queries is harder, and superusers are not allowed to connect since they can read remote-uncommitted data, how about adding a new role that can update and reload the server configuration?
But I'd propose to treat these two points differently, they possess
drastically different scales of danger. Query Cancels are issued here
and there during failovers\switchovers. Crash amidst network
partitioning is not that common.

Supportability and operability matter even more in corner cases, in order to quickly troubleshoot an issue.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Bharath Rupireddy
Date:
On Tue, Nov 29, 2022 at 10:45 PM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>
> On Tue, Nov 29, 2022 at 8:42 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>>
>> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <bruce@momjian.us> wrote:
>>>
>>> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
>>> > 2. Process proc die immediately when a backend is waiting for sync
>>> > replication acknowledgement, as it does today, however, upon restart,
>>> > don't open up for business (don't accept read-only connections)
>>> > unless the sync standbys have caught up.
>>> >
>>> > Are you planning to block connections or queries to the database? It would be
>>> > good to allow connections and let them query the monitoring views but block the
>>> > queries until the sync standbys have caught up. Otherwise, this leaves a monitoring
>>> > hole. In the cloud, I presume superusers are allowed to connect and monitor (end
>>> > customers are not members of the role and can't query the data). The same can't be
>>> > true for all installations. Could you please add more details on your
>>> > approach?
>>>
>>> I think ALTER SYSTEM should be allowed, particularly so you can modify
>>> synchronous_standby_names, no?
>>
>> Yes, a change in synchronous_standby_names is expected in this situation. IMHO, blocking all connections is not a recommended approach.
>
> How about allowing superusers (they can still read locally committed data) and users that are part of the pg_monitor role?

I started to spend time on this feature again. Thanks all for your comments so far.

Per the latest comments, it looks like we're mostly okay with emitting a warning and ignoring query cancel interrupts while waiting for the sync replication ACK. For proc die, it looks like the suggestion was to process it immediately and, upon the next restart, not allow user connections unless all sync standbys have caught up.
However, we need to be able to allow replication connections from standbys so that they can stream the needed WAL and catch up with the primary; allow superusers or users with the pg_monitor role to connect, so they can run ALTER SYSTEM to remove any unresponsive sync standbys from the list or disable sync replication altogether, or monitor the flush LSN/catch-up status; and block all other connections. Note that replication, superuser and pg_monitor connections are allowed only after the server reaches a consistent state, not before, so that no inconsistent data is read.

The trickiest part of doing the above is how we detect upon restart that the server received a proc die while waiting for the sync replication ACK.

One idea might be to set a flag in the control file before the crash. A second idea might be to write a marker file (although I don't favor this idea); its presence would indicate that the server was waiting for the sync replication ACK before the crash. However, we may not be able to detect all sorts of backend crashes during the wait with either of these two ideas, so this may not be a complete solution.

A third idea might be to just let the primary wait for the sync standbys to catch up upon restart, irrespective of whether it crashed while waiting for the sync replication ACK or not. While this idea works well without having to detect all sorts of crashes, the primary may not come up if any unresponsive standbys are present (currently, the primary continues to be operational for read-only queries at least, irrespective of whether the sync standbys have caught up or not).

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
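The control-file-flag idea (the first option above) might look roughly like the sketch below. The struct, field names, and plain-integer LSN representation are illustrative assumptions, not the actual pg_control layout:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: persist a flag (imagine a hypothetical extension of the control
 * file) before a backend starts waiting for the sync ack, and consult it
 * at startup to decide whether normal read-only connections must be held
 * back until the standbys confirm the commit LSN.
 */
typedef struct SyncWaitControlData
{
    bool     waiting_for_sync_ack; /* set before entering the ack wait */
    uint64_t commit_lsn;           /* LSN the sync standbys must confirm */
} SyncWaitControlData;

static bool
must_wait_for_standbys_at_startup(const SyncWaitControlData *ctl,
                                  uint64_t standby_flush_lsn)
{
    /* hold back normal connections only if a sync-ack wait was in
     * progress and the standbys have not yet confirmed that LSN */
    return ctl->waiting_for_sync_ack && standby_flush_lsn < ctl->commit_lsn;
}
```

As noted above, a crash that skips the flag update would evade this check, which is why the flag-based ideas may not be a complete solution.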
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
From
Daniel Gustafsson
Date:
> On 30 Jan 2023, at 06:55, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> I started to spend time on this feature again. Thanks all for your
> comments so far.

Since there haven't been any updates for the past six months, and the patch hasn't applied cleanly for a few months, I am marking this as returned with feedback for now. Please feel free to open a new entry in a future CF for this patch when there is a new version.

--
Daniel Gustafsson