Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Msg-id 763B5AF0-1C9E-4796-9639-F969A2E66189@yandex-team.ru
In response to Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
List pgsql-hackers

> On 9 May 2022, at 14:20, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Apr 26, 2022 at 11:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>>
>> While this may mitigate the problem, I don't think it will deal with
>> all the cases which could cause a transaction to end up committed locally,
>> but not on the synchronous standby.  I think that only using the full
>> power of two-phase commit can make this bulletproof.
>
> Not sure if it's recommended to use 2PC in postgres HA with sync
> replication where the documentation says that "PREPARE TRANSACTION"
> and other 2PC commands are "intended for use by external transaction
> management systems" and with explicit transactions. Whereas, the txns
> within a postgres HA with sync replication always don't have to be
> explicit txns. Am I missing something here?

COMMIT PREPARED needs to be replicated as well, so it runs into the very same problem as an ordinary COMMIT: if done
during failover it can be canceled, and committed data can be wrongly reported as durably written. 2PC is not a remedy
for the fact that PG silently cancels the wait for sync replication. The problem arises in the presence of any "commit".
And a "commit" is there whenever transactions are there.

> On 9 May 2022, at 14:44, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> IMHO, making it wait for some amount of time, based on GUC is not a
> complete solution.  It is just a hack to avoid the problem in some
> cases.

Disallowing cancellation of locally committed transactions is not a hack. It is the removal of a hack that was
erroneously installed to make the backend responsive to Ctrl+C (or a client-side statement timeout).

> On 26 Apr 2022, at 11:26, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> Is it worth adding additional complexity that is not a complete solution?

It's not additional complexity. It is removing the additional complexity that made sync rep interruptible. (But I'm
certainly not talking about GUCs like synchronous_replication_naptime_before_cancel: the sync rep wait must be
indefinite, until synchronous_commit/synchronous_standby_names are satisfied.)
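
To illustrate the difference, here is a toy sketch (not PostgreSQL code; the function name, the result strings and
the LSN numbers are all made up for illustration). It models a backend whose commit record is already flushed locally
and which waits for the standby to confirm it. With an interruptible wait, a cancel request ends the wait while the
transaction is committed locally but not replicated; with a non-interruptible wait, the cancel is simply ignored.

```python
# Hypothetical result labels, invented for this sketch.
ACKNOWLEDGED = "acknowledged by standby"
CANCELED = "canceled: committed locally, not replicated"
STILL_WAITING = "still waiting"


def wait_for_sync_rep(commit_lsn, standby_acks, cancel_at_step=None, interruptible=True):
    """Toy model of the sync rep wait after a local commit.

    commit_lsn     -- LSN of the commit record already flushed locally
    standby_acks   -- LSNs confirmed by the standby, one per wait step
    cancel_at_step -- step at which a cancel (Ctrl+C, timeout) arrives, or None
    interruptible  -- whether a cancel request is allowed to end the wait
    """
    confirmed = 0
    for step, ack_lsn in enumerate(standby_acks):
        # An interruptible wait gives up as soon as a cancel is pending.
        if interruptible and cancel_at_step is not None and step >= cancel_at_step:
            return CANCELED
        confirmed = max(confirmed, ack_lsn)
        if confirmed >= commit_lsn:
            return ACKNOWLEDGED
    # The standby has not caught up yet; a real backend would keep waiting.
    return STILL_WAITING


# The same cancel request, with and without an interruptible wait.
print(wait_for_sync_rep(100, [40, 80, 120], cancel_at_step=1, interruptible=True))
print(wait_for_sync_rep(100, [40, 80, 120], cancel_at_step=1, interruptible=False))
```

In the first call the client is told the wait ended although the standby never confirmed LSN 100; in the second call
the wait continues until the standby has caught up.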

And yes, we do need additional complexity, but in another place. A transaction can also end up locally committed in
the presence of a server crash. But this is another, difficult problem. A crashed server must not allow data queries
until the LSN of the timeline end has been successfully replicated to synchronous_standby_names.
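
As a rule of thumb, the check after crash recovery could look like this (again a toy sketch with made-up names, not
an existing PostgreSQL function):

```python
def may_serve_queries(timeline_end_lsn, standby_confirmed_lsn):
    """Toy rule: after a crash, allow data queries only once the synchronous
    standby has confirmed everything up to the end of the local timeline."""
    return standby_confirmed_lsn >= timeline_end_lsn


# The standby is one record behind the end of the timeline: keep refusing queries.
print(may_serve_queries(timeline_end_lsn=500, standby_confirmed_lsn=499))
```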

Best regards, Andrey Borodin.

