Re: Fix slot synchronization with two_phase decoding enabled - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Fix slot synchronization with two_phase decoding enabled
Msg-id CAA4eK1+Row5XWDbOCTgd4_s=eaqXAL7iXDFQkAinuJFqOTt46A@mail.gmail.com
In response to Fix slot synchronization with two_phase decoding enabled  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: Fix slot synchronization with two_phase decoding enabled
List pgsql-hackers
On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> When testing slot synchronization with logical replication slots that have
> two_phase decoding enabled, I found that transactions prepared before two-phase
> decoding was enabled may fail to replicate to the subscriber after being
> committed on a promoted standby following a failover.
>
> To reproduce this issue, please follow these steps (also detailed in the
> attached TAP test, v1-0001):
>
> 1. sub: create a subscription with (two_phase = false)
> 2. primary (pub): prepare a txn A.
> 3. sub: alter subscription set (two_phase = true) and wait for the logical slot to
>    be synced to standby.
> 4. primary (pub): stop primary, promote the standby and let the subscriber use
>    the promoted standby as publisher.
> 5. promoted standby (pub): COMMIT PREPARED A;
> 6. sub: the apply worker will report the following ERROR because it didn't
>    receive the PREPARE.
>    ERROR:  prepared transaction with identifier "pg_gid_16387_752" does not exist
>
> I think the root cause of this issue is that the two_phase_at field of the
> slot, which indicates the LSN from which two-phase decoding is enabled (used to
> prevent duplicate data transmission for prepared transactions), is not
> synchronized to the standby server.
>
> In step 3, transaction A is not immediately replicated because it was prepared
> before two-phase decoding was enabled. Thus, the prepared transaction should
> only be replicated when the final COMMIT PREPARED is decoded, as handled in
> ReorderBufferFinishPrepared(). However, because two_phase_at is invalid on the
> standby, the prepared transaction is not sent at that time.
>
> This problem has existed since support for altering the two_phase option was
> added (1462aad).
>
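
The role of two_phase_at described above can be pictured with a small model.
This is only a sketch in Python, not the PostgreSQL C code: the Slot class, the
function name, the message labels and the LSN values 100 and 200 are all
hypothetical, and the comparison is an assumption based on the description of
ReorderBufferFinishPrepared() in the report.

```python
from dataclasses import dataclass

INVALID_LSN = 0  # hypothetical stand-in for an unset (invalid) LSN


@dataclass
class Slot:
    two_phase: bool    # two-phase decoding enabled on the slot
    two_phase_at: int  # LSN from which two-phase decoding is enabled


def messages_at_commit_prepared(slot: Slot, prepare_lsn: int) -> list[str]:
    """Model what the publisher sends when it decodes COMMIT PREPARED.

    Assumption: a transaction prepared before two_phase_at was never
    streamed at PREPARE time, so it has to be replayed in full now.
    """
    if prepare_lsn < slot.two_phase_at:
        return ["PREPARE", "COMMIT PREPARED"]
    # Otherwise the PREPARE is assumed to have been sent already.
    return ["COMMIT PREPARED"]


# Transaction A is prepared at LSN 100; two_phase is enabled at LSN 200.
primary_slot = Slot(two_phase=True, two_phase_at=200)
# The synced slot on the standby lacks two_phase_at (the reported bug).
standby_slot = Slot(two_phase=True, two_phase_at=INVALID_LSN)

on_primary = messages_at_commit_prepared(primary_slot, prepare_lsn=100)
on_standby = messages_at_commit_prepared(standby_slot, prepare_lsn=100)
```

In this model the primary's slot replays the prepared transaction before the
commit, while the synced slot sends only COMMIT PREPARED, which matches the
"prepared transaction ... does not exist" error seen on the subscriber.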

Thanks for the report and patch. I'll look into it.

--
With Regards,
Amit Kapila.
