Re: repeated decoding of prepared transactions - Mailing list pgsql-hackers

From Markus Wanner
Subject Re: repeated decoding of prepared transactions
Date
Msg-id 415799ff-89bb-a78e-2f79-7f29834d0460@enterprisedb.com
In response to Re: repeated decoding of prepared transactions  (Ajin Cherian <itsajin@gmail.com>)
List pgsql-hackers

Ajin, Amit,

thank you both a lot for thinking this through and even providing a patch.

The changes in expectation for twophase.out match exactly what I 
prepared.  And the switch with pg_logical_slot_get_changes is indeed 
something I had not yet considered.

On 19.02.21 03:50, Ajin Cherian wrote:
> For this, I am planning to change the semantics such that
> two-phase-commit can only be specified while creating the slot using
> pg_create_logical_replication_slot()
> and not in pg_logical_slot_get_changes, thus preventing
> two-phase-commit flag from being toggled between restarts of the
> decoder. Let me know if anybody objects to this
> change, else I will update that in the next patch.

This sounds like a good plan to me, yes.


However, more generally speaking, I suspect you are overthinking this. 
All of the complexity arises because of the assumption that an output 
plugin receiving and confirming a PREPARE may not be able to persist 
that first phase of transaction application.  Instead, you are trying to 
somehow resurrect the transactional changes and the prepare at COMMIT 
PREPARED time and decode it in a deferred way.

Instead, I'm arguing that a PREPARE is an atomic operation just like a 
transaction's COMMIT.  The decoder should always feed these in the order 
of appearance in the WAL.  For example, if you have PREPARE A, COMMIT B, 
COMMIT PREPARED A in the WAL, the decoder should always output these 
events in exactly that order.  And not ever COMMIT B, PREPARE A, COMMIT 
PREPARED A (which is currently violated in the expectation for 
twophase_snapshot, because the COMMIT for `s1insert` there appears after 
the PREPARE of `s2p` in the WAL, but gets decoded before it).
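
To make the ordering argument concrete, here is a minimal sketch in Python. 
It is for illustration only: the (lsn, kind, xact) tuples, the LSN values 
and both function names are made up and are not part of PostgreSQL or of 
the attached patch.  It contrasts emitting each record at its own WAL 
position with deferring the PREPARE until COMMIT PREPARED time.

```python
# Hypothetical WAL records as (lsn, kind, xact) tuples; not a PostgreSQL API.
WAL = [
    (100, "PREPARE", "A"),
    (200, "COMMIT", "B"),
    (300, "COMMIT PREPARED", "A"),
]


def decode_in_wal_order(wal):
    """Emit every record at its own WAL position."""
    return [f"{kind} {xact}" for _lsn, kind, xact in sorted(wal)]


def decode_deferred_prepare(wal):
    """Hold each PREPARE back and emit it just before its COMMIT PREPARED."""
    held = {}
    out = []
    for _lsn, kind, xact in sorted(wal):
        if kind == "PREPARE":
            # Remember the PREPARE instead of sending it downstream now.
            held[xact] = f"{kind} {xact}"
            continue
        if kind == "COMMIT PREPARED" and xact in held:
            out.append(held.pop(xact))
        out.append(f"{kind} {xact}")
    return out


if __name__ == "__main__":
    print(decode_in_wal_order(WAL))
    print(decode_deferred_prepare(WAL))
```

The first function yields PREPARE A, COMMIT B, COMMIT PREPARED A; the 
second yields COMMIT B, PREPARE A, COMMIT PREPARED A, which is the 
reordering argued against above.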

The patch I'm attaching corrects this expectation in twophase_snapshot, 
adds an explanatory diagram, and eliminates any danger of sending 
PREPAREs at COMMIT PREPARED time, thereby preserving the ordering of 
PREPAREs vs COMMITs.

Given the output plugin supports two-phase commit, I argue there must be 
a good reason for it setting the start_decoding_at LSN to a point in 
time after a PREPARE.  To me that means the output plugin (or its 
downstream replica) has processed the PREPARE (and the downstream 
replica did whatever it needed to do on its side in order to make the 
transaction ready to be committed in a second phase).

(In the weird case of an output plugin that wants to enable two-phase 
commit but does not really support it downstream, it's still possible 
for it to hold back LSN confirmations for prepared-but-still-in-flight 
transactions.  However, I'm having a hard time justifying this use case.)

With that line of thinking, the point in time (or in WAL) of the COMMIT 
PREPARED does not matter at all to reason about the decoding of the 
PREPARE operation.  Instead, there are only exactly two cases to consider:

a) the PREPARE happened before the start_decoding_at LSN and must not be 
decoded. (But the effects of the PREPARE must then be included in the 
initial synchronization. If that's not supported, the output plugin 
should not enable two-phase commit.)

b) the PREPARE happens after the start_decoding_at LSN and must be 
decoded.  (It obviously is not included in the initial synchronization 
or decoded by a previous instance of the decoder process.)
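
The two cases boil down to a single comparison against the 
start_decoding_at LSN.  A minimal sketch in Python, for illustration only: 
the function name and the integer LSNs are hypothetical, and treating a 
PREPARE located exactly at start_decoding_at as case b) is an assumption 
of this sketch, not something stated above.

```python
def should_decode_prepare(prepare_lsn: int, start_decoding_at: int) -> bool:
    """Return True for case b) (decode the PREPARE), False for case a) (skip it).

    The position of the matching COMMIT PREPARED is deliberately not an
    input: only the PREPARE's own position in the WAL matters.
    Assumption: an LSN equal to start_decoding_at counts as "after".
    """
    return prepare_lsn >= start_decoding_at


if __name__ == "__main__":
    # Case a): PREPARE lies before the restart point, so it is not repeated.
    print(should_decode_prepare(prepare_lsn=100, start_decoding_at=150))
    # Case b): PREPARE lies after the restart point, so it is decoded.
    print(should_decode_prepare(prepare_lsn=200, start_decoding_at=150))
```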

The case where the PREPARE lies before SNAPBUILD_CONSISTENT must always 
be case a) where we must not repeat the PREPARE, anyway.  And in case b) 
where we need a consistent snapshot to decode the PREPARE, existing 
provisions already guarantee that to be possible (or how would this be 
different from a regular single-phase commit?).

Please let me know what you think and whether this approach is feasible 
for you as well.

Regards

Markus

Attachment
