Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] logical decoding of two-phase transactions
Date
Msg-id CAA4eK1L4fZGY0YQOuHRkZU4ThBmLBpObwdndJ0UuuWLRswhi3g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] logical decoding of two-phase transactions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] logical decoding of two-phase transactions
Re: [HACKERS] logical decoding of two-phase transactions
List pgsql-hackers
On Mon, Nov 30, 2020 at 7:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 30, 2020 at 2:36 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Sure, but you can see in your example above it got skipped due to
> start_decoding_at not due to DecodingContextReady. So, the problem as
> mentioned by me previously was how we distinguish those cases because
> it can skip due to start_decoding_at during restart as well when we
> would have already sent the prepare to the subscriber.
>
> One idea could be that the subscriber skips the transaction if it sees
> the transaction is already prepared.
>

To skip it, we need to send GID in begin message and then on
subscriber-side, check if the prepared xact already exists, if so then
set a flag. The flag needs to be set in begin/start_stream and reset
in stop_stream/commit/abort. Using the flag, we can skip the entire
contents of the prepared xact. In ReorderFuffer-side also, we need to
get and set GID in txn even when we skip it because we need to send
the same at commit time. In this solution, we won't be able to send it
during normal start_stream because by that time we won't know GID and
I think that won't be required. Note that this is only required when
we skipped sending prepare, otherwise, we just need to send
Commit-Prepared at commit time.

Another way to solve this problem via publisher-side is to maintain in
some file at slot level whether we have sent prepare for a particular
txn? Basically, after sending prepare, we need to update the slot
information on disk to indicate that the particular GID is sent (we
can probably store GID and LSN of Prepare). Then next time whenever we
have to skip prepare due to whatever reason, we can check the
existence of persistent information on disk for that GID, if it exists
then we need to send just Commit Prepared, otherwise, the entire
transaction. We can remove this information during or after
CheckPointSnapBuild, basically, we can remove the information of all
GID's that are after cutoff LSN computed via
ReplicationSlotsComputeLogicalRestartLSN. Now, we can even think of
removing this information after Commit Prepared but not sure if that
is correct because we can't lose this information unless
start_decoding_at (or restart_lsn) is moved past the commit lsn

Now, to persist this information, there could be multiple
possibilities (a) maintain the flexible array for GID's at the end of
ReplicationSlotPersistentData, (b) have a separate state file per-slot
for prepared xacts, (c) have a separate state file for each prepared
xact per-slot.

With (a) during upgrade from the previous version there could be a
problem because the previous data won't match new data but I am not
sure if we maintain slots info intact after upgrade. I think (c) would
be simplest but OTOH, having many such files (in case there are more
prepared xacts) per-slot might not be a good idea.

One more thing that needs to be thought about is when we are sending
the entire xact at commit time whether we will send prepare
separately? Because, if we don't send it separately, then later
allowing the PREPARE on the master to wait for prepare via subscribers
won't be possible?

Thoughts?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: autovac issue with large number of tables
Next
From: Fujii Masao
Date:
Subject: Re: autovac issue with large number of tables