Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] logical decoding of two-phase transactions
Date
Msg-id CAA4eK1L3p4z+9wtK77MbdpkagR4GS2Y3r1Je7ZEvLQVF9GArfg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] logical decoding of two-phase transactions  (Ajin Cherian <itsajin@gmail.com>)
Responses Re: [HACKERS] logical decoding of two-phase transactions
List pgsql-hackers
On Fri, Dec 18, 2020 at 11:23 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Thu, Dec 17, 2020 at 11:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > I went ahead and used both origin_lsn and origin_timestamp to avoid
> > the possibility of a match of prepared xact from two different nodes.
> > We can handle this at begin_prepare and prepare time but we don't have
> > prepare_lsn and prepare_timestamp at rollback_prepared time, so what
> > do about that? As of now, I am using just GID at rollback_prepare time
> > and that would have been sufficient if we always receive prepare
> > before rollback because at prepare time we would have checked
> > origin_lsn and origin_timestamp. But it is possible that we get
> > rollback prepared without prepare in case if prepare happened before
> > consistent_snapshot is reached and rollback happens after that. For
> > commit-case, we do send prepare and all the data at commit time in
> > such a case but doing so for rollback case doesn't sound to be a good
> > idea. Another possibility is that we send prepare_lsn and prepare_time
> > in rollback_prepared API to deal with this. I am not sure if it is a
> > good idea to just rely on GID in rollback_prepare. What do you think?
>
> Thinking about it for some time, my initial reaction was that the
> distributed servers should maintain uniqueness of GIDs and re-checking
> with LSNs is just overkill. But thinking some more, I realise that
> since we allow reuse of GIDs, there could be a race condition where a
> previously aborted/committed txn's GID was reused
> which could lead to this. Yes, I think we could change
> rollback_prepare to send out prepare_lsn and prepare_time as well,
> just to be safe.
>

Okay, I have changed the rollback_prepare API as discussed above and
accordingly handle the case where rollback is received without prepare
in apply_handle_rollback_prepared.

While testing for this case, I noticed that the tracking of
replication progress for aborts is not complete due to which after
restart we can again ask for the rollback lsn. This shouldn't be a
problem with the latest code because we will simply skip it when there
is no corresponding prepare but this is far from ideal because that is
the sole purpose of tracking via replication origins. This was due to
the incomplete handling of aborts in the original commit 1eb6d6527a. I
have fixed this now in a separate patch
v33-0004-Track-replication-origin-progress-for-rollbacks. If you want
to see the problem then change the below code and don't apply
v33-0004-Track-replication-origin-progress-for-rollbacks, the
regression failure is due to the reason that we are not tracking
progress for aborts:

apply_handle_rollback_prepared
{
..
if (LookupGXact(rollback_data.gid, rollback_data.prepare_end_lsn,
rollback_data.preparetime))
..
}

to
apply_handle_rollback_prepared
{
..
Assert (LookupGXact(rollback_data.gid, rollback_data.prepare_end_lsn,
rollback_data.preparetime));

-- 
With Regards,
Amit Kapila.

Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Proposed patch for key managment
Next
From: Masahiko Sawada
Date:
Subject: Commit fest manager for 2021-01