Re: Race condition in TransactionIdIsInProgress - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Race condition in TransactionIdIsInProgress
Date
Msg-id CANbhV-GBV2GNb7qzpbuKj-na0sJ+9gE0Qgkc0Y5zfN2degteFQ@mail.gmail.com
Whole thread Raw
In response to Re: Race condition in TransactionIdIsInProgress  (Andres Freund <andres@anarazel.de>)
Responses Re: Race condition in TransactionIdIsInProgress
List pgsql-hackers
On Fri, 11 Feb 2022 at 19:08, Andres Freund <andres@anarazel.de> wrote:

> On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:
> > On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > >
> > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > > Looks lik syncrep will make this a lot worse, because it can drastically
> > > > increase the window between the TransactionIdCommitTree() and
> > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> > > > least it might make it easier to write tests exercising this scenario...
> > >
> > > Agreed
> > >
> > > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > > item cache is set too early in some cases. The single item cache is
> > > important for performance, so we just need to be more careful about
> > > setting the cache.
> >
> > Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> > being set, but this fails! Not worked out why, yet.
>
> I don't think it's safe to check !TransactionIdKnownNotInProgress() in
> TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
> do TransactionLogFetch() internally.

That's not correct because you're confusing
TransactionIdKnownNotInProgress(), which is a new function introduced
by the patch, with the existing function
TransactionIdIsKnownCompleted().


> > just_remove_TransactionIdIsKnownCompleted_call.v1.patch
>
> I think this might contain a different diff than what the name suggests?

Not at all, please don't be confused by the name. The patch removes
the call to TransactionIdIsKnownCompleted() from
TransactionIdIsInProgress().

I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
in backbranches.


> > just removes the known offending call, passes make check, but IMHO
> > leaves the same error just as likely by other callers.
>
> Which other callers are you referring to?

I don't know of any, but we can't just remove a function in a
backbranch, can we?


> To me it seems that the "real" reason behind this specific issue being
> nontrivial to fix and behind the frequent error of calling
> TransactionIdDidCommit() before TransactionIdIsInProgress() is
> that we've really made a mess out of the transaction status determination code
> / API. IMO the original sin is requiring the complicated
>
> if (TransactionIdIsCurrentTransactionId(xid)
> ...
> else if (TransactionIdIsInProgress(xid)
> ...
> else if (TransactionIdDidCommit(xid)
> ...
>
> dance at pretty much any place checking transaction status.

Agreed, it's pretty weird to have to call the functions in the right
order or you get the wrong answer. Especially since we have no
cross-check to ensure the correct sequence was followed.


> The multiple calls
> for the same xid is, I think, what pushed the caching down to the
> TransactionLogFetch level.
>
> ISTM that we should introduce something like TransactionIdStatus(xid) that
> returns
> XACT_STATUS_CURRENT
> XACT_STATUS_IN_PROGRESS
> XACT_STATUS_COMMITTED
> XACT_STATUS_ABORTED
> accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks
> XidInMVCCSnapshot() instead of TransactionIdIsInProgress().
>
> I think that'd also make visibility checks faster - we spend a good chunk of
> cycles doing unnecessary function calls and repeatedly gathering information.
>
>
> It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more
> optimized - TransactionIdIsInProgress() knows it doesn't need to check
> anything before RecentXmin, but TransactionIdIsCurrentTransactionId()
> doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is
> smaller than GetTopTransactionIdIfAny() before bsearching through subxids.
>
> Of course anything along these lines would never be backpatchable...

Agreed

I've presented a simple patch intended for backpatching. You may not
like the name, but please have another look at
"just_remove_TransactionIdIsKnownCompleted_call.v1.patch".
I believe it is backpatchable with minimal impact and without loss of
performance.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: pgsql: Add suport for server-side LZ4 base backup compression.
Next
From: Tom Lane
Date:
Subject: Re: pgsql: Add TAP test to automate the equivalent of check_guc