Re: Race condition in TransactionIdIsInProgress - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Race condition in TransactionIdIsInProgress
Date
Msg-id 20220624014307.xr7yolx4vd5cjcbi@alap3.anarazel.de
Whole thread Raw
In response to Re: Race condition in TransactionIdIsInProgress  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Race condition in TransactionIdIsInProgress
List pgsql-hackers
Hi,

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:
> On 12/02/2022 05:42, Andres Freund wrote:
> > The only reason I'm so far not succeeding in turning it into an
> > isolationtester spec is that a transaction waiting for SyncRep doesn't count
> > as waiting for isolationtester.
> >
> > Basically
> >
> > S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep>
> > S2: SELECT pg_xact_status($xid);
> > S2: UPDATE;
> >
> > suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
> > cache, which then causes the TransactionIdIsInProgress() to take the early
> > return path, despite the transaction still being in progress. Which then
> > allows the update to proceed, despite the S1 not having "properly committed"
> > yet.
>
> I started to improve isolationtester, to allow the spec file to specify a
> custom query to check for whether the backend is blocked. But then I
> realized that it's not quite enough: there is currently no way write the
> "$xid = txid_current()" step either. Isolationtester doesn't have variables
> like that. Also, you need to set synchronous_standby_names to make it block,
> which seems a bit weird in an isolation test.

I don't think we strictly need $xid - it likely can be implemented via
something like
  SELECT pg_xact_status(backend_xid) FROM pg_stat_activity WHERE application_name = 'testname/s1';


> I wish we had an explicit way to inject delays or failures. We discussed it
> before
(https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com),
> but I don't want to pick up that task right now.

Understandable :(


> Anyway, I wrote a TAP test for this instead. See attached. I'm not sure if
> this is worth committing, the pg_xact_status() trick is quite special for
> this particular bug.

Yea, not sure either.


> Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks good
> to me. Replying to the discussion on that:
>
> On 12/02/2022 23:50, Andres Freund wrote:
> > On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:
> > > I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
> > > in backbranches.
> >
> > I think it'd be fine if we needed to. Or we could just make it always return
> > false or such.
> >
> >
> > > > > just removes the known offending call, passes make check, but IMHO
> > > > > leaves the same error just as likely by other callers.
> > > >
> > > > Which other callers are you referring to?
> > >
> > > I don't know of any, but we can't just remove a function in a
> > > backbranch, can we?
> >
> > We have in the past, if it's a "sufficiently internal" function.
>
> I think we should remove it in v15, and leave it as it is in backbranches.
> Just add a comment to point out that the name is a bit misleading, because
> it checks the clog rather than the proc array.  It's not inherently
> dangerous, and someone might have a legit use case for it. True, someone
> might also be doing this incorrect thing with it, but still seems like the
> right balance to me.
>
> I think we also need to change pg_xact_status(), to also call
> TransactionIdIsInProgress() before TransactionIdDidCommit/Abort().
> Currently, if it returns "committed", and you start a new transaction, the
> new transaction might not yet see the effects of that "committed"
> transaction. With that, you cannot reproduce the original bug with
> pg_xact_status() though, so there's no point in committing that test then.
>
> In summary, I think we should:
> - commit and backpatch Simon's
> just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> - fix pg_xact_status() to check TransactionIdIsInProgress()
> - in v15, remove TransationIdIsKnownCompleted function altogether
>
> I'll try to get around to that in the next few days, unless someone beats me
> to it.

Makes sense.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: SLRUs in the main buffer pool - Page Header definitions
Next
From: Andres Freund
Date:
Subject: Re: Make COPY extendable in order to support Parquet and other formats