Re: AtEOXact_Snapshot timing - Mailing list pgsql-hackers

From Robert Haas
Subject Re: AtEOXact_Snapshot timing
Date
Msg-id CA+TgmoacPcQM_NdMS9C2q_zQ9JG9TDnS3xjioqEKgVHLePqk=w@mail.gmail.com
Whole thread Raw
In response to Re: AtEOXact_Snapshot timing  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Nov 11, 2019 at 2:12 PM Andres Freund <andres@anarazel.de> wrote:
> Hm. I feel like there's plenty reasons to get a snapshot in extensions -
> there's plenty APIs one cannot really call without doing so?

Sure, I don't disagree with that.

> What I'm
> worried about is not primarily that GetSnapshotData() is being called
> directly, but that $author got a snapshot previously, and then tries to
> use it in an xact callback or such.

Yeah, I guess that's possible. Registering a new snapshot would trip
one of the assertions I added, but using an old one wouldn't.

> I'd add asserts to at least PushActiveSnapshot(), and I don't see the
> harm in adding one to GetSnapshotData().

OK. I think there's some risk that someone will have a purpose for
calling GetSansphotData() which is actually tolerably safe but now
won't work, so I thin there's a chance we'll get a complaint. But if
we do, we can always reconsider whether to take that particular
assertion back out again.

> Based on what I've seen people do in xact callback handlers... Didn't PG
> itself e.g. have code doing syscache lookups in aborted transactions a
> couple times?

Sure -- but the assertions I had already added would catch that
anyway, assuming that it actually attempted catalog access, and we now
also have assertions that will catch it even if it would have been
satisfied from the cache.

> The danger of using a previously acquired snapshot in that stage is why
> I was pondering adding an assert to visibility functions
> themselves. E.g. just adding one to HeapTupleSatisfiesVisibility() might
> already add a good bit of coverage.

Yeah, but I really hate to do that; those functions are super-hot. And
I don't think we need to go overboard in protecting people from
themselves. The assertions I'm proposing to add should already catch
quite a bit of stuff that is unchecked today, and with little or no
possible downside. There's no rule that we can't add more later, nor
are more assertions always better than fewer.

> > I agree with you that it would be nicer if we could put the killing of
> > the old snapshots directly adjacent to ProcArrayEndTransaction(), and
> > that was the first thing I tried, but it doesn't work, because
> > resource owner cleanup has to run first.
>
> Hm. I'd even say that it actually belongs to *before* the
> ProcArrayEndTransaction() call.
>
> For a bit I wondered if the resowner snapshot cleanup ought to be at
> least moved to RESOURCE_RELEASE_BEFORE_LOCKS. Not that it actually
> addresses this issue, but it seems to belong there "thematically". But
> then I honestly don't understand why most of the resowner managed
> resources in the abort sequence are released where they are.  The only
> really explanatory comment is:
>
>          * The ordering of operations is not entirely random.  The idea is:
>          * release resources visible to other backends (eg, files, buffer pins);
>          * then release locks; then release backend-local resources. We want to
>          * release locks at the point where any backend waiting for us will see
>          * our transaction as being fully cleaned up.
>
> but that doesn't explain why we e.g. process relcache references, jit
> contexts (arguably it does for dsm segments), at that stage. And
> definitely not why we do abort's relcache inval processing between
> RESOURCE_RELEASE_BEFORE_LOCKS and RESOURCE_RELEASE_LOCKS - that can be
> quite expensive whe needing to scan the whole relcache.
>
> Anyway, this is grumbling about things far beyond the scope of this
> patch.

Yeah. I do agree with you that a lot of that stuff isn't very well
explained.  Nor is there much of an explanation of why some things go
through resowner.c and other things have bespoke cleanup code. But, as
you say, that's out of scope.

> I'm not sure that's true - I've certainly seen extensions logging the
> transaction state into a table, for example... Even in aborted xacts.

Whoa.

> > Non-catalog access would probably fail the assertion in
> > GetTransactionSnapshot() or GetLatestSnapshot().
>
> Not this patch's fault, obviously, but none of this appears to catch
> DML...

Really? It's hard to imagine that DML wouldn't attempt catalog access.

> Just to be clear: While I think an assert or two more seem like a good
> idea, that's musing around the edges, not a fundamental concern.

All right, here's another version with an assert or two more. :-)

> > diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
> > index f594d33e7a..9993251607 100644
> > --- a/src/backend/access/transam/xact.c
> > +++ b/src/backend/access/transam/xact.c
> > @@ -2732,6 +2732,18 @@ AbortTransaction(void)
> >               pgstat_report_xact_timestamp(0);
> >       }
> >
> > +     /*
> > +      * Any snapshots taken by this transaction were unsafe to use even at the
> > +      * time when we entered AbortTransaction(), since we might have, for
> > +      * example, inserted a heap tuple and failed while inserting index tuples.
> > +      * They were even more unsafe after ProcArrayEndTransaction(), since after
> > +      * that point tuples we inserted could be pruned by other backends.
> > +      * However, we postpone the cleanup until this point in the sequence
> > +      * because it has to be done after ResourceOwnerRelease() has finishing
> > +      * unregistering snapshots.
> > +      */
> > +     AtEOXact_Snapshot(false, true);
>
> One thing that bothers me a bit here is that the other cleanup calls are
> within
>
>         /*
>          * Post-abort cleanup.  See notes in CommitTransaction() concerning
>          * ordering.  We can skip all of it if the transaction failed before
>          * creating a resource owner.
>          */
>         if (TopTransactionResourceOwner != NULL)
>
> and adding AtEOXact_Snapshot() below that block kind of makes that
> comment wrong. I don't think we actually can make the call conditional
> in the same way, however.

I guess I read that statement as referring to the contents of the
if-block, not everything below that in the function. But we could
reword the comment to, e.g. We can skip quite a bit of work if the
transaction failed before creating a resource owner. Done in the
attached version.

>
>
> >       /*
> >        * State remains TRANS_ABORT until CleanupTransaction().
> >        */
> > @@ -2757,7 +2769,6 @@ CleanupTransaction(void)
> >        * do abort cleanup processing
> >        */
> >       AtCleanup_Portals();            /* now safe to release portal memory */
> > -     AtEOXact_Snapshot(false, true); /* and release the transaction's snapshots */
>
> Hm. I don't quite get why we tell AtEOXact_Snapshot() to clean up
> ->xmin, when we just called ProcArrayEndTransaction(). Again, not this
> patch's fault...

So let's leave that for another time.

v3 attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Invisible PROMPT2
Next
From: Robert Haas
Date:
Subject: Re: JIT performance bug/regression & JIT EXPLAIN