Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)
Date
Msg-id CA+Tgmob93_YftfhrSF+OpNWXWtv2bq774yBH3JxPiJf=-DxoiA@mail.gmail.com
Whole thread Raw
In response to Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Apr 14, 2022 at 1:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I am wondering whether ffaa44cb559db332baeee7d25dedd74a61974203 bet on
> > the wrong horse. When I invented CatalogSnapshot, I intended for it to
> > be an ephemeral snapshot that we'd keep for as long as we could safely
> > do so and then discard it as soon as there's any hint of a problem.
> > But that commit really takes the opposite approach, trying to keep
> > CatalogSnapshot valid for a longer period of time instead of just
> > chucking it.
>
> Not really following?  ISTM the point of ffaa44cb5 was that if we don't
> include the CatalogSnapshot in our advertised xmin, then the length
> of time we can safely use it is *zero*.

No, I don't think so. I'm proposing that you shouldn't be taking a
catalog snapshot unless you already hold some other snapshot, and that
the catalog snapshot should always be newer than whatever that other
snapshot is. If we made that be true, then your catalog snapshot
couldn't ever be the thing holding back xmin -- and then I think it
doesn't need to be registered.

> > But there is no such code: GetOldestSnapshot() has only one caller.
> > And I don't think we'd add more just to do something like that,
> > because we have other mechanisms that are specifically designed for
> > testing whether tuples are prunable that are better suited to such
> > tasks. It should really only be used when you don't know which of the
> > backend's current snapshots ought to be used for something, but are
> > sure that using the oldest one will be good enough. And in that kind
> > of situation, it's hard to see why using the catalog snapshot would
> > ever be the right idea.
>
> What if the reason we need a snapshot is to detoast some toasted item
> we read from a catalog with the CatalogSnapshot?  There might not be
> any other valid snapshot, so I don't think I buy your argument here.
>
> ... Unless your argument is that the session should always have an
> older non-catalog snapshot, which I'm not sure whether I buy or not.
> But if we do believe that, then adding mechanisms to force it to be so
> could be an alternative solution.  But why is that better than what
> we're doing?

That is exactly my argument, but I'm not actually sure whether it is
in fact better. I was responding to Andres's statement that
CatalogSnapshot was hiding a lot of bugs because it makes it look like
we have a snapshot when we don't really. And my point is that
ffaa44cb559db332baeee7d25dedd74a61974203 made such problems a lot more
likely, because before that the snapshot was not in
RegisteredSnapshots, and therefore anybody asking "hey, do we have a
snapshot?" wasn't likely to get confused, because they would only see
the catalog snapshot if they specifically went and looked at
CatalogSnapshot(Set), and if you do that, hopefully you'll think about
the special semantics of that snapshot and write code that works with
them. But with CatalogSnapshot in RegisteredSnapshots, any code that
looks through RegisteredSnapshots has to worry about whether what it's
finding there is actually just the CatalogSnapshot, and if Andres's
statement that we have a lot of bugs here is to be believed, then we
have not done a good job finding and updating all such code. We can
continue down the path of finding and fixing it -- or we can back out
parts of ffaa44cb559db332baeee7d25dedd74a61974203.

Just to be clear, I'm not debating that that commit fixed some real
problems and I think parts of it are really necessary fixes. But to
quote from the commit message:

    The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting
    for whether MyPgXact->xmin could be cleared or advanced.  In normal
    transactions this was masked by the fact that the transaction snapshot
    would be older, but during backend startup and certain utility commands
    it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had
    been cleared ...

And what I'm suggesting is that *perhaps* we ought to have fixed those
"during backend startup and certain utility commands" by having
SnapshotResetXmin() do InvalidateCatalogSnapshot(), and maybe also
made the code in those commands that's doing catalog lookups hold
acquire and hold a snapshot of its own around the operation. The
alternative you chose, namely, incorporating the xmin into the
backend's xmin computation, I think can also be made to work. I am
just not sure that it's the best approach.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)
Next
From: Tom Lane
Date:
Subject: Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)