Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403) - Mailing list pgsql-hackers

Hi,

On 2022-04-14 12:16:45 -0400, Robert Haas wrote:
> I got curious and looked at the underlying problem here and I am
> wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
> seems to me that the code is always going to return true if there are
> any active snapshots, and the rest of the function is intended to test
> whether there is a registered snapshot other than the catalog
> snapshot. But I don't think that's what this code does:
> 
>      if (pairingheap_is_empty(&RegisteredSnapshots) ||
>          !pairingheap_is_singular(&RegisteredSnapshots))
>          return false;
> 
>      return CatalogSnapshot == NULL;

Certainly looks off...


> I find that 'make check-world' passes with this change, which is
> disturbing, because it also passes without this change. That means we
> don't have any tests that reach HaveRegisteredOrActiveSnapshot() with
> more than one registered snapshot.

Part of that is because right now the assertion is placed "too deep" -
it should be much higher up, so it's reached even if there's not
actually a toast datum. But there's of other bugs preventing that :(. A
lot of bugs have been hidden by the existence of CatalogSnapshot (which
of course isn't something one actually can rely on).


> Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
> more places,

I think we should, but there's the other bugs that need to be fixed
first :(. Namely that we have plenty places doing catalog accesses
without an active or registered snapshot :(.


> I think we should consider revising the whole approach here. The way
> init_toast_snapshot() is coded, we basically have some code that tests
> for active and registered snapshots and finds the oldest one. We error
> out if there isn't one. And then we get to this assertion, which
> checks the same stuff a second time but with an additional check to
> see whether we ended up with the catalog snapshot.  Wouldn't it make
> more sense if GetOldestSnapshot() just refused to return the catalog
> snapshot in the first place?

I'm worried that that could cause additional bugs. Consider code using
GetOldestSnapshot() to check if tuples need to be preserved or such.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: fix cost subqueryscan wrong parallel cost
Next
From: Andres Freund
Date:
Subject: Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)