Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)
Date
Msg-id CA+TgmoaLb0j9zEfkk6bmuKM5422JenzzcJLJSiPLgwfMd+xnBw@mail.gmail.com
Whole thread Raw
In response to Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)
List pgsql-hackers
On Thu, Apr 14, 2022 at 11:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't think Andres had any intention of ignoring it indefinitely.
> What he is doing is prioritizing it lower than several of the other
> open items, an opinion I fully agree with.

Well, my point is that, historically, relegating things to the older
bugs section often means nothing happens, which I think would not be
great in this case. However, I'll try to shut up about the procedural
issues for the moment, since I seem to be in the minority.

I got curious and looked at the underlying problem here and I am
wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
seems to me that the code is always going to return true if there are
any active snapshots, and the rest of the function is intended to test
whether there is a registered snapshot other than the catalog
snapshot. But I don't think that's what this code does:

     if (pairingheap_is_empty(&RegisteredSnapshots) ||
         !pairingheap_is_singular(&RegisteredSnapshots))
         return false;

     return CatalogSnapshot == NULL;

So, if there are 0 active snapshots we return false. And if there is 1
active snapshot then we return true if there's no catalog snapshot and
otherwise false. So far so good. But if there are 2 or more registered
snapshots then the pairing heap will be neither empty nor singular so
it seems to me we will return false, which seems to me to be the wrong
answer. I tried this:

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index a0be0c411a..4e8e26a362 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1646,9 +1646,10 @@ HaveRegisteredOrActiveSnapshot(void)
      * removed at any time due to invalidation processing. If explicitly
      * registered more than one snapshot has to be in RegisteredSnapshots.
      */
-    if (pairingheap_is_empty(&RegisteredSnapshots) ||
-        !pairingheap_is_singular(&RegisteredSnapshots))
+    if (pairingheap_is_empty(&RegisteredSnapshots))
         return false;
+    if (!pairingheap_is_singular(&RegisteredSnapshots))
+        return true;

     return CatalogSnapshot == NULL;
 }

I find that 'make check-world' passes with this change, which is
disturbing, because it also passes without this change. That means we
don't have any tests that reach HaveRegisteredOrActiveSnapshot() with
more than one registered snapshot.

Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
more places, I think we should consider revising the whole approach
here. The way init_toast_snapshot() is coded, we basically have some
code that tests for active and registered snapshots and finds the
oldest one. We error out if there isn't one. And then we get to this
assertion, which checks the same stuff a second time but with an
additional check to see whether we ended up with the catalog snapshot.
Wouldn't it make more sense if GetOldestSnapshot() just refused to
return the catalog snapshot in the first place?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: allow specifying action when standby encounters incompatible parameter settings
Next
From: Andres Freund
Date:
Subject: Re: Intermittent buildfarm failures on wrasse