Thread: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

Hi,

On master I got a FailedAssertion("HaveRegisteredOrActiveSnapshot()"
on an assert-enabled instance and with (I think) data over a certain length.

I whittled it down to the attached bash (careful - it drops stuff).  It 
has 5 tsv-data lines (one long line) that COPY slurps into a table.  The 
middle, third line causes the problem, later on.  Shortening the long 
line to somewhere below 2000 characters fixes it again.

More info in the attached .sh file.

If debug-assert is 'off', the problem does not occur. (REL_14_STABLE 
also does not have the problem, assertions or not)

thanks,

Erik Rijkers


Attachment
At Sun, 27 Mar 2022 20:32:45 +0200, Erik Rijkers <er@xs4all.nl> wrote in 
> On master I got a FailedAssertion("HaveRegisteredOrActiveSnapshot()"
> on an assert-enabled instance and with (I think) data over a certain
> length.
> 
> I whittled it down to the attached bash (careful - it drops stuff).
> It has 5 tsv-data lines (one long line) that COPY slurps into a table.
> The middle, third line causes the problem, later on.  Shortening the
> long line to somewhere below 2000 characters fixes it again.
> 
> More info in the attached .sh file.

It is reproducible for me. Thanks for the reproducer.

> If debug-assert is 'off', the problem does not occur. (REL_14_STABLE
> also does not have the problem, assertions or not)

It seems like related with [1]?

Inserting EnsurePortalSnapshotExists() to RunFromStore fixes this but
I'm not sure where is the right place to do this..

[1] https://www.postgresql.org/message-id/flat/20210623035916.GL29179%40telsasoft.com#f802617a00cee4d013ad8fa69e1af048

For someone's information, this is more readable stack trace.

#0  0x00007f43aeed037f in raise () from /lib64/libc.so.6
#1  0x00007f43aeebadb5 in abort () from /lib64/libc.so.6
#2  0x0000000000b28747 in ExceptionalCondition (
    conditionName=0xba2c48 "HaveRegisteredOrActiveSnapshot()", 
    errorType=0xba2882 "FailedAssertion", 
    fileName=0xba2870 "toast_internals.c", lineNumber=670) at assert.c:69
#3  0x00000000004ac776 in init_toast_snapshot (toast_snapshot=0x7ffce64f7440)
    at toast_internals.c:670
#4  0x00000000005164ea in heap_fetch_toast_slice (toastrel=0x7f43b193cad0, 
    valueid=16393, attrsize=1848, sliceoffset=0, slicelength=1848, 
    result=0x1cbb948) at heaptoast.c:688
#5  0x000000000049fc86 in table_relation_fetch_toast_slice (
    toastrel=0x7f43b193cad0, valueid=16393, attrsize=1848, sliceoffset=0, 
    slicelength=1848, result=0x1cbb948)
    at ../../../../src/include/access/tableam.h:1892
#6  0x00000000004a0a0f in toast_fetch_datum (attr=0x1d6b171) at detoast.c:375
#7  0x000000000049fffb in detoast_attr (attr=0x1d6b171) at detoast.c:123
#8  0x0000000000b345ba in pg_detoast_datum_packed (datum=0x1d6b171)
    at fmgr.c:1757
#9  0x0000000000aece72 in text_to_cstring (t=0x1d6b171) at varlena.c:225
#10 0x0000000000aedda2 in textout (fcinfo=0x7ffce64f77a0) at varlena.c:574
#11 0x0000000000b331bf in FunctionCall1Coll (flinfo=0x1d695e0, collation=0, 
    arg1=30847345) at fmgr.c:1138
#12 0x0000000000b3422b in OutputFunctionCall (flinfo=0x1d695e0, val=30847345)
    at fmgr.c:1575
#13 0x00000000004a6b6c in printtup (slot=0x1cb81f0, self=0x1c96e90)
    at printtup.c:357
#14 0x000000000099499f in RunFromStore (portal=0x1cf9380, 
    direction=ForwardScanDirection, count=0, dest=0x1c96e90) at pquery.c:1096
#15 0x00000000009944e3 in PortalRunSelect (portal=0x1cf9380, forward=true, 
    count=0, dest=0x1c96e90) at pquery.c:917
#16 0x00000000009941d3 in PortalRun (portal=0x1cf9380, 
    count=9223372036854775807, isTopLevel=true, run_once=true, 
    dest=0x1c96e90, altdest=0x1c96e90, qc=0x7ffce64f7ac0) at pquery.c:765
#17 0x000000000098df4b in exec_simple_query (
    query_string=0x1c96030 "fetch all in myportal;") at postgres.c:1250
#18 0x00000000009923a3 in PostgresMain (dbname=0x1cc11b0 "postgres", 
    username=0x1cc1188 "horiguti") at postgres.c:4520
#19 0x00000000008c6caf in BackendRun (port=0x1cb74c0) at postmaster.c:4593
#20 0x00000000008c6631 in BackendStartup (port=0x1cb74c0) at postmaster.c:4321
#21 0x00000000008c29cb in ServerLoop () at postmaster.c:1801
#22 0x00000000008c2298 in PostmasterMain (argc=1, argv=0x1c8e0e0)
    at postmaster.c:1473
#23 0x00000000007c14c3 in main (argc=1, argv=0x1c8e0e0) at main.c:202



regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



At Mon, 28 Mar 2022 18:36:46 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> Inserting EnsurePortalSnapshotExists() to RunFromStore fixes this but
> I'm not sure where is the right place to do this..

Then, I found that portal->holdSnapshot is that.  I came up with the
attached.  It does the follows:

1. Teach PlannedStmtRequiresSnapshot() to return true for FetchStmt.

2. Use holdSnapshot in RunFromStore if any.


The rerpducer is reduced to as small as the following.

CREATE TABLE t (a text);
INSERT INTO t VALUES('some random text');
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM t;
FETCH ALL IN c;

But I haven't come up with a reasonable way to generate the 'some
random text' yet.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment
At Tue, 29 Mar 2022 17:06:21 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Mon, 28 Mar 2022 18:36:46 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> Then, I found that portal->holdSnapshot is that.  I came up with the
> attached.  It does the follows:
> 
> 1. Teach PlannedStmtRequiresSnapshot() to return true for FetchStmt.
> 
> 2. Use holdSnapshot in RunFromStore if any.
> 
> 
> The rerpducer is reduced to as small as the following.
> 
> CREATE TABLE t (a text);
> INSERT INTO t VALUES('some random text');
> BEGIN;
> DECLARE c CURSOR FOR SELECT * FROM t;
> FETCH ALL IN c;
> 
> But I haven't come up with a reasonable way to generate the 'some
> random text' yet.

I gave up and took a straightforward way to generate one.

I don't like that it uses a fixed length for the random text, but
anyway it works for now...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment
On Tue, 29 Mar 2022 at 11:10, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Tue, 29 Mar 2022 17:06:21 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > At Mon, 28 Mar 2022 18:36:46 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > Then, I found that portal->holdSnapshot is that.  I came up with the
> > attached.  It does the follows:
> >
> > 1. Teach PlannedStmtRequiresSnapshot() to return true for FetchStmt.
> >
> > 2. Use holdSnapshot in RunFromStore if any.
> >
> >
> > The rerpducer is reduced to as small as the following.
> >
> > CREATE TABLE t (a text);
> > INSERT INTO t VALUES('some random text');
> > BEGIN;
> > DECLARE c CURSOR FOR SELECT * FROM t;
> > FETCH ALL IN c;
> >
> > But I haven't come up with a reasonable way to generate the 'some
> > random text' yet.
>
> I gave up and took a straightforward way to generate one.
>
> I don't like that it uses a fixed length for the random text, but
> anyway it works for now...

An shorter (?) reproducer might be the following, which forces any
value for 'a' to be toasted and thus triggering the check in
init_toast_snapshot regardless of value length:

CREATE TABLE t (a text);
ALTER TABLE t ALTER COLUMN a SET STORAGE EXTERNAL;
INSERT INTO t VALUES ('toast');
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM t;
FETCH ALL IN c;

Enjoy,

-Matthias



Op 29-03-2022 om 12:50 schreef Matthias van de Meent:
> On Tue, 29 Mar 2022 at 11:10, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>>
>> At Tue, 29 Mar 2022 17:06:21 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
>>> At Mon, 28 Mar 2022 18:36:46 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
>>> Then, I found that portal->holdSnapshot is that.  I came up with the
>>> attached.  It does the follows:
>>>
>>> 1. Teach PlannedStmtRequiresSnapshot() to return true for FetchStmt.
>>>
>>> 2. Use holdSnapshot in RunFromStore if any.
>>>
>>>
>>> The rerpducer is reduced to as small as the following.
>>>
>>> CREATE TABLE t (a text);
>>> INSERT INTO t VALUES('some random text');
>>> BEGIN;
>>> DECLARE c CURSOR FOR SELECT * FROM t;
>>> FETCH ALL IN c;
>>>
>>> But I haven't come up with a reasonable way to generate the 'some
>>> random text' yet.
>>
>> I gave up and took a straightforward way to generate one.
>>
>> I don't like that it uses a fixed length for the random text, but
>> anyway it works for now...
> 
> An shorter (?) reproducer might be the following, which forces any
> value for 'a' to be toasted and thus triggering the check in
> init_toast_snapshot regardless of value length:
> 
> CREATE TABLE t (a text);
> ALTER TABLE t ALTER COLUMN a SET STORAGE EXTERNAL;
> INSERT INTO t VALUES ('toast');
> BEGIN;
> DECLARE c CURSOR FOR SELECT * FROM t;
> FETCH ALL IN c;

Excellent. That indeed immediately forces the error.

(and the patch prevents it)

Thanks!


> 
> Enjoy,
> 
> -Matthias



At Tue, 29 Mar 2022 13:29:15 +0200, Erik Rijkers <er@xs4all.nl> wrote in 
> Op 29-03-2022 om 12:50 schreef Matthias van de Meent:
> > An shorter (?) reproducer might be the following, which forces any
> > value for 'a' to be toasted and thus triggering the check in
> > init_toast_snapshot regardless of value length:
> > CREATE TABLE t (a text);
> > ALTER TABLE t ALTER COLUMN a SET STORAGE EXTERNAL;
> > INSERT INTO t VALUES ('toast');
> > BEGIN;
> > DECLARE c CURSOR FOR SELECT * FROM t;
> > FETCH ALL IN c;

Yeah, unfortunately I tried that first and saw it didn't work. And it
still doesn't for me.  With such a short text pg_detoast_datum_pakced
doesn't call detoast_attr.  Actually it is VARATT_IS_1B. (@master)

I think I'm missing something here. I'm going to examine around.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



At Wed, 30 Mar 2022 10:06:17 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Tue, 29 Mar 2022 13:29:15 +0200, Erik Rijkers <er@xs4all.nl> wrote in 
> > Op 29-03-2022 om 12:50 schreef Matthias van de Meent:
> > > An shorter (?) reproducer might be the following, which forces any
> > > value for 'a' to be toasted and thus triggering the check in
> > > init_toast_snapshot regardless of value length:
> > > CREATE TABLE t (a text);
> > > ALTER TABLE t ALTER COLUMN a SET STORAGE EXTERNAL;
> > > INSERT INTO t VALUES ('toast');
> > > BEGIN;
> > > DECLARE c CURSOR FOR SELECT * FROM t;
> > > FETCH ALL IN c;
> 
> Yeah, unfortunately I tried that first and saw it didn't work. And it
> still doesn't for me.  With such a short text pg_detoast_datum_pakced
> doesn't call detoast_attr.  Actually it is VARATT_IS_1B. (@master)
> 
> I think I'm missing something here. I'm going to examine around.

Hmm. Strange. My memory tells that I did the same thing before.. I
thought that it is somewhat related to compression since repeat('x',
4096) didin't seem working at that time, but it worked this time.
Maybe I was confused between extended and external..

But, in the first place the *fix* has been found to be wrong.  I'm
going to search for the right fix..


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



At Wed, 30 Mar 2022 11:46:13 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> But, in the first place the *fix* has been found to be wrong.  I'm
> going to search for the right fix..

FETCH uses the snapshot at DECLARE. So anyhow I needed to set the
queryDesk's snapshot used in PortalRunSelect to the FETCH's portal's
holdSnapshot.  What I did in this version is:

1. Add a new member "snapshot" to the type DestReceiver.

2. In PortalRunSelect, set the DECLARE'd query's snapshot to the
  member iff the dest is tupelstore and the active snapshot is not
  set.

3. In FillPortalStore, copy the snapshot to the portal's holdSnapshot.

4. RunFromStore uses holdSnapshot if any.

I'm not still confident on this, but it should be better than the v1.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment
At Wed, 30 Mar 2022 17:58:24 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Wed, 30 Mar 2022 11:46:13 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > But, in the first place the *fix* has been found to be wrong.  I'm
> > going to search for the right fix..
> 
> FETCH uses the snapshot at DECLARE. So anyhow I needed to set the
> queryDesk's snapshot used in PortalRunSelect to the FETCH's portal's
> holdSnapshot.  What I did in this version is:

By the way, this is, given that the added check in init_toast_snapshot
is valid, a long-standing "bug", which at least back to 12.

I'm not sure what to do for this.

1. ignore the check for certain cases?

2. apply any fix only to master and call it a day.  14 and earlier
  doesn't have the assertion check so they don't complain.

3. apply a fix to all affected versions.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Wed, Mar 30, 2022 at 05:58:24PM +0900, Kyotaro Horiguchi wrote:
> I'm not still confident on this, but it should be better than the v1.

+Andres as this seems to be related to 277692220.
I added it as an Opened Item.



Hi,

On 2022-04-13 15:28:19 -0500, Justin Pryzby wrote:
> On Wed, Mar 30, 2022 at 05:58:24PM +0900, Kyotaro Horiguchi wrote:
> > I'm not still confident on this, but it should be better than the v1.
> 
> +Andres as this seems to be related to 277692220.

FWIW, that'd just mean it's an old bug that wasn't easily noticeable
before, not that it's the fault of 277692220.


> I added it as an Opened Item.

IOW, it'd belong in "Older bugs affecting stable branches".

Greetings,

Andres Freund



On Wed, Apr 13, 2022 at 8:38 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-04-13 15:28:19 -0500, Justin Pryzby wrote:
> > On Wed, Mar 30, 2022 at 05:58:24PM +0900, Kyotaro Horiguchi wrote:
> > > I'm not still confident on this, but it should be better than the v1.
> >
> > +Andres as this seems to be related to 277692220.
>
> FWIW, that'd just mean it's an old bug that wasn't easily noticeable
> before, not that it's the fault of 277692220.

I think you're still on the hook to do something about it for this
release. You could decide to revert the commit adding the assertion
and punt on doing thing about the underlying bug, but we can't just be
like "oh, an assertion is failing, we'll get to that someday".

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 13, 2022 at 8:38 PM Andres Freund <andres@anarazel.de> wrote:
>> FWIW, that'd just mean it's an old bug that wasn't easily noticeable
>> before, not that it's the fault of 277692220.

> I think you're still on the hook to do something about it for this
> release.

I think you're trying to shoot the messenger.  As Andres says,
277692220 just exposes that there is some pre-existing bug here.
It's probably related to 84f5c2908, so I was planning to take a
look at it at some point, but there are a few other higher-priority
bugs in the way.

I see no point in reverting 277692220.  Removing the Assert would
prevent, or at least complicate, detection of other similar bugs.
And it'd do nothing to help end users, who won't have assertions
enabled anyway.

            regards, tom lane



On Thu, Apr 14, 2022 at 10:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Apr 13, 2022 at 8:38 PM Andres Freund <andres@anarazel.de> wrote:
> >> FWIW, that'd just mean it's an old bug that wasn't easily noticeable
> >> before, not that it's the fault of 277692220.
>
> > I think you're still on the hook to do something about it for this
> > release.
>
> I think you're trying to shoot the messenger.  As Andres says,
> 277692220 just exposes that there is some pre-existing bug here.
> It's probably related to 84f5c2908, so I was planning to take a
> look at it at some point, but there are a few other higher-priority
> bugs in the way.

Well, if you're willing to look at it that's fine, but I just don't
agree that it's OK to just commit things that add failing assertions
and drive off into the sunset. The code is always going to have a
certain number of unfixed bugs, and that's fine, and finding them is
good in itself, but people want to be able to run the software in the
meantime, and some of those people are developers or other individuals
who want to run it with assertions enabled. It's a judgement call
whether this assertion failure is going to bite enough people to be a
problem, but if it were something that happened easily enough to
prevent you from working on the source code, I'm sure you wouldn't be
OK with leaving it in there until someone got around to looking at it.
Given that it took about a month and a half for someone to report them
problem, it's not as bad as all that, but I guess I won't be surprised
if we keep getting complaints until something gets done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Hi,

On 2022-04-14 11:33:45 -0400, Robert Haas wrote:
> On Thu, Apr 14, 2022 at 10:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > On Wed, Apr 13, 2022 at 8:38 PM Andres Freund <andres@anarazel.de> wrote:
> > >> FWIW, that'd just mean it's an old bug that wasn't easily noticeable
> > >> before, not that it's the fault of 277692220.
> >
> > > I think you're still on the hook to do something about it for this
> > > release.
> >
> > I think you're trying to shoot the messenger.  As Andres says,
> > 277692220 just exposes that there is some pre-existing bug here.
> > It's probably related to 84f5c2908, so I was planning to take a
> > look at it at some point, but there are a few other higher-priority
> > bugs in the way.
> 
> Well, if you're willing to look at it that's fine, but I just don't
> agree that it's OK to just commit things that add failing assertions
> and drive off into the sunset.

I'm not planning to ride into the sunset / ignore this issue. All I said
is that it's imo not the right thing to say that that commit broke
things in 15. And that not a semantics game, because it means that the
fix needs to go back further than 277692220.

Greetings,

Andres Freund



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 14, 2022 at 10:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think you're trying to shoot the messenger.  As Andres says,
>> 277692220 just exposes that there is some pre-existing bug here.
>> It's probably related to 84f5c2908, so I was planning to take a
>> look at it at some point, but there are a few other higher-priority
>> bugs in the way.

> Well, if you're willing to look at it that's fine, but I just don't
> agree that it's OK to just commit things that add failing assertions
> and drive off into the sunset.

I don't think Andres had any intention of ignoring it indefinitely.
What he is doing is prioritizing it lower than several of the other
open items, an opinion I fully agree with.

            regards, tom lane



On Thu, Apr 14, 2022 at 11:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't think Andres had any intention of ignoring it indefinitely.
> What he is doing is prioritizing it lower than several of the other
> open items, an opinion I fully agree with.

Well, my point is that, historically, relegating things to the older
bugs section often means nothing happens, which I think would not be
great in this case. However, I'll try to shut up about the procedural
issues for the moment, since I seem to be in the minority.

I got curious and looked at the underlying problem here and I am
wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
seems to me that the code is always going to return true if there are
any active snapshots, and the rest of the function is intended to test
whether there is a registered snapshot other than the catalog
snapshot. But I don't think that's what this code does:

     if (pairingheap_is_empty(&RegisteredSnapshots) ||
         !pairingheap_is_singular(&RegisteredSnapshots))
         return false;

     return CatalogSnapshot == NULL;

So, if there are 0 active snapshots we return false. And if there is 1
active snapshot then we return true if there's no catalog snapshot and
otherwise false. So far so good. But if there are 2 or more registered
snapshots then the pairing heap will be neither empty nor singular so
it seems to me we will return false, which seems to me to be the wrong
answer. I tried this:

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index a0be0c411a..4e8e26a362 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1646,9 +1646,10 @@ HaveRegisteredOrActiveSnapshot(void)
      * removed at any time due to invalidation processing. If explicitly
      * registered more than one snapshot has to be in RegisteredSnapshots.
      */
-    if (pairingheap_is_empty(&RegisteredSnapshots) ||
-        !pairingheap_is_singular(&RegisteredSnapshots))
+    if (pairingheap_is_empty(&RegisteredSnapshots))
         return false;
+    if (!pairingheap_is_singular(&RegisteredSnapshots))
+        return true;

     return CatalogSnapshot == NULL;
 }

I find that 'make check-world' passes with this change, which is
disturbing, because it also passes without this change. That means we
don't have any tests that reach HaveRegisteredOrActiveSnapshot() with
more than one registered snapshot.

Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
more places, I think we should consider revising the whole approach
here. The way init_toast_snapshot() is coded, we basically have some
code that tests for active and registered snapshots and finds the
oldest one. We error out if there isn't one. And then we get to this
assertion, which checks the same stuff a second time but with an
additional check to see whether we ended up with the catalog snapshot.
Wouldn't it make more sense if GetOldestSnapshot() just refused to
return the catalog snapshot in the first place?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Hi,

On 2022-04-14 12:16:45 -0400, Robert Haas wrote:
> I got curious and looked at the underlying problem here and I am
> wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
> seems to me that the code is always going to return true if there are
> any active snapshots, and the rest of the function is intended to test
> whether there is a registered snapshot other than the catalog
> snapshot. But I don't think that's what this code does:
> 
>      if (pairingheap_is_empty(&RegisteredSnapshots) ||
>          !pairingheap_is_singular(&RegisteredSnapshots))
>          return false;
> 
>      return CatalogSnapshot == NULL;

Certainly looks off...


> I find that 'make check-world' passes with this change, which is
> disturbing, because it also passes without this change. That means we
> don't have any tests that reach HaveRegisteredOrActiveSnapshot() with
> more than one registered snapshot.

Part of that is because right now the assertion is placed "too deep" -
it should be much higher up, so it's reached even if there's not
actually a toast datum. But there's of other bugs preventing that :(. A
lot of bugs have been hidden by the existence of CatalogSnapshot (which
of course isn't something one actually can rely on).


> Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
> more places,

I think we should, but there's the other bugs that need to be fixed
first :(. Namely that we have plenty places doing catalog accesses
without an active or registered snapshot :(.


> I think we should consider revising the whole approach here. The way
> init_toast_snapshot() is coded, we basically have some code that tests
> for active and registered snapshots and finds the oldest one. We error
> out if there isn't one. And then we get to this assertion, which
> checks the same stuff a second time but with an additional check to
> see whether we ended up with the catalog snapshot.  Wouldn't it make
> more sense if GetOldestSnapshot() just refused to return the catalog
> snapshot in the first place?

I'm worried that that could cause additional bugs. Consider code using
GetOldestSnapshot() to check if tuples need to be preserved or such.

Greetings,

Andres Freund



Hi,

On 2022-04-14 09:54:25 -0700, Andres Freund wrote:
> On 2022-04-14 12:16:45 -0400, Robert Haas wrote:
> > I got curious and looked at the underlying problem here and I am
> > wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
> > seems to me that the code is always going to return true if there are
> > any active snapshots, and the rest of the function is intended to test
> > whether there is a registered snapshot other than the catalog
> > snapshot. But I don't think that's what this code does:
> > 
> >      if (pairingheap_is_empty(&RegisteredSnapshots) ||
> >          !pairingheap_is_singular(&RegisteredSnapshots))
> >          return false;
> > 
> >      return CatalogSnapshot == NULL;
> 
> Certainly looks off...
> 
> 
> > I find that 'make check-world' passes with this change, which is
> > disturbing, because it also passes without this change. That means we
> > don't have any tests that reach HaveRegisteredOrActiveSnapshot() with
> > more than one registered snapshot.
> 
> Part of that is because right now the assertion is placed "too deep" -
> it should be much higher up, so it's reached even if there's not
> actually a toast datum. But there's of other bugs preventing that :(. A
> lot of bugs have been hidden by the existence of CatalogSnapshot (which
> of course isn't something one actually can rely on).
> 
> 
> > Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
> > more places,
> 
> I think we should, but there's the other bugs that need to be fixed
> first :(. Namely that we have plenty places doing catalog accesses
> without an active or registered snapshot :(.

Ah, we actually were debating some of these issues more recently, in:
https://www.postgresql.org/message-id/20220311030721.olixpzcquqkw2qyt%40alap3.anarazel.de
https://www.postgresql.org/message-id/20220311021047.hgtqkrl6n52srvdu%40alap3.anarazel.de

It looks like the same bug, and that the patch in this thread fixes
them. And that we need to backpatch the fix, right?

Greetings,

Andres Freund



On Thu, Apr 14, 2022 at 12:54 PM Andres Freund <andres@anarazel.de> wrote:
> Part of that is because right now the assertion is placed "too deep" -
> it should be much higher up, so it's reached even if there's not
> actually a toast datum. But there's of other bugs preventing that :(. A
> lot of bugs have been hidden by the existence of CatalogSnapshot (which
> of course isn't something one actually can rely on).

I am wondering whether ffaa44cb559db332baeee7d25dedd74a61974203 bet on
the wrong horse. When I invented CatalogSnapshot, I intended for it to
be an ephemeral snapshot that we'd keep for as long as we could safely
do so and then discard it as soon as there's any hint of a problem.
But that commit really takes the opposite approach, trying to keep
CatalogSnapshot valid for a longer period of time instead of just
chucking it.

> > Also, unless we have plans to use HaveRegisteredOrActiveSnapshot() in
> > more places,
>
> I think we should, but there's the other bugs that need to be fixed
> first :(. Namely that we have plenty places doing catalog accesses
> without an active or registered snapshot :(.

Ouch.

> I'm worried that that could cause additional bugs. Consider code using
> GetOldestSnapshot() to check if tuples need to be preserved or such.

But there is no such code: GetOldestSnapshot() has only one caller.
And I don't think we'd add more just to do something like that,
because we have other mechanisms that are specifically designed for
testing whether tuples are prunable that are better suited to such
tasks. It should really only be used when you don't know which of the
backend's current snapshots ought to be used for something, but are
sure that using the oldest one will be good enough. And in that kind
of situation, it's hard to see why using the catalog snapshot would
ever be the right idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> I am wondering whether ffaa44cb559db332baeee7d25dedd74a61974203 bet on
> the wrong horse. When I invented CatalogSnapshot, I intended for it to
> be an ephemeral snapshot that we'd keep for as long as we could safely
> do so and then discard it as soon as there's any hint of a problem.
> But that commit really takes the opposite approach, trying to keep
> CatalogSnapshot valid for a longer period of time instead of just
> chucking it.

Not really following?  ISTM the point of ffaa44cb5 was that if we don't
include the CatalogSnapshot in our advertised xmin, then the length
of time we can safely use it is *zero*.

> But there is no such code: GetOldestSnapshot() has only one caller.
> And I don't think we'd add more just to do something like that,
> because we have other mechanisms that are specifically designed for
> testing whether tuples are prunable that are better suited to such
> tasks. It should really only be used when you don't know which of the
> backend's current snapshots ought to be used for something, but are
> sure that using the oldest one will be good enough. And in that kind
> of situation, it's hard to see why using the catalog snapshot would
> ever be the right idea.

What if the reason we need a snapshot is to detoast some toasted item
we read from a catalog with the CatalogSnapshot?  There might not be
any other valid snapshot, so I don't think I buy your argument here.

... Unless your argument is that the session should always have an
older non-catalog snapshot, which I'm not sure whether I buy or not.
But if we do believe that, then adding mechanisms to force it to be so
could be an alternative solution.  But why is that better than what
we're doing?

            regards, tom lane



Hi,

On 2022-04-14 13:36:51 -0400, Tom Lane wrote:
> What if the reason we need a snapshot is to detoast some toasted item
> we read from a catalog with the CatalogSnapshot?  There might not be
> any other valid snapshot, so I don't think I buy your argument here.

We definitely do have places doing that, but is it ever actually safe?
Part of the catalog access might cause cache invalidations to be
processed, which can invalidate the snapshot (including resetting
MyProc->xmin). Afaics we always would have push or register the
snapshot, either will copy the snapshot, I think?

Greetings,

Andres Freund



On Thu, Apr 14, 2022 at 1:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I am wondering whether ffaa44cb559db332baeee7d25dedd74a61974203 bet on
> > the wrong horse. When I invented CatalogSnapshot, I intended for it to
> > be an ephemeral snapshot that we'd keep for as long as we could safely
> > do so and then discard it as soon as there's any hint of a problem.
> > But that commit really takes the opposite approach, trying to keep
> > CatalogSnapshot valid for a longer period of time instead of just
> > chucking it.
>
> Not really following?  ISTM the point of ffaa44cb5 was that if we don't
> include the CatalogSnapshot in our advertised xmin, then the length
> of time we can safely use it is *zero*.

No, I don't think so. I'm proposing that you shouldn't be taking a
catalog snapshot unless you already hold some other snapshot, and that
the catalog snapshot should always be newer than whatever that other
snapshot is. If we made that be true, then your catalog snapshot
couldn't ever be the thing holding back xmin -- and then I think it
doesn't need to be registered.

> > But there is no such code: GetOldestSnapshot() has only one caller.
> > And I don't think we'd add more just to do something like that,
> > because we have other mechanisms that are specifically designed for
> > testing whether tuples are prunable that are better suited to such
> > tasks. It should really only be used when you don't know which of the
> > backend's current snapshots ought to be used for something, but are
> > sure that using the oldest one will be good enough. And in that kind
> > of situation, it's hard to see why using the catalog snapshot would
> > ever be the right idea.
>
> What if the reason we need a snapshot is to detoast some toasted item
> we read from a catalog with the CatalogSnapshot?  There might not be
> any other valid snapshot, so I don't think I buy your argument here.
>
> ... Unless your argument is that the session should always have an
> older non-catalog snapshot, which I'm not sure whether I buy or not.
> But if we do believe that, then adding mechanisms to force it to be so
> could be an alternative solution.  But why is that better than what
> we're doing?

That is exactly my argument, but I'm not actually sure whether it is
in fact better. I was responding to Andres's statement that
CatalogSnapshot was hiding a lot of bugs because it makes it look like
we have a snapshot when we don't really. And my point is that
ffaa44cb559db332baeee7d25dedd74a61974203 made such problems a lot more
likely, because before that the snapshot was not in
RegisteredSnapshots, and therefore anybody asking "hey, do we have a
snapshot?" wasn't likely to get confused, because they would only see
the catalog snapshot if they specifically went and looked at
CatalogSnapshot(Set), and if you do that, hopefully you'll think about
the special semantics of that snapshot and write code that works with
them. But with CatalogSnapshot in RegisteredSnapshots, any code that
looks through RegisteredSnapshots has to worry about whether what it's
finding there is actually just the CatalogSnapshot, and if Andres's
statement that we have a lot of bugs here is to be believed, then we
have not done a good job finding and updating all such code. We can
continue down the path of finding and fixing it -- or we can back out
parts of ffaa44cb559db332baeee7d25dedd74a61974203.

Just to be clear, I'm not debating that that commit fixed some real
problems and I think parts of it are really necessary fixes. But to
quote from the commit message:

    The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting
    for whether MyPgXact->xmin could be cleared or advanced.  In normal
    transactions this was masked by the fact that the transaction snapshot
    would be older, but during backend startup and certain utility commands
    it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had
    been cleared ...

And what I'm suggesting is that *perhaps* we ought to have fixed those
"during backend startup and certain utility commands" by having
SnapshotResetXmin() do InvalidateCatalogSnapshot(), and maybe also
made the code in those commands that's doing catalog lookups hold
acquire and hold a snapshot of its own around the operation. The
alternative you chose, namely, incorporating the xmin into the
backend's xmin computation, I think can also be made to work. I am
just not sure that it's the best approach.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Andres Freund <andres@anarazel.de> writes:
> On 2022-04-14 12:16:45 -0400, Robert Haas wrote:
>> I got curious and looked at the underlying problem here and I am
>> wondering whether HaveRegisteredOrActiveSnapshot() is just buggy. It
>> seems to me that the code is always going to return true if there are
>> any active snapshots, and the rest of the function is intended to test
>> whether there is a registered snapshot other than the catalog
>> snapshot. But I don't think that's what this code does:
>>
>> if (pairingheap_is_empty(&RegisteredSnapshots) ||
>>     !pairingheap_is_singular(&RegisteredSnapshots))
>>      return false;
>>
>> return CatalogSnapshot == NULL;

> Certainly looks off...

Yeah, this is broken.  Whilst waiting around for a build on wrasse's
host, I reproduced the problem shown in this thread, and here's what
I see at the point of the exception:

(gdb) p RegisteredSnapshots
$5 = {ph_compare = 0x9a6000 <xmin_cmp>, ph_arg = 0x0,
  ph_root = 0xec3168 <CatalogSnapshotData+72>}
(gdb) p *RegisteredSnapshots.ph_root
$6 = {first_child = 0x2d85d70, next_sibling = 0x0, prev_or_parent = 0x0}
(gdb) p CatalogSnapshotData
$7 = {snapshot_type = SNAPSHOT_MVCC, xmin = 52155, xmax = 52155,
  xip = 0x2d855b0, xcnt = 0, subxip = 0x2de9130, subxcnt = 0,
  suboverflowed = false, takenDuringRecovery = false, copied = false,
  curcid = 0, speculativeToken = 0, vistest = 0x0, active_count = 0,
  regd_count = 0, ph_node = {first_child = 0x2d85d70, next_sibling = 0x0,
    prev_or_parent = 0x0}, whenTaken = 0, lsn = 0, snapXactCompletionCount = 1}
(gdb) p CatalogSnapshot
$8 = (Snapshot) 0xec3120 <CatalogSnapshotData>
(gdb) p *(Snapshot) (0x2d85d70-72)
$9 = {snapshot_type = SNAPSHOT_MVCC, xmin = 52155, xmax = 52155, xip = 0x0,
  xcnt = 0, subxip = 0x0, subxcnt = 0, suboverflowed = false,
  takenDuringRecovery = false, copied = true, curcid = 0,
  speculativeToken = 0, vistest = 0x0, active_count = 0, regd_count = 2,
  ph_node = {first_child = 0x0, next_sibling = 0x0,
    prev_or_parent = 0xec3168 <CatalogSnapshotData+72>}, whenTaken = 0,
  lsn = 0, snapXactCompletionCount = 0}
(gdb) p ActiveSnapshot
$10 = (ActiveSnapshotElt *) 0x0

So in fact there IS another registered snapshot, and
HaveRegisteredOrActiveSnapshot is just lying.  I think the
correct test is more nearly what we have in
InvalidateCatalogSnapshotConditionally:

    if (CatalogSnapshot &&
        ActiveSnapshot == NULL &&
        pairingheap_is_singular(&RegisteredSnapshots))
            // then the CatalogSnapshot is the only one.

Ergo, this actually is a bug in 277692220.

            regards, tom lane



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 14, 2022 at 1:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Not really following?  ISTM the point of ffaa44cb5 was that if we don't
>> include the CatalogSnapshot in our advertised xmin, then the length
>> of time we can safely use it is *zero*.

> No, I don't think so. I'm proposing that you shouldn't be taking a
> catalog snapshot unless you already hold some other snapshot, and that
> the catalog snapshot should always be newer than whatever that other
> snapshot is. If we made that be true, then your catalog snapshot
> couldn't ever be the thing holding back xmin -- and then I think it
> doesn't need to be registered.

If you don't register it, then you need to also make sure that it's
destroyed whenever that older snapshot is.  Which I think would require
either a lot of useless/inefficient CatalogSnapshot destructions, or
infrastructure that's more or less isomorphic to the RegisteredSnapshot
heap.

> That is exactly my argument, but I'm not actually sure whether it is
> in fact better. I was responding to Andres's statement that
> CatalogSnapshot was hiding a lot of bugs because it makes it look like
> we have a snapshot when we don't really.

Well, we DO have a snapshot, and it is 100% perfectly safe to use, if it's
registered.  Andres' complaint is that that snapshot might get invalidated
when you weren't expecting it, but I'm not really convinced that we have
all that many bugs of that ilk.  Wouldn't CLOBBER_CACHE_ALWAYS testing
find them?

            regards, tom lane



On Thu, Apr 14, 2022 at 3:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If you don't register it, then you need to also make sure that it's
> destroyed whenever that older snapshot is.  Which I think would require
> either a lot of useless/inefficient CatalogSnapshot destructions, or
> infrastructure that's more or less isomorphic to the RegisteredSnapshot
> heap.

Well, if that's true, then I agree that it's a good argument against
that approach. But I guess I'm confused as to why we'd end up in that
situation. Suppose we do these two things:

1. Decree that SnapshotResetXmin calls InvalidateCatalogSnapshot. It's
the other way around right now, but that's only because we're
registering the catalog snapshot.
2. Bomb out in GetCatalogSnapshot if you don't have an active or
registered snapshot already.

Is there some reason we'd need any more infrastructure than that?

> Well, we DO have a snapshot, and it is 100% perfectly safe to use, if it's
> registered.  Andres' complaint is that that snapshot might get invalidated
> when you weren't expecting it, but I'm not really convinced that we have
> all that many bugs of that ilk.  Wouldn't CLOBBER_CACHE_ALWAYS testing
> find them?

Hmm, that's a good question. I don't know.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 14, 2022 at 3:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If you don't register it, then you need to also make sure that it's
>> destroyed whenever that older snapshot is.

> Well, if that's true, then I agree that it's a good argument against
> that approach. But I guess I'm confused as to why we'd end up in that
> situation. Suppose we do these two things:

> 1. Decree that SnapshotResetXmin calls InvalidateCatalogSnapshot. It's
> the other way around right now, but that's only because we're
> registering the catalog snapshot.
> 2. Bomb out in GetCatalogSnapshot if you don't have an active or
> registered snapshot already.

> Is there some reason we'd need any more infrastructure than that?

Yes.

1. Create snapshot 1 (beginning of transaction).
2. Create catalog snapshot (okay because of snapshot 1).
3. Create snapshot 2.
4. Destroy snapshot 1.
5. Catalog snapshot is still there and is now the oldest.

The implementation you propose would have to also forbid this sequence
of events, which is (a) difficult and (b) would add instability to the
system, since there's really no reason that this should be Not OK.

I'm basically not on board with adding complication to make the system
less performant and more brittle, and I don't see how the direction
you want to go isn't that.

(BTW, this thought experiment also puts a hole in the test added by
277692220: even if HaveRegisteredOrActiveSnapshot were doing what
it claims to do, it would allow use of the catalog snapshot for
detoasting after step 4, which I suppose is not what Andres intended.)

            regards, tom lane



Hi,

On 2022-04-14 15:05:50 -0400, Tom Lane wrote:
> Andres' complaint is that that snapshot might get invalidated when you
> weren't expecting it, but I'm not really convinced that we have all
> that many bugs of that ilk.  Wouldn't CLOBBER_CACHE_ALWAYS testing
> find them?

Don't see why it would - we don't have any mechanism in place for
enforcing that we don't update / delete a tuple we've looked up with an
xmin that wasn't continually enforced.  A typical pattern is to use a
catalog cache (registered an all) for a syscache lookup, but then not
have a registered / active snapshot until an eventual update / delete
(after the syscache scan ends). Which isn't safe, because without a
MyProc->xmin set, the tuple we're updating / deleting could be updated,
removed and replaced with another tuple.

Greetings,

Andres Freund



On Thu, Apr 14, 2022 at 4:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Well, if that's true, then I agree that it's a good argument against
> > that approach. But I guess I'm confused as to why we'd end up in that
> > situation. Suppose we do these two things:
>
> > 1. Decree that SnapshotResetXmin calls InvalidateCatalogSnapshot. It's
> > the other way around right now, but that's only because we're
> > registering the catalog snapshot.
> > 2. Bomb out in GetCatalogSnapshot if you don't have an active or
> > registered snapshot already.
>
> > Is there some reason we'd need any more infrastructure than that?
>
> Yes.
>
> 1. Create snapshot 1 (beginning of transaction).
> 2. Create catalog snapshot (okay because of snapshot 1).
> 3. Create snapshot 2.
> 4. Destroy snapshot 1.
> 5. Catalog snapshot is still there and is now the oldest.

Sorry, I'm not seeing the problem. If we call SnapshotResetXmin()
after step 4, then the catalog snapshot would get invalidated under my
proposal. If we don't, then our advertised xmin has not changed and
nothing can be pruned out from under us.

> I'm basically not on board with adding complication to make the system
> less performant and more brittle, and I don't see how the direction
> you want to go isn't that.

Well ... I agree that brittle is bad, but it's not clear to me which
way is actually less brittle. As for performant, I think you might be
misjudging the situation. My original design for removing SnapshotNow
didn't even have the catalog snapshot - it just took a new snapshot
every time. That was mostly fine, but Andres found a somewhat extreme
test case where it exhibited a significant regression, so I added the
catalog snapshot stuff to work around that. So I'm not AT ALL
convinced that giving catalog snapshots longer lifetimes is a relevant
thing to do. There's some value in it if you can construct a test case
where the overall rate of snapshot taking is extremely high, but in
normal cases that isn't so. It's certainly not worth complicating the
code for backend startup or DDL commands to reduce the number of
snapshots, because you're never going to have those things happening
at a high enough rate to matter, or so I believe.

The way you get a benefit from CatalogSnapshot is to construct a
workload with a lot of really cheap SQL statements each of which has
to do a bunch of catalog lookups, and then run that at high
concurrency. The concurrency makes taking snapshots more expensive,
because the cache lines are contended, and having the same command use
the same snapshot over and over again instead of taking new ones then
brings the cost down enough to be measurable. But people run SQL
statements in a tight loop, not DDL or backend startup.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> Well ... I agree that brittle is bad, but it's not clear to me which
> way is actually less brittle. As for performant, I think you might be
> misjudging the situation. My original design for removing SnapshotNow
> didn't even have the catalog snapshot - it just took a new snapshot
> every time. That was mostly fine, but Andres found a somewhat extreme
> test case where it exhibited a significant regression, so I added the
> catalog snapshot stuff to work around that. So I'm not AT ALL
> convinced that giving catalog snapshots longer lifetimes is a relevant
> thing to do.

Perhaps not.  But right now we have to first think about correctness and
how much trouble it will be to get to correctness.  ISTM you are arguing
for a design in which it's required that there is always a registered
or active snapshot that's older than the catalog snapshot (if any).
I tried revising snapmgr.c to enforce that, starting with adding 

@@ -421,6 +421,13 @@ GetNonHistoricCatalogSnapshot(Oid relid)
   
     if (CatalogSnapshot == NULL)
     {
+        /*
+         * The catalog snapshot must always be newer than some active or
+         * registered snapshot.  (XXX explain why)
+         */
+        Assert(ActiveSnapshot != NULL ||
+               !pairingheap_is_empty(&RegisteredSnapshots));
+
         /* Get new snapshot. */
         CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData);
 
and this blew up with truly impressive thoroughness.  The autovac
launcher, logical replication launcher, and incoming backends all
fail this assertion instantly, making it impossible to find out
what else might be broken --- but I'm sure there is a lot.

(If you want to try this for yourself, remember that the postmaster
will relaunch the AV and LR launchers on failure, meaning that your
disk will fill with core files very very quickly.  Just sayin'.)

So maybe we can go that direction, but it's going to require a lot of
code additions to push extra snapshots in places that haven't bothered
to date; and I'm not convinced that that'd be buying us anything
except more GetSnapshotData() calls.

Plan B is to grant catalog snapshots some more-durable status than
what Plan A envisions.  I'm not very sure about the details, but if
you don't want to go that route then you'd better set about making
the above assertion work.

In the meantime, since it's clear that HaveRegisteredOrActiveSnapshot
is failing to meet its contract, I'm going to go fix it.  I think
(based on the above argument) that what it intends to enforce is not
really the system design we need, but it certainly isn't helping
anyone that it enforces that design incorrectly.

            regards, tom lane



Hi,

On 2022-04-16 14:42:39 -0400, Tom Lane wrote:
> In the meantime, since it's clear that HaveRegisteredOrActiveSnapshot
> is failing to meet its contract, I'm going to go fix it.

+1


> I think (based on the above argument) that what it intends to enforce
> is not really the system design we need, but it certainly isn't
> helping anyone that it enforces that design incorrectly.

I think it's approximately right for the current caller. But that caller
likely needs an improved design around snapshots...

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> On 2022-04-16 14:42:39 -0400, Tom Lane wrote:
>> I think (based on the above argument) that what it intends to enforce
>> is not really the system design we need, but it certainly isn't
>> helping anyone that it enforces that design incorrectly.

> I think it's approximately right for the current caller. But that caller
> likely needs an improved design around snapshots...

Yeah, I think the real issue is that checking
HaveRegisteredOrActiveSnapshot in this way doesn't provide a very
good guarantee of what we really want to know, which is that the
session's advertised xmin is old enough to prevent removal of
whatever toast data we're trying to fetch.  The fact that we
have a snapshot at the instant of fetch doesn't prove that it
existed continually since we fetched the toast reference, which
seems to be the condition we actually need to assure.  (And TBH
I see little reason to think that whether the snapshot is the
CatalogSnapshot or not changes things in any meaningful way.)

I don't yet see a practical way to check for the real concern.
While it's something to worry about, there's no reason to think
that v15 is any worse than prior versions in this area, is there?
So I'm inclined to remove this from the list of v15 open items,
or at least demote the remaining concern to "older bug" status.

            regards, tom lane



On Sat, Apr 16, 2022 at 2:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> and this blew up with truly impressive thoroughness.  The autovac
> launcher, logical replication launcher, and incoming backends all
> fail this assertion instantly, making it impossible to find out
> what else might be broken --- but I'm sure there is a lot.

OK, thanks for trying that.

> In the meantime, since it's clear that HaveRegisteredOrActiveSnapshot
> is failing to meet its contract, I'm going to go fix it.  I think
> (based on the above argument) that what it intends to enforce is not
> really the system design we need, but it certainly isn't helping
> anyone that it enforces that design incorrectly.

+1.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Hi,

On 2022-04-17 11:51:58 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-04-16 14:42:39 -0400, Tom Lane wrote:
> >> I think (based on the above argument) that what it intends to enforce
> >> is not really the system design we need, but it certainly isn't
> >> helping anyone that it enforces that design incorrectly.
> 
> > I think it's approximately right for the current caller. But that caller
> > likely needs an improved design around snapshots...
> 
> Yeah, I think the real issue is that checking
> HaveRegisteredOrActiveSnapshot in this way doesn't provide a very
> good guarantee of what we really want to know, which is that the
> session's advertised xmin is old enough to prevent removal of
> whatever toast data we're trying to fetch.

Right. It's better than what was there before though - I added
HaveRegisteredOrActiveSnapshot() in the course of
7c38ef2a5d6cf6d8dc3834399d7a1c364d64ce64. Where the problem was that we
didn't have *any* snapshot other than the catalog snapshot, and the
catalog snapshot only sometimes (iirc for that bug it depended on the
order in which objects were deleted).  That makes such bugs much harder
to detect.


> The fact that we have a snapshot at the instant of fetch doesn't prove
> that it existed continually since we fetched the toast reference,
> which seems to be the condition we actually need to assure.

Right.


> (And TBH I see little reason to think that whether the snapshot is the
> CatalogSnapshot or not changes things in any meaningful way.)

It is a meaningful difference, see e.g. the bug referenced above.


> I don't yet see a practical way to check for the real concern.
> While it's something to worry about, there's no reason to think
> that v15 is any worse than prior versions in this area, is there?
> So I'm inclined to remove this from the list of v15 open items,
> or at least demote the remaining concern to "older bug" status.

Yes.

Greetings,

Andres Freund



On Mon, Apr 18, 2022 at 10:39 AM Andres Freund <andres@anarazel.de> wrote:
> Right. It's better than what was there before though - I added
> HaveRegisteredOrActiveSnapshot() in the course of
> 7c38ef2a5d6cf6d8dc3834399d7a1c364d64ce64. Where the problem was that we
> didn't have *any* snapshot other than the catalog snapshot, and the
> catalog snapshot only sometimes (iirc for that bug it depended on the
> order in which objects were deleted).  That makes such bugs much harder
> to detect.

I still think it would be better to have GetOldestSnapshot() be
smarter and refuse to return the catalog snapshot. For one thing, that
way we'd be testing for the problem case in non-assert builds also.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Andres Freund <andres@anarazel.de> writes:
> On 2022-04-17 11:51:58 -0400, Tom Lane wrote:
>> The fact that we have a snapshot at the instant of fetch doesn't prove
>> that it existed continually since we fetched the toast reference,
>> which seems to be the condition we actually need to assure.

> Right.

>> (And TBH I see little reason to think that whether the snapshot is the
>> CatalogSnapshot or not changes things in any meaningful way.)

> It is a meaningful difference, see e.g. the bug referenced above.

Well, that's true given the current arrangements for managing
CatalogSnapshot; but that doesn't make the CatalogSnapshot any
less of a protection when it exists.  The direction I was vaguely
imagining is that we create some refcount-like infrastructure
directly ensuring that once a snapshot is used to read a toast
reference, it gets kept around until we dereference or discard
that reference.  With a scheme like that, there'd be no reason to
discriminate against a CatalogSnapshot as being the protective
snapshot.

(I hasten to add that I have no idea how to make this half-baked
plan work, and there may be better solutions anyway.)

>> While it's something to worry about, there's no reason to think
>> that v15 is any worse than prior versions in this area, is there?
>> So I'm inclined to remove this from the list of v15 open items,
>> or at least demote the remaining concern to "older bug" status.

> Yes.

OK, I'll update the open-items page.

            regards, tom lane



Robert Haas <robertmhaas@gmail.com> writes:
> I still think it would be better to have GetOldestSnapshot() be
> smarter and refuse to return the catalog snapshot. For one thing, that
> way we'd be testing for the problem case in non-assert builds also.

I was wondering about that too.  On the other hand, given that
we know this area is squishy, transforming fails-in-assert-builds
to fails-everywhere is not necessarily desirable.

            regards, tom lane



On Mon, Apr 18, 2022 at 10:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I still think it would be better to have GetOldestSnapshot() be
> > smarter and refuse to return the catalog snapshot. For one thing, that
> > way we'd be testing for the problem case in non-assert builds also.
>
> I was wondering about that too.  On the other hand, given that
> we know this area is squishy, transforming fails-in-assert-builds
> to fails-everywhere is not necessarily desirable.

I agree that it's a little unclear. In general, I think if we're going
to blow up and die, doing it closer to the place where the problem is
happening is for the best. On the other hand, if in most practical
cases we're going to stumble through and get the right answer anyway,
then it's maybe not great to break a bunch of accidentally-working
cases. However, it does strikes me that this principal could easily be
overdone. init_toast_snapshot() could pick a random snapshot (or take
a new one) in order to call InitToastSnapshot() and that would often
work fine. Yet, upon realizing that things are busted, it chooses to
error out instead. I approve of that choice, and don't think we should
rule out the idea of making that check more robust.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> I agree that it's a little unclear. In general, I think if we're going
> to blow up and die, doing it closer to the place where the problem is
> happening is for the best. On the other hand, if in most practical
> cases we're going to stumble through and get the right answer anyway,
> then it's maybe not great to break a bunch of accidentally-working
> cases. However, it does strikes me that this principal could easily be
> overdone. init_toast_snapshot() could pick a random snapshot (or take
> a new one) in order to call InitToastSnapshot() and that would often
> work fine. Yet, upon realizing that things are busted, it chooses to
> error out instead. I approve of that choice, and don't think we should
> rule out the idea of making that check more robust.

I'm all for improving robustness, but failing in cases that would have
worked before (even if only accidentally) is not going to be seen by
users as more robust.  I think that this late stage of the development
cycle is not the time to be putting in changes that are not actually
going to fix bugs but only call greater attention to the possibility
that a bug exists.

TBH, given where we are in the dev cycle, I thought there was a lot of
sense behind your earlier thought that HaveRegisteredOrActiveSnapshot
should be reverted entirely.  I'm okay with keeping it as an assertion-
only check, so that it won't bother end users.  I'm not okay with
adding end-user-visible failures, at least not till early in v16.

            regards, tom lane



On Mon, Apr 18, 2022 at 11:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I agree that it's a little unclear. In general, I think if we're going
> > to blow up and die, doing it closer to the place where the problem is
> > happening is for the best. On the other hand, if in most practical
> > cases we're going to stumble through and get the right answer anyway,
> > then it's maybe not great to break a bunch of accidentally-working
> > cases. However, it does strikes me that this principal could easily be
> > overdone. init_toast_snapshot() could pick a random snapshot (or take
> > a new one) in order to call InitToastSnapshot() and that would often
> > work fine. Yet, upon realizing that things are busted, it chooses to
> > error out instead. I approve of that choice, and don't think we should
> > rule out the idea of making that check more robust.
>
> I'm all for improving robustness, but failing in cases that would have
> worked before (even if only accidentally) is not going to be seen by
> users as more robust.  I think that this late stage of the development
> cycle is not the time to be putting in changes that are not actually
> going to fix bugs but only call greater attention to the possibility
> that a bug exists.
>
> TBH, given where we are in the dev cycle, I thought there was a lot of
> sense behind your earlier thought that HaveRegisteredOrActiveSnapshot
> should be reverted entirely.  I'm okay with keeping it as an assertion-
> only check, so that it won't bother end users.  I'm not okay with
> adding end-user-visible failures, at least not till early in v16.

I wasn't really taking a position either way about timing. If we can
demonstrate that things other than HaveRegisteredOrActiveSnapshot()
itself are misbehaving, then I think fixes for those bugs are
potentially back-patchable no matter where we are in the release
cycle, but in terms of when we make changes to try to detect bugs we
don't know about yet, I could go either way on whether to do that now
or wait. We can't know whether the bugs we haven't found yet will
cause a big problem for someone tomorrow, ten years from now, or
never.

I am not really very happy about HaveRegisteredOrActiveSnapshot(),
honestly. I think in the form we have it in the tree it looks
under-engineered. It's not really testing the right thing (even
leaving aside the bug fix) as you have been pointing out, it doesn't
really mesh well with the sanity checking that was there before as I
have been pointing out, and it's only used in one place. I wouldn't be
sad if it got reverted. However, I don't think it's going to do us any
great harm, either. Although it's a long way from the best thing we
have in the tree, it's also a long way from the worst thing we have in
the tree.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> I wasn't really taking a position either way about timing. If we can
> demonstrate that things other than HaveRegisteredOrActiveSnapshot()
> itself are misbehaving, then I think fixes for those bugs are
> potentially back-patchable no matter where we are in the release
> cycle,

Sure, but ...

> but in terms of when we make changes to try to detect bugs we
> don't know about yet, I could go either way on whether to do that now
> or wait. We can't know whether the bugs we haven't found yet will
> cause a big problem for someone tomorrow, ten years from now, or
> never.

... I think in this case we do have a pretty good idea of the possible
consequences.  Most of the time, an unsafe toast fetch will work
fine because the toast data is still there.  If you're very unlucky
then it's been deleted, and vacuumed away, and then you get a "missing
chunk number" error.  If you're really astronomically unlucky, perhaps
the toast OID has been recycled and you get the wrong data (it's not
clear to me whether the toast snapshot visibility rules would prevent
this).  I doubt we need to factor that last scenario into practical risk
estimates, though.  So adding a non-assert check for snapshot misuse
would effectively convert "if you're very unlucky you get a weird error"
to "lucky or not, you get some other weird error", which no user is
going to see as an improvement.

> I am not really very happy about HaveRegisteredOrActiveSnapshot(),
> honestly.

Me either.  If we find any other cases where it gives a false positive,
I'll be for removing it rather than fixing it.  But for the moment
I'm content to leave it until we have a well-engineered solution to
the real problem.

            regards, tom lane



Andres Freund <andres@anarazel.de> writes:
> On 2022-04-17 11:51:58 -0400, Tom Lane wrote:
>> The fact that we have a snapshot at the instant of fetch doesn't prove
>> that it existed continually since we fetched the toast reference,
>> which seems to be the condition we actually need to assure.

> Right.

BTW, after thinking about this for a bit I am less concerned than
I was about the system being full of bugs of this ilk.  The executor
per se should be fine because it does everything under a live snapshot.
We had bugs with cases that shove executor output into long-lived
tuplestores, but we've dealt with that scenario.  Catalog updates
performed on tuples fetched from a catalog scan seem safe enough too.
Andres was worried about catalog updates performed using tuples fetched
from catcache, but that's not a problem because we detoasted every value
when it went into the catcache, cf 08e261cbc.

(Mind you, 08e261cbc's solution is risky performancewise, because it
means we have to re-toast every value during such catalog updates,
instead of being able to carry the old values of unchanged columns
forward.  But it's not a correctness bug.)

(Also, the whining I did in 08e261cbc's commit message is no longer
relevant now that we read catalogs with MVCC snapshots.)

There may be some corner cases that aren't described by any of these
three blanket scenarios, but they've got to be pretty few and far
between.

            regards, tom lane



On Mon, Apr 18, 2022 at 4:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There may be some corner cases that aren't described by any of these
> three blanket scenarios, but they've got to be pretty few and far
> between.

My first thought whenever anything like this comes up is cursors,
especially but not only holdable cursors. Also, plpgsql variables,
maybe mixed with embedded COMMIT/ROLLBACK. I don't find it
particularly hard to believe we have some bugs in
insufficiently-well-considered parts of the system that pass around
datums outside of the normal executor flow, but I don't know exactly
how to find them all, either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 18, 2022 at 4:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There may be some corner cases that aren't described by any of these
>> three blanket scenarios, but they've got to be pretty few and far
>> between.

> My first thought whenever anything like this comes up is cursors,
> especially but not only holdable cursors. Also, plpgsql variables,
> maybe mixed with embedded COMMIT/ROLLBACK.

Those exact cases have had detoasting bugs in the past and are now fixed.

> I don't find it
> particularly hard to believe we have some bugs in
> insufficiently-well-considered parts of the system that pass around
> datums outside of the normal executor flow, but I don't know exactly
> how to find them all, either.

I'm not here to claim that there are precisely zero remaining bugs
of this ilk.  I'm just saying that I think we've flushed out most
of them.  I think there is some value in trying to think of a way
to prove that none remain, but it's not a problem we can solve
for v15.

            regards, tom lane



On Tue, Apr 19, 2022 at 10:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not here to claim that there are precisely zero remaining bugs
> of this ilk.  I'm just saying that I think we've flushed out most
> of them.  I think there is some value in trying to think of a way
> to prove that none remain, but it's not a problem we can solve
> for v15.

Sure, that's fine.

-- 
Robert Haas
EDB: http://www.enterprisedb.com