Thread: Catalog invalidations vs catalog scans vs ScanPgRelation()
Hi,

While self-reviewing a patch I'm about to send, I changed the assertion in index_getnext_tid() from

    Assert(TransactionIdIsValid(RecentGlobalXmin));

to instead test (via an indirection)

    Assert(TransactionIdIsValid(MyProc->xmin));

Without ->xmin being set, it's not safe to do scans, and checking RecentGlobalXmin doesn't really do much. But, uh, that doesn't get very far through the regression tests.

Turns out that I am to blame for that, all the way back in 9.4. For logical decoding I needed to make ScanPgRelation() use a specific type of snapshot during one corner case of logical decoding. For reasons lost to time, I didn't continue to pass NULL to systable_beginscan() in the plain case, but did an explicit GetCatalogSnapshot(RelationRelationId). Note the missing RegisterSnapshot()...

That's bad because:

a) If invalidation processing triggers an InvalidateCatalogSnapshot(), the contained SnapshotResetXmin() may find no other snapshot and reset ->xmin, which then may allow relevant row versions to be removed.

b) If there's a subsequent GetCatalogSnapshot() during invalidation processing, that will GetSnapshotData() into the snapshot currently being used.

The fix itself is trivial: just pass NULL for the normal case, rather than doing GetCatalogSnapshot(). But I think this points to some severe holes in the relevant assertions / infrastructure:

1) Asserting that RecentGlobalXmin is set - like many routines do - isn't meaningful, because it stays set even after SnapshotResetXmin() releases the transaction's snapshot. These are fairly old assertions (d53a56687f3d). As far as I can tell these routines really should verify that a snapshot is set.

2) I think we need to reset TransactionXmin and RecentXmin whenever SnapshotResetXmin() clears xmin. While we'll set them again the next time a snapshot is acquired, the fact that they stay set seems likely to hide bugs. We could also remove TransactionXmin and instead use the pgproc/pgxact's ->xmin; I don't really see the point of having it.

3) Similarly, I think we ought to reset RecentGlobal[Data]Xmin at the end of the transaction or such. But I'm not clear what protects those values from being affected by wraparound in a long-running transaction. Initially they are obviously protected against that by the procarray - but once the oldest procarray entry releases its snapshot, the global xmin horizon can advance. That allows transactions up to ~2 billion XIDs into the future of the current backend, whereas RecentGlobalXmin might be nearly ~2 billion transactions in the past relative to ->xmin. That might not have been a likely problem many years ago, but it seems far from impossible today.

I propose to commit a fix, but then also add an assertion for TransactionIdIsValid(MyPgXact->xmin) instead of (or in addition to) the TransactionIdIsValid(RecentGlobalXmin) tests we have right now. And in master, clear the various *Xmin variables whenever we reset xmin.

I think in master we should also start to make RecentGlobalXmin etc. FullTransactionIds. We can then convert the 32bit xids we compare with RecentGlobal* to 64bit xids (which is safe, because live xids have to be within [oldestXid, nextXid)). I have that as part of another patch anyway...

Greetings,

Andres Freund
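The shape of the fix described above, sketched against ScanPgRelation() (a simplified sketch; the committed change may differ in detail):

    /*
     * Sketch of the ScanPgRelation() fix (simplified).  Previously the plain
     * case also fetched an explicit, unregistered GetCatalogSnapshot();
     * passing NULL instead lets systable_beginscan() acquire and register
     * the catalog snapshot itself, so ->xmin stays pinned for the duration
     * of the scan.  Only the logical-decoding corner case still requests a
     * specific snapshot.
     */
    Snapshot    snapshot = NULL;

    if (force_non_historic)
        snapshot = GetNonHistoricCatalogSnapshot(RelationRelationId);

    pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
                                       indexOK && criticalRelcachesBuilt,
                                       snapshot, 1, key);

With NULL, systable_beginscan() does RegisterSnapshot(GetCatalogSnapshot(relid)) internally and unregisters it again in systable_endscan(), so the backend's xmin cannot be reset mid-scan.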
Hi,

On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> Turns out that I am to blame for that, all the way back in 9.4. For
> logical decoding I needed to make ScanPgRelation() use a specific type
> of snapshot during one corner case of logical decoding. For reasons lost
> to time, I didn't continue to pass NULL to systable_beginscan() in the
> plain case, but did an explicit GetCatalogSnapshot(RelationRelationId).
> Note the missing RegisterSnapshot()...

Oh, I pierced through the veil: it's likely because the removal of SnapshotNow happened concurrently with the development of logical decoding. Back before we used proper snapshots for catalog scans, passing in SnapshotNow was precisely the right thing to do...

I think that's somewhat unlikely to actively cause problems in practice, as ScanPgRelation() requires that we already have a lock on the relation, we only look for a single row, and I don't think we rely on the result's tid being correct. I don't immediately see a case where this would trigger in a problematic way.

After fixing the ScanPgRelation() case I found another occurrence of the problem. The catalog snapshot copied at (please ignore the slightly-off line numbers):

#0 GetCatalogSnapshot (relid=1249) at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:454
#1 0x0000560d725b198d in systable_beginscan (heapRelation=0x7f13429f2a08, indexId=2659, indexOK=true, snapshot=0x0, nkeys=2, key=0x7fff26e04db0) at /home/andres/src/postgresql/src/backend/access/index/genam.c:378
#2 0x0000560d72b4117f in SearchCatCacheMiss (cache=0x560d74590800, nkeys=2, hashValue=1029784422, hashIndex=102, v1=697088, v2=3, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1359
#3 0x0000560d72b41045 in SearchCatCacheInternal (cache=0x560d74590800, nkeys=2, v1=697088, v2=3, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1299
#4 0x0000560d72b40d09 in SearchCatCache (cache=0x560d74590800, v1=697088, v2=3, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1153
#5 0x0000560d72b5b65f in SearchSysCache (cacheId=7, key1=697088, key2=3, key3=0, key4=0) at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1112
#6 0x0000560d72b5b9dd in SearchSysCacheCopy (cacheId=7, key1=697088, key2=3, key3=0, key4=0) at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1187
#7 0x0000560d72645501 in RemoveAttrDefaultById (attrdefId=697096) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1821
#8 0x0000560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1397
#9 0x0000560d7263fa17 in deleteOneObject (object=0x560d745d37a0, depRel=0x7fff26e052d0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1261
#10 0x0000560d7263e4d6 in deleteObjectsInList (targetObjects=0x560d745d1ec0, depRel=0x7fff26e052d0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:271
#11 0x0000560d7263e58a in performDeletion (object=0x7fff26e05304, behavior=DROP_CASCADE, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:356
#12 0x0000560d72655f3f in RemoveTempRelations (tempNamespaceId=686167) at /home/andres/src/postgresql/src/backend/catalog/namespace.c:4155
#13 0x0000560d72655f72 in RemoveTempRelationsCallback (code=0, arg=0) at /home/andres/src/postgresql/src/backend/catalog/namespace.c:4174
#14 0x0000560d729a58e2 in shmem_exit (code=0) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:239
#15 0x0000560d729a57b5 in proc_exit_prepare (code=0) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:194

is released at the end of systable_endscan(). And then xmin is reset at:

#0 SnapshotResetXmin () at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:1038
#1 0x0000560d72bb9bfc in InvalidateCatalogSnapshot () at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:521
#2 0x0000560d72b43d62 in LocalExecuteInvalidationMessage (msg=0x7fff26e04e70) at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:562
#3 0x0000560d729b277b in ReceiveSharedInvalidMessages (invalFunction=0x560d72b43d26 <LocalExecuteInvalidationMessage>, resetFunction=0x560d72b43f92 <InvalidateSystemCaches>) at /home/andres/src/postgresql/src/backend/storage/ipc/sinval.c:120
#4 0x0000560d72b44070 in AcceptInvalidationMessages () at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:683
#5 0x0000560d729b8f4f in LockRelationOid (relid=2658, lockmode=3) at /home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:136
#6 0x0000560d725341e3 in relation_open (relationId=2658, lockmode=3) at /home/andres/src/postgresql/src/backend/access/common/relation.c:56
#7 0x0000560d725b22c3 in index_open (relationId=2658, lockmode=3) at /home/andres/src/postgresql/src/backend/access/index/indexam.c:130
#8 0x0000560d727b6991 in ExecOpenIndices (resultRelInfo=0x560d7468ffa0, speculative=false) at /home/andres/src/postgresql/src/backend/executor/execIndexing.c:199
#9 0x0000560d7264fc7e in CatalogOpenIndexes (heapRel=0x7f13429f2a08) at /home/andres/src/postgresql/src/backend/catalog/indexing.c:51
#10 0x0000560d72650010 in CatalogTupleUpdate (heapRel=0x7f13429f2a08, otid=0x560d746901d4, tup=0x560d746901d0) at /home/andres/src/postgresql/src/backend/catalog/indexing.c:228
#11 0x0000560d72645583 in RemoveAttrDefaultById (attrdefId=697096) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1830
#12 0x0000560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1397

which then hits an assertion at:

#2 0x0000560d725a4e86 in heap_page_prune_opt (relation=0x7f13429f2a08, buffer=3153) at /home/andres/src/postgresql/src/backend/access/heap/pruneheap.c:131
#3 0x0000560d7259a5b3 in heapam_index_fetch_tuple (scan=0x560d746912c8, tid=0x7fff26e04c5a, snapshot=0x7fff26e04c60, slot=0x560d745d1ef8, call_again=0x7fff26e04ade, all_dead=0x7fff26e04c59) at /home/andres/src/postgresql/src/backend/access/heap/heapam_handler.c:137
#4 0x0000560d725eeedc in table_index_fetch_tuple (scan=0x560d746912c8, tid=0x7fff26e04c5a, snapshot=0x7fff26e04c60, slot=0x560d745d1ef8, call_again=0x7fff26e04ade, all_dead=0x7fff26e04c59) at /home/andres/src/postgresql/src/include/access/tableam.h:1020
#5 0x0000560d725ef478 in table_index_fetch_tuple_check (rel=0x7f13429f2a08, tid=0x7fff26e04c5a, snapshot=0x7fff26e04c60, all_dead=0x7fff26e04c59) at /home/andres/src/postgresql/src/backend/access/table/tableam.c:213
#6 0x0000560d725b4ef7 in _bt_check_unique (rel=0x7f1342a29ce0, insertstate=0x7fff26e04d90, heapRel=0x7f13429f2a08, checkUnique=UNIQUE_CHECK_YES, is_unique=0x7fff26e04dc1, speculativeToken=0x7fff26e04d88) at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:452
#7 0x0000560d725b48e0 in _bt_doinsert (rel=0x7f1342a29ce0, itup=0x560d746892b0, checkUnique=UNIQUE_CHECK_YES, heapRel=0x7f13429f2a08) at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:247
#8 0x0000560d725c0167 in btinsert (rel=0x7f1342a29ce0, values=0x7fff26e04ee0, isnull=0x7fff26e04ec0, ht_ctid=0x560d746901d4, heapRel=0x7f13429f2a08, checkUnique=UNIQUE_CHECK_YES, indexInfo=0x560d7468f3d8) at /home/andres/src/postgresql/src/backend/access/nbtree/nbtree.c:207
#9 0x0000560d725b2530 in index_insert (indexRelation=0x7f1342a29ce0, values=0x7fff26e04ee0, isnull=0x7fff26e04ec0, heap_t_ctid=0x560d746901d4, heapRelation=0x7f13429f2a08, checkUnique=UNIQUE_CHECK_YES, indexInfo=0x560d7468f3d8) at /home/andres/src/postgresql/src/backend/access/index/indexam.c:186
#10 0x0000560d7264ff34 in CatalogIndexInsert (indstate=0x560d7468ffa0, heapTuple=0x560d746901d0) at /home/andres/src/postgresql/src/backend/catalog/indexing.c:157
#11 0x0000560d7265003e in CatalogTupleUpdate (heapRel=0x7f13429f2a08, otid=0x560d746901d4, tup=0x560d746901d0) at /home/andres/src/postgresql/src/backend/catalog/indexing.c:232
#12 0x0000560d72645583 in RemoveAttrDefaultById (attrdefId=697096) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1830
#13 0x0000560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1397

So, um. What happens is that doDeletion() does a catalog scan, which sets a snapshot. The results of that catalog scan are then used to perform modifications. But at that point there's no guarantee that we still hold *any* snapshot, as e.g. invalidations can trigger the catalog snapshot being released.

I don't see how that's safe. Without ->xmin preventing that, intermediate row versions that we did look up could just get vacuumed away and replaced with a different row. That does seem like a serious issue?

I think there are likely a lot of places that can hit this. What makes it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release ->xmin when there previously has been *any* catalog access? Because in contrast to normal table modifications, there's currently nothing at all forcing us to hold a snapshot between catalog lookups and their modifications.

Am I missing something? Or is this a fairly significant hole in our arrangements?

The easiest way to fix this would probably be to have inval.c call a version of InvalidateCatalogSnapshot() that leaves the oldest catalog snapshot around, but sets things up so that GetCatalogSnapshot() will return a freshly taken snapshot. ISTM that pretty much every InvalidateCatalogSnapshot() call within a transaction needs that behaviour?

Regards,

Andres
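To make the failure mode concrete, the code path the backtraces go through has roughly this shape (condensed from RemoveAttrDefaultById(); not the exact source — myrelid/myattnum come from a preceding pg_attrdef scan):

    /*
     * The syscache lookup holds the catalog snapshot only while its internal
     * scan runs.  By the time CatalogTupleUpdate() dereferences
     * tuple->t_self, invalidation processing (here reached via
     * CatalogOpenIndexes() -> index_open() -> LockRelationOid()) may have
     * released that snapshot and reset the backend's xmin.
     */
    Relation    attr_rel;
    HeapTuple   tuple;

    attr_rel = table_open(AttributeRelationId, RowExclusiveLock);

    tuple = SearchSysCacheCopy2(ATTNUM,
                                ObjectIdGetDatum(myrelid),
                                Int16GetDatum(myattnum));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for attribute %d of relation %u",
             myattnum, myrelid);

    ((Form_pg_attribute) GETSTRUCT(tuple))->atthasdef = false;

    /* unsafe: t_self may by now point at a TID that has been pruned away */
    CatalogTupleUpdate(attr_rel, &tuple->t_self, tuple);

    heap_freetuple(tuple);
    table_close(attr_rel, RowExclusiveLock);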
Hi,

On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> So, um. What happens is that doDeletion() does a catalog scan, which
> sets a snapshot. The results of that catalog scan are then used to
> perform modifications. But at that point there's no guarantee that we
> still hold *any* snapshot, as e.g. invalidations can trigger the catalog
> snapshot being released.
>
> I don't see how that's safe. Without ->xmin preventing that,
> intermediate row versions that we did look up could just get vacuumed
> away and replaced with a different row. That does seem like a serious
> issue?
>
> I think there are likely a lot of places that can hit this. What makes
> it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> ->xmin when there previously has been *any* catalog access? Because in
> contrast to normal table modifications, there's currently nothing at all
> forcing us to hold a snapshot between catalog lookups and their
> modifications.
>
> Am I missing something? Or is this a fairly significant hole in our
> arrangements?

I still think that's true. In a first iteration I hacked around the problem by explicitly registering a catalog snapshot in RemoveTempRelations(). That *sometimes* allows getting through the regression tests without the assertions triggering.

But I don't think that's good enough (even if we fixed the other potential crashes similarly). The only reason that avoids the asserts is that in nearly all other cases there's also a user snapshot that's pushed. But that pushed snapshot can have an xmin that's newer than the catalog snapshot, which means we're still in danger of tids from catalog scans being outdated.

My preliminary conclusion is that it's simply not safe to do SnapshotResetXmin() from within InvalidateCatalogSnapshot(), PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need to defer the SnapshotResetXmin() call until at least CommitTransactionCommand(). Outside of that there ought (with the exception of multi-transaction commands, but they have to be careful anyway) to be no "in progress" sequences of related catalog lookups/modifications.

Alternatively we could ensure that all catalog lookup/mod sequences ensure that the first catalog snapshot is registered. But that seems like a gargantuan task?

> The easiest way to fix this would probably be to have inval.c call a
> version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> snapshot around, but sets things up so that GetCatalogSnapshot() will
> return a freshly taken snapshot. ISTM that pretty much every
> InvalidateCatalogSnapshot() call within a transaction needs that
> behaviour?

A related question is in which cases it is actually safe to use a snapshot that's neither registered nor pushed as the active snapshot. snapmgr.c just says:

    * Note that the return value may point at static storage that will be modified
    * by future calls and by CommandCounterIncrement(). Callers should call
    * RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
    * used very long.

but doesn't clarify what 'very long' means. As far as I can tell, very little is actually safe. It's probably OK to do a single visibility test, but anything that e.g. has a chance of accepting invalidations is entirely unsafe?

Greetings,

Andres Freund
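For reference, SnapshotResetXmin() looked roughly like this around the time of this thread (lightly abridged, comments elided). The branch that clears MyPgXact->xmin as soon as the pairing heap of registered snapshots is empty and no snapshot is active is exactly what the traces above hit; there is no "defer until command end" today:

    static void
    SnapshotResetXmin(void)
    {
        Snapshot    minSnapshot;

        if (ActiveSnapshot != NULL)
            return;

        if (pairingheap_is_empty(&RegisteredSnapshots))
        {
            /* no snapshots left: stop advertising an xmin at all */
            MyPgXact->xmin = InvalidTransactionId;
            return;
        }

        minSnapshot = pairingheap_container(SnapshotData, ph_node,
                                            pairingheap_first(&RegisteredSnapshots));

        if (TransactionIdPrecedes(MyPgXact->xmin, minSnapshot->xmin))
            MyPgXact->xmin = minSnapshot->xmin;
    }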
Hi,

On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> > Turns out that I am to blame for that, all the way back in 9.4. For
> > logical decoding I needed to make ScanPgRelation() use a specific type
> > of snapshot during one corner case of logical decoding. For reasons lost
> > to time, I didn't continue to pass NULL to systable_beginscan() in the
> > plain case, but did an explicit GetCatalogSnapshot(RelationRelationId).
> > Note the missing RegisterSnapshot()...
>
> Oh, I pierced through the veil: it's likely because the removal of
> SnapshotNow happened concurrently with the development of logical
> decoding. Back before we used proper snapshots for catalog scans,
> passing in SnapshotNow was precisely the right thing to do...
>
> I think that's somewhat unlikely to actively cause problems in practice,
> as ScanPgRelation() requires that we already have a lock on the
> relation, we only look for a single row, and I don't think we rely on
> the result's tid being correct. I don't immediately see a case where
> this would trigger in a problematic way.

Pushed a fix for this.

> So, um. What happens is that doDeletion() does a catalog scan, which
> sets a snapshot. The results of that catalog scan are then used to
> perform modifications. But at that point there's no guarantee that we
> still hold *any* snapshot, as e.g. invalidations can trigger the catalog
> snapshot being released.
>
> I don't see how that's safe. Without ->xmin preventing that,
> intermediate row versions that we did look up could just get vacuumed
> away and replaced with a different row. That does seem like a serious
> issue?
>
> I think there are likely a lot of places that can hit this. What makes
> it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> ->xmin when there previously has been *any* catalog access? Because in
> contrast to normal table modifications, there's currently nothing at all
> forcing us to hold a snapshot between catalog lookups and their
> modifications.
>
> Am I missing something? Or is this a fairly significant hole in our
> arrangements?
>
> The easiest way to fix this would probably be to have inval.c call a
> version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> snapshot around, but sets things up so that GetCatalogSnapshot() will
> return a freshly taken snapshot. ISTM that pretty much every
> InvalidateCatalogSnapshot() call within a transaction needs that
> behaviour?

I'd still like to get some input here.

Greetings,

Andres Freund
Hi,

On 2020-03-28 12:30:40 -0700, Andres Freund wrote:
> On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> > On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> > So, um. What happens is that doDeletion() does a catalog scan, which
> > sets a snapshot. The results of that catalog scan are then used to
> > perform modifications. But at that point there's no guarantee that we
> > still hold *any* snapshot, as e.g. invalidations can trigger the
> > catalog snapshot being released.
> >
> > I don't see how that's safe. Without ->xmin preventing that,
> > intermediate row versions that we did look up could just get vacuumed
> > away and replaced with a different row. That does seem like a serious
> > issue?
> >
> > I think there are likely a lot of places that can hit this. What makes
> > it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> > ->xmin when there previously has been *any* catalog access? Because in
> > contrast to normal table modifications, there's currently nothing at
> > all forcing us to hold a snapshot between catalog lookups and their
> > modifications.
> >
> > Am I missing something? Or is this a fairly significant hole in our
> > arrangements?
> >
> > The easiest way to fix this would probably be to have inval.c call a
> > version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> > snapshot around, but sets things up so that GetCatalogSnapshot() will
> > return a freshly taken snapshot. ISTM that pretty much every
> > InvalidateCatalogSnapshot() call within a transaction needs that
> > behaviour?
>
> I'd still like to get some input here.

Attached is one patch that adds assertions to detect this, and one that puts enough workarounds in place to make the tests pass. I don't like this much, but I thought it'd be useful for others to understand the problem.

Greetings,

Andres Freund
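(The attached patches aren't reproduced here. The assertion in question is of roughly this shape — a sketch based on the proposal earlier in the thread, not the actual patch:)

    /*
     * Sketch of the kind of assertion meant above (hypothetical): anything
     * about to fetch or modify a tuple via a TID obtained from an earlier
     * catalog lookup must still advertise an xmin, i.e. must have held some
     * snapshot continuously since that lookup.
     */
    Assert(TransactionIdIsValid(MyPgXact->xmin));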
Hi,

Robert, Tom, it'd be great if you could look through this thread. I think there's a problem here (and it has gotten worse after the introduction of catalog snapshots). Both of you have at least dabbled in related code.

On 2020-02-29 12:17:07 -0800, Andres Freund wrote:
> On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> > So, um. What happens is that doDeletion() does a catalog scan, which
> > sets a snapshot. The results of that catalog scan are then used to
> > perform modifications. But at that point there's no guarantee that we
> > still hold *any* snapshot, as e.g. invalidations can trigger the
> > catalog snapshot being released.
> >
> > I don't see how that's safe. Without ->xmin preventing that,
> > intermediate row versions that we did look up could just get vacuumed
> > away and replaced with a different row. That does seem like a serious
> > issue?
> >
> > I think there are likely a lot of places that can hit this. What makes
> > it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> > ->xmin when there previously has been *any* catalog access? Because in
> > contrast to normal table modifications, there's currently nothing at
> > all forcing us to hold a snapshot between catalog lookups and their
> > modifications.
> >
> > Am I missing something? Or is this a fairly significant hole in our
> > arrangements?
>
> I still think that's true. In a first iteration I hacked around the
> problem by explicitly registering a catalog snapshot in
> RemoveTempRelations(). That *sometimes* allows getting through the
> regression tests without the assertions triggering.

The attached two patches (they're not meant to be applied) reliably get through the regression tests. But I suspect I'd have to at least do a CLOBBER_CACHE_ALWAYS run to find all the actually vulnerable places.

> But I don't think that's good enough (even if we fixed the other
> potential crashes similarly). The only reason that avoids the asserts is
> that in nearly all other cases there's also a user snapshot that's
> pushed. But that pushed snapshot can have an xmin that's newer than the
> catalog snapshot, which means we're still in danger of tids from catalog
> scans being outdated.
>
> My preliminary conclusion is that it's simply not safe to do
> SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> to defer the SnapshotResetXmin() call until at least
> CommitTransactionCommand(). Outside of that there ought (with the
> exception of multi-transaction commands, but they have to be careful
> anyway) to be no "in progress" sequences of related catalog
> lookups/modifications.
>
> Alternatively we could ensure that all catalog lookup/mod sequences
> ensure that the first catalog snapshot is registered. But that seems
> like a gargantuan task?

I also just noticed comments of this style in a few places:

    * Start a transaction so we can access pg_database, and get a snapshot.
    * We don't have a use for the snapshot itself, but we're interested in
    * the secondary effect that it sets RecentGlobalXmin. (This is critical
    * for anything that reads heap pages, because HOT may decide to prune
    * them even if the process doesn't attempt to modify any tuples.)

followed by code like

    StartTransactionCommand();
    (void) GetTransactionSnapshot();
    rel = table_open(DatabaseRelationId, AccessShareLock);
    scan = table_beginscan_catalog(rel, 0, NULL);

which is not safe at all, unfortunately. The snapshot is not pushed/active, therefore invalidations processed e.g. as part of the table_open() could execute an InvalidateCatalogSnapshot(), which in turn would remove the catalog snapshot from the pairing heap and SnapshotResetXmin(). And poof, the backend's xmin is gone.

Greetings,

Andres Freund
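A sketch of how such a call site could be made safe (one possible shape, assuming the surrounding loop is elided — not a committed fix):

    /*
     * Keep the snapshot registered for the whole scan, so
     * SnapshotResetXmin() cannot clear the backend's xmin while heap pages
     * are being read (and possibly pruned) underneath us.
     */
    Relation        rel;
    TableScanDesc   scan;
    Snapshot        snapshot;

    StartTransactionCommand();
    snapshot = RegisterSnapshot(GetTransactionSnapshot());

    rel = table_open(DatabaseRelationId, AccessShareLock);
    scan = table_beginscan_catalog(rel, 0, NULL);

    /* ... process tuples ... */

    table_endscan(scan);
    table_close(rel, AccessShareLock);

    UnregisterSnapshot(snapshot);
    CommitTransactionCommand();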
[ belatedly responding ]

On Sat, Feb 29, 2020 at 3:17 PM Andres Freund <andres@anarazel.de> wrote:
> My preliminary conclusion is that it's simply not safe to do
> SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> to defer the SnapshotResetXmin() call until at least
> CommitTransactionCommand()? Outside of that there ought (with the
> exception of multi-transaction commands, but they have to be careful
> anyway) to be no "in progress" sequences of related catalog
> lookups/modifications.
>
> Alternatively we could ensure that all catalog lookup/mod sequences
> ensure that the first catalog snapshot is registered. But that seems
> like a gargantuan task?

If I understand correctly, the scenario you're concerned about is something like this:

(1) Transaction #1 reads a catalog tuple and immediately releases its snapshot.
(2) Transaction #2 performs a DELETE or UPDATE on that catalog tuple.
(3) Transaction #3 completes a VACUUM on the table, so that the old tuple is pruned, thus marked dead, and then the TID is marked unused.
(4) Transaction #4 performs an INSERT which reuses the same TID.
(5) Transaction #1 now performs a DELETE or UPDATE using the previous TID and updates the unrelated tuple which reused the TID rather than the intended tuple.

It seems to me that what is supposed to prevent this from happening is that you aren't supposed to release your snapshot at the end of step #1. You're supposed to hold onto it until after step #5 is complete. I think that a fair number of places are already careful about that. I just picked a random source file that I knew Tom had written and found this bit in extension_config_remove:

    extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
                                 NULL, 1, key);

    extTup = systable_getnext(extScan);

    ...a lot more stuff...

    CatalogTupleUpdate(extRel, &extTup->t_self, extTup);

    systable_endscan(extScan);

Quite apart from this issue, there's a very good reason why it's like that: extTup might be pointing right into a disk buffer, and if we did systable_endscan() before the last access to it, our pointer could become invalid. A fair number of places are protected due to the scan being kept open like this, but it looks like most of the ones that use SearchSysCacheCopyX + CatalogTupleUpdate are problematic.

I would be inclined to fix this problem by adjusting those places to keep a snapshot open rather than by making some arbitrary rule about holding onto a catalog snapshot until the end of the command. That seems like a fairly magical coding rule that will happen to work in most practical cases but isn't really a principled approach to the problem. Besides being magical, it's also fragile: just deciding to use some other snapshot instead of the catalog snapshot causes your code to be subtly broken in a way you're surely not going to expect.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
> [ belatedly responding ]
>
> On Sat, Feb 29, 2020 at 3:17 PM Andres Freund <andres@anarazel.de> wrote:
> > My preliminary conclusion is that it's simply not safe to do
> > SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> > PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> > to defer the SnapshotResetXmin() call until at least
> > CommitTransactionCommand()? Outside of that there ought (with the
> > exception of multi-transaction commands, but they have to be careful
> > anyway) to be no "in progress" sequences of related catalog
> > lookups/modifications.
> >
> > Alternatively we could ensure that all catalog lookup/mod sequences
> > ensure that the first catalog snapshot is registered. But that seems
> > like a gargantuan task?
>
> If I understand correctly, the scenario you're concerned about is
> something like this:
>
> (1) Transaction #1 reads a catalog tuple and immediately releases its
> snapshot.
> (2) Transaction #2 performs a DELETE or UPDATE on that catalog tuple.
> (3) Transaction #3 completes a VACUUM on the table, so that the old
> tuple is pruned, thus marked dead, and then the TID is marked unused.
> (4) Transaction #4 performs an INSERT which reuses the same TID.
> (5) Transaction #1 now performs a DELETE or UPDATE using the previous
> TID and updates the unrelated tuple which reused the TID rather than
> the intended tuple.

Pretty much, though I don't think 3) and 4) even need to happen in quite that way: if the old tuple is just HOT pruned away, or if 3) happens but 4) doesn't, we'd still be in trouble. Currently heap_update()/heap_delete() have no non-assert check that the passed-in TID points to an existing tuple:

    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
    Assert(ItemIdIsNormal(lp));
    ...
    oldtup.t_tableOid = RelationGetRelid(relation);
    oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
    oldtup.t_len = ItemIdGetLength(lp);
    oldtup.t_self = *otid;
    ...
    modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
                                                  &oldtup, newtup);

so we'll treat the page header as a tuple. Not likely to end well.

> It seems to me that what is supposed to prevent this from happening is
> that you aren't supposed to release your snapshot at the end of step
> #1. You're supposed to hold onto it until after step #5 is complete. I
> think that a fair number of places are already careful about that. I
> just picked a random source file that I knew Tom had written and found
> this bit in extension_config_remove:
>
>     extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
>                                  NULL, 1, key);
>
>     extTup = systable_getnext(extScan);
>
>     ...a lot more stuff...
>
>     CatalogTupleUpdate(extRel, &extTup->t_self, extTup);
>
>     systable_endscan(extScan);
>
> Quite apart from this issue, there's a very good reason why it's like
> that: extTup might be pointing right into a disk buffer, and if we did
> systable_endscan() before the last access to it, our pointer could
> become invalid. A fair number of places are protected due to the scan
> being kept open like this, but it looks like most of the ones that use
> SearchSysCacheCopyX + CatalogTupleUpdate are problematic.

Indeed, there are unfortunately quite a few of those. There are also a few places, most prominently performMultipleDeletions(), that explicitly do searches and then afterwards perform deletions - without holding a snapshot.
> I would be inclined to fix this problem by adjusting those places to
> keep a snapshot open rather than by making some arbitrary rule about
> holding onto a catalog snapshot until the end of the command.

That's what my prototype patch did. It's doable, although we would need more complete assertions than I had added to ensure we're not introducing more broken places. While my patch did that, for correctness I don't think it can just be something like

    snap = RegisterSnapshot(GetLatestSnapshot());

or

    PushActiveSnapshot(GetTransactionSnapshot());

as neither will be the catalog snapshot, which could be older than GetLatestSnapshot()/GetTransactionSnapshot(). But IIRC we also can't just register the catalog snapshot, because some parts of the system will use a "normal" snapshot instead (which could be older).

> That seems like a fairly magical coding rule that will happen to work
> in most practical cases but isn't really a principled approach to the
> problem.

I'm not sure it'd be that magical to only release resources at CommitTransactionCommand() time. We kinda do that for a few other things already.

> Besides being magical, it's also fragile: just deciding to
> use some other snapshot instead of the catalog snapshot causes your
> code to be subtly broken in a way you're surely not going to expect.

That's actually kind of an argument the other way for me: because there can be multiple snapshots, and because it is hard to check that the same snapshot is held across lookup & update, it seems more robust to not reset xmin in the middle of a command.

Greetings,

Andres Freund
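As an aside, the heap_update() snippet quoted earlier could conceivably be hardened with a runtime check in place of the bare assertion (a hypothetical sketch, not anything in the tree):

    /*
     * Hypothetical hardening: fail cleanly on a stale TID instead of
     * interpreting whatever the line pointer references as a tuple.
     */
    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
    if (!ItemIdIsNormal(lp))
        elog(ERROR, "tuple to be updated is missing: (%u,%u)",
             ItemPointerGetBlockNumber(otid),
             ItemPointerGetOffsetNumber(otid));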
Andres Freund <andres@anarazel.de> writes:
> On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
>> That seems like a fairly magical coding rule that will happen to work
>> in most practical cases but isn't really a principled approach to the
>> problem.

> I'm not sure it'd be that magical to only release resources at
> CommitTransactionCommand() time. We kinda do that for a few other
> things already.

I'd be worried about consumption of resources during a long transaction. But maybe we could release at CommandCounterIncrement?

Still, I tend to agree with Robert that associating a snap with an open catalog scan is the right way. I have vague memories that a long time ago, all catalog modifications were done via the fetch-from-a-scan-and-update approach. Starting from a catcache tuple instead is a relative newcomer. If we're going to forbid using a catcache tuple as the starting point for updates, one way to enforce it would be to have the catcache not save the TID.

            regards, tom lane
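A minimal sketch of that enforcement idea (hypothetical; the catcache does no such thing today): when a looked-up tuple is copied into the cache, clear its TID, so that any caller starting an update from a cached tuple trips immediately instead of silently using a possibly-recycled TID. Variable names follow catcache.c's conventions (ntp = scanned tuple, dtp = copy being cached):

    /* Hypothetical change in the catcache's tuple-copying path: */
    dtp = heap_copytuple(ntp);
    ItemPointerSetInvalid(&dtp->t_self);    /* forbid TID reuse from the cache */

With that, CatalogTupleUpdate(rel, &tuple->t_self, tuple) on a catcache-sourced tuple would fail fast on the invalid TID.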
Hi,

On 2020-04-09 18:52:32 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
> >> That seems like a fairly magical coding rule that will happen to work
> >> in most practical cases but isn't really a principled approach to the
> >> problem.
>
> > I'm not sure it'd be that magical to only release resources at
> > CommitTransactionCommand() time. We kinda do that for a few other
> > things already.
>
> I'd be worried about consumption of resources during a long transaction.
> But maybe we could release at CommandCounterIncrement?

Which resources are you thinking of? SnapshotResetXmin() shouldn't take any directly. Obviously it can cause bloat - but where would we use a snapshot for only part of a command, yet not need its xmin preserved until the end of the command?

> Still, I tend to agree with Robert that associating a snap with an
> open catalog scan is the right way.

I'm wondering if we should do both. I think releasing xmin in the middle of a command, but only when invalidations arrive in the middle of it, is pretty likely to be involved in bugs in the future. But it also seems good to ensure that snapshots are held across relevant operations.

Any idea how to deal with different types of snapshots potentially being used within such a sequence?

> I have vague memories that a long time ago, all catalog modifications
> were done via the fetch-from-a-scan-and-update approach. Starting
> from a catcache tuple instead is a relative newcomer. If we're going to
> forbid using a catcache tuple as the starting point for updates, one
> way to enforce it would be to have the catcache not save the TID.

I suspect that that'd be fairly painful, code-churn-wise.

Greetings,

Andres Freund