Thread: Catalog invalidations vs catalog scans vs ScanPgRelation()
Hi,

While self-reviewing a patch I'm about to send, I changed the assertion in index_getnext_tid() from

    Assert(TransactionIdIsValid(RecentGlobalXmin));

to instead test (via an indirection)

    Assert(TransactionIdIsValid(MyProc->xmin));

Without ->xmin being set, it's not safe to do scans, and checking RecentGlobalXmin doesn't really do much. But, uh, that doesn't get very far through the regression tests.

Turns out that I am to blame for that, all the way back in 9.4. For logical decoding I needed to make ScanPgRelation() use a specific type of snapshot during one corner case of logical decoding. For reasons lost to time, I didn't continue to pass NULL to systable_beginscan() in the plain case, but did an explicit GetCatalogSnapshot(RelationRelationId). Note the missing RegisterSnapshot()...

That's bad because:

a) If invalidation processing triggers an InvalidateCatalogSnapshot(), the contained SnapshotResetXmin() may find no other snapshot and reset ->xmin, which then may allow relevant row versions to be removed.

b) If there's a subsequent GetCatalogSnapshot() during invalidation processing, that will GetSnapshotData() into the snapshot currently being used.

The fix itself is trivial: just pass NULL for the normal case, rather than doing GetCatalogSnapshot(). But I think this points to some severe holes in the relevant assertions / infrastructure:

1) Asserting that RecentGlobalXmin is set - like many routines do - isn't meaningful, because it stays set even after SnapshotResetXmin() releases the transaction's snapshot. These are fairly old assertions (d53a56687f3d). As far as I can tell these routines really should verify that a snapshot is set.

2) I think we need to reset TransactionXmin and RecentXmin whenever SnapshotResetXmin() clears xmin. While we'll set them again the next time a snapshot is acquired, the fact that they stay set seems likely to hide bugs. We could also remove TransactionXmin and instead use the pgproc/pgxact's ->xmin; I don't really see the point of having it.

3) Similarly, I think we ought to reset RecentGlobal[Data]Xmin at the end of the transaction or such. But I'm not clear what protects those values from being affected by wraparound in a long-running transaction. Initially they are obviously protected against that by the procarray - but once the oldest procarray entry releases its snapshot, the global xmin horizon can advance. That allows transactions up to ~2 billion XIDs into the future of the current backend, whereas RecentGlobalXmin might be nearly ~2 billion transactions in the past relative to ->xmin. That might not have been a likely problem many years ago, but it seems far from impossible today.

I propose to commit a fix, but then also add an assertion for TransactionIdIsValid(MyPgXact->xmin) instead of (or in addition to) the TransactionIdIsValid(RecentGlobalXmin) tests we have right now. And in master, clear the various *Xmin variables whenever we reset xmin.

I think in master we should also start to make RecentGlobalXmin etc. FullTransactionIds. We can then convert the 32bit xids we compare with RecentGlobal* to 64bit xids (which is safe, because live xids have to be within [oldestXid, nextXid)). I have that as part of another patch anyway...

Greetings,

Andres Freund
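The shape of the fix described above, sketched against ScanPgRelation() (a simplified sketch; the committed change may differ in detail):

    /*
     * Sketch of the ScanPgRelation() fix (simplified).  Previously the plain
     * case also fetched an explicit, unregistered GetCatalogSnapshot();
     * passing NULL instead lets systable_beginscan() acquire and register
     * the catalog snapshot itself, so ->xmin stays pinned for the duration
     * of the scan.  Only the logical-decoding corner case still requests a
     * specific snapshot.
     */
    Snapshot    snapshot = NULL;

    if (force_non_historic)
        snapshot = GetNonHistoricCatalogSnapshot(RelationRelationId);

    pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
                                       indexOK && criticalRelcachesBuilt,
                                       snapshot, 1, key);

With NULL, systable_beginscan() does RegisterSnapshot(GetCatalogSnapshot(relid)) internally and unregisters it again in systable_endscan(), so the backend's xmin cannot be reset mid-scan.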
Hi,

On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> Turns out that I am to blame for that, all the way back in 9.4. For
> logical decoding I needed to make ScanPgRelation() use a specific type
> of snapshot during one corner case of logical decoding. For reasons lost
> to time, I didn't continue to pass NULL to systable_beginscan() in the
> plain case, but did an explicit GetCatalogSnapshot(RelationRelationId).
> Note the missing RegisterSnapshot()...

Oh, I pierced through the veil: it's likely because the removal of SnapshotNow happened concurrently with the development of logical decoding. Back before we used proper snapshots for catalog scans, passing in SnapshotNow was precisely the right thing to do...

I think that's somewhat unlikely to actively cause problems in practice, as ScanPgRelation() requires that we already have a lock on the relation, we only look for a single row, and I don't think we rely on the result's tid being correct. I don't immediately see a case where this would trigger in a problematic way.

After fixing the ScanPgRelation() case I found another occurrence of the problem. The catalog snapshot copied at (please ignore the slightly-off line numbers):

#0 GetCatalogSnapshot (relid=1249) at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:454
#1 0x0000560d725b198d in systable_beginscan (heapRelation=0x7f13429f2a08, indexId=2659, indexOK=true, snapshot=0x0, nkeys=2, key=0x7fff26e04db0) at /home/andres/src/postgresql/src/backend/access/index/genam.c:378
#2 0x0000560d72b4117f in SearchCatCacheMiss (cache=0x560d74590800, nkeys=2, hashValue=1029784422, hashIndex=102, v1=697088, v2=3, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1359
#3 0x0000560d72b41045 in SearchCatCacheInternal (cache=0x560d74590800, nkeys=2, v1=697088, v2=3, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1299
#4 0x0000560d72b40d09 in SearchCatCache (cache=0x560d74590800, v1=697088, v2=3, v3=0, v4=0) at /home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1153
#5 0x0000560d72b5b65f in SearchSysCache (cacheId=7, key1=697088, key2=3, key3=0, key4=0) at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1112
#6 0x0000560d72b5b9dd in SearchSysCacheCopy (cacheId=7, key1=697088, key2=3, key3=0, key4=0) at /home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1187
#7 0x0000560d72645501 in RemoveAttrDefaultById (attrdefId=697096) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1821
#8 0x0000560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1397
#9 0x0000560d7263fa17 in deleteOneObject (object=0x560d745d37a0, depRel=0x7fff26e052d0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1261
#10 0x0000560d7263e4d6 in deleteObjectsInList (targetObjects=0x560d745d1ec0, depRel=0x7fff26e052d0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:271
#11 0x0000560d7263e58a in performDeletion (object=0x7fff26e05304, behavior=DROP_CASCADE, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:356
#12 0x0000560d72655f3f in RemoveTempRelations (tempNamespaceId=686167) at /home/andres/src/postgresql/src/backend/catalog/namespace.c:4155
#13 0x0000560d72655f72 in RemoveTempRelationsCallback (code=0, arg=0) at /home/andres/src/postgresql/src/backend/catalog/namespace.c:4174
#14 0x0000560d729a58e2 in shmem_exit (code=0) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:239
#15 0x0000560d729a57b5 in proc_exit_prepare (code=0) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:194

is released at the end of systable_endscan(). And then xmin is reset at:

#0 SnapshotResetXmin () at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:1038
#1 0x0000560d72bb9bfc in InvalidateCatalogSnapshot () at /home/andres/src/postgresql/src/backend/utils/time/snapmgr.c:521
#2 0x0000560d72b43d62 in LocalExecuteInvalidationMessage (msg=0x7fff26e04e70) at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:562
#3 0x0000560d729b277b in ReceiveSharedInvalidMessages (invalFunction=0x560d72b43d26 <LocalExecuteInvalidationMessage>, resetFunction=0x560d72b43f92 <InvalidateSystemCaches>) at /home/andres/src/postgresql/src/backend/storage/ipc/sinval.c:120
#4 0x0000560d72b44070 in AcceptInvalidationMessages () at /home/andres/src/postgresql/src/backend/utils/cache/inval.c:683
#5 0x0000560d729b8f4f in LockRelationOid (relid=2658, lockmode=3) at /home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:136
#6 0x0000560d725341e3 in relation_open (relationId=2658, lockmode=3) at /home/andres/src/postgresql/src/backend/access/common/relation.c:56
#7 0x0000560d725b22c3 in index_open (relationId=2658, lockmode=3) at /home/andres/src/postgresql/src/backend/access/index/indexam.c:130
#8 0x0000560d727b6991 in ExecOpenIndices (resultRelInfo=0x560d7468ffa0, speculative=false) at /home/andres/src/postgresql/src/backend/executor/execIndexing.c:199
#9 0x0000560d7264fc7e in CatalogOpenIndexes (heapRel=0x7f13429f2a08) at /home/andres/src/postgresql/src/backend/catalog/indexing.c:51
#10 0x0000560d72650010 in CatalogTupleUpdate (heapRel=0x7f13429f2a08, otid=0x560d746901d4, tup=0x560d746901d0) at /home/andres/src/postgresql/src/backend/catalog/indexing.c:228
#11 0x0000560d72645583 in RemoveAttrDefaultById (attrdefId=697096) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1830
#12 0x0000560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1397

which then hits an assertion at:

#2 0x0000560d725a4e86 in heap_page_prune_opt (relation=0x7f13429f2a08, buffer=3153) at /home/andres/src/postgresql/src/backend/access/heap/pruneheap.c:131
#3 0x0000560d7259a5b3 in heapam_index_fetch_tuple (scan=0x560d746912c8, tid=0x7fff26e04c5a, snapshot=0x7fff26e04c60, slot=0x560d745d1ef8, call_again=0x7fff26e04ade, all_dead=0x7fff26e04c59) at /home/andres/src/postgresql/src/backend/access/heap/heapam_handler.c:137
#4 0x0000560d725eeedc in table_index_fetch_tuple (scan=0x560d746912c8, tid=0x7fff26e04c5a, snapshot=0x7fff26e04c60, slot=0x560d745d1ef8, call_again=0x7fff26e04ade, all_dead=0x7fff26e04c59) at /home/andres/src/postgresql/src/include/access/tableam.h:1020
#5 0x0000560d725ef478 in table_index_fetch_tuple_check (rel=0x7f13429f2a08, tid=0x7fff26e04c5a, snapshot=0x7fff26e04c60, all_dead=0x7fff26e04c59) at /home/andres/src/postgresql/src/backend/access/table/tableam.c:213
#6 0x0000560d725b4ef7 in _bt_check_unique (rel=0x7f1342a29ce0, insertstate=0x7fff26e04d90, heapRel=0x7f13429f2a08, checkUnique=UNIQUE_CHECK_YES, is_unique=0x7fff26e04dc1, speculativeToken=0x7fff26e04d88) at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:452
#7 0x0000560d725b48e0 in _bt_doinsert (rel=0x7f1342a29ce0, itup=0x560d746892b0, checkUnique=UNIQUE_CHECK_YES, heapRel=0x7f13429f2a08) at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:247
#8 0x0000560d725c0167 in btinsert (rel=0x7f1342a29ce0, values=0x7fff26e04ee0, isnull=0x7fff26e04ec0, ht_ctid=0x560d746901d4, heapRel=0x7f13429f2a08, checkUnique=UNIQUE_CHECK_YES, indexInfo=0x560d7468f3d8) at /home/andres/src/postgresql/src/backend/access/nbtree/nbtree.c:207
#9 0x0000560d725b2530 in index_insert (indexRelation=0x7f1342a29ce0, values=0x7fff26e04ee0, isnull=0x7fff26e04ec0, heap_t_ctid=0x560d746901d4, heapRelation=0x7f13429f2a08, checkUnique=UNIQUE_CHECK_YES, indexInfo=0x560d7468f3d8) at /home/andres/src/postgresql/src/backend/access/index/indexam.c:186
#10 0x0000560d7264ff34 in CatalogIndexInsert (indstate=0x560d7468ffa0, heapTuple=0x560d746901d0) at /home/andres/src/postgresql/src/backend/catalog/indexing.c:157
#11 0x0000560d7265003e in CatalogTupleUpdate (heapRel=0x7f13429f2a08, otid=0x560d746901d4, tup=0x560d746901d0) at /home/andres/src/postgresql/src/backend/catalog/indexing.c:232
#12 0x0000560d72645583 in RemoveAttrDefaultById (attrdefId=697096) at /home/andres/src/postgresql/src/backend/catalog/heap.c:1830
#13 0x0000560d7263fd6b in doDeletion (object=0x560d745d37a0, flags=29) at /home/andres/src/postgresql/src/backend/catalog/dependency.c:1397

So, um. What happens is that doDeletion() does a catalog scan, which sets a snapshot. The results of that catalog scan are then used to perform modifications. But at that point there's no guarantee that we still hold *any* snapshot, as e.g. invalidations can trigger the catalog snapshot being released.

I don't see how that's safe. Without ->xmin preventing that, intermediate row versions that we did look up could just get vacuumed away and replaced with a different row. That does seem like a serious issue?

I think there are likely a lot of places that can hit this. What makes it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release ->xmin when there previously has been *any* catalog access? Because in contrast to normal table modifications, there's currently nothing at all forcing us to hold a snapshot between catalog lookups and their modifications.

Am I missing something? Or is this a fairly significant hole in our arrangements?

The easiest way to fix this would probably be to have inval.c call a version of InvalidateCatalogSnapshot() that leaves the oldest catalog snapshot around, but sets things up so that GetCatalogSnapshot() will return a freshly taken snapshot. ISTM that pretty much every InvalidateCatalogSnapshot() call within a transaction needs that behaviour?

Regards,

Andres
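To make the failure mode concrete, the code path the backtraces go through has roughly this shape (condensed from RemoveAttrDefaultById(); not the exact source — myrelid/myattnum come from a preceding pg_attrdef scan):

    /*
     * The syscache lookup holds the catalog snapshot only while its internal
     * scan runs.  By the time CatalogTupleUpdate() dereferences
     * tuple->t_self, invalidation processing (here reached via
     * CatalogOpenIndexes() -> index_open() -> LockRelationOid()) may have
     * released that snapshot and reset the backend's xmin.
     */
    Relation    attr_rel;
    HeapTuple   tuple;

    attr_rel = table_open(AttributeRelationId, RowExclusiveLock);

    tuple = SearchSysCacheCopy2(ATTNUM,
                                ObjectIdGetDatum(myrelid),
                                Int16GetDatum(myattnum));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for attribute %d of relation %u",
             myattnum, myrelid);

    ((Form_pg_attribute) GETSTRUCT(tuple))->atthasdef = false;

    /* unsafe: t_self may by now point at a TID that has been pruned away */
    CatalogTupleUpdate(attr_rel, &tuple->t_self, tuple);

    heap_freetuple(tuple);
    table_close(attr_rel, RowExclusiveLock);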
Hi,

On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> So, um. What happens is that doDeletion() does a catalog scan, which
> sets a snapshot. The results of that catalog scan are then used to
> perform modifications. But at that point there's no guarantee that we
> still hold *any* snapshot, as e.g. invalidations can trigger the catalog
> snapshot being released.
>
> I don't see how that's safe. Without ->xmin preventing that,
> intermediate row versions that we did look up could just get vacuumed
> away and replaced with a different row. That does seem like a serious
> issue?
>
> I think there are likely a lot of places that can hit this. What makes
> it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> ->xmin when there previously has been *any* catalog access? Because in
> contrast to normal table modifications, there's currently nothing at all
> forcing us to hold a snapshot between catalog lookups and their
> modifications.
>
> Am I missing something? Or is this a fairly significant hole in our
> arrangements?

I still think that's true. In a first iteration I hacked around the problem by explicitly registering a catalog snapshot in RemoveTempRelations(). That *sometimes* allows getting through the regression tests without the assertions triggering.

But I don't think that's good enough (even if we fixed the other potential crashes similarly). The only reason that avoids the asserts is that in nearly all other cases there's also a user snapshot that's pushed. But that pushed snapshot can have an xmin that's newer than the catalog snapshot, which means we're still in danger of tids from catalog scans being outdated.

My preliminary conclusion is that it's simply not safe to do SnapshotResetXmin() from within InvalidateCatalogSnapshot(), PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need to defer the SnapshotResetXmin() call until at least CommitTransactionCommand(). Outside of that there ought (with the exception of multi-transaction commands, but they have to be careful anyway) to be no "in progress" sequences of related catalog lookups/modifications.

Alternatively we could ensure that all catalog lookup/mod sequences ensure that the first catalog snapshot is registered. But that seems like a gargantuan task?

> The easiest way to fix this would probably be to have inval.c call a
> version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> snapshot around, but sets things up so that GetCatalogSnapshot() will
> return a freshly taken snapshot. ISTM that pretty much every
> InvalidateCatalogSnapshot() call within a transaction needs that
> behaviour?

A related question is in which cases it is actually safe to use a snapshot that's neither registered nor pushed as the active snapshot. snapmgr.c just says:

    * Note that the return value may point at static storage that will be modified
    * by future calls and by CommandCounterIncrement(). Callers should call
    * RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
    * used very long.

but doesn't clarify what 'very long' means. As far as I can tell, very little is actually safe. It's probably OK to do a single visibility test, but anything that e.g. has a chance of accepting invalidations is entirely unsafe?

Greetings,

Andres Freund
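For reference, SnapshotResetXmin() looked roughly like this around the time of this thread (lightly abridged, comments elided). The branch that clears MyPgXact->xmin as soon as the pairing heap of registered snapshots is empty and no snapshot is active is exactly what the traces above hit; there is no "defer until command end" today:

    static void
    SnapshotResetXmin(void)
    {
        Snapshot    minSnapshot;

        if (ActiveSnapshot != NULL)
            return;

        if (pairingheap_is_empty(&RegisteredSnapshots))
        {
            /* no snapshots left: stop advertising an xmin at all */
            MyPgXact->xmin = InvalidTransactionId;
            return;
        }

        minSnapshot = pairingheap_container(SnapshotData, ph_node,
                                            pairingheap_first(&RegisteredSnapshots));

        if (TransactionIdPrecedes(MyPgXact->xmin, minSnapshot->xmin))
            MyPgXact->xmin = minSnapshot->xmin;
    }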
Hi,

On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> > Turns out that I am to blame for that, all the way back in 9.4. For
> > logical decoding I needed to make ScanPgRelation() use a specific type
> > of snapshot during one corner case of logical decoding. For reasons lost
> > to time, I didn't continue to pass NULL to systable_beginscan() in the
> > plain case, but did an explicit GetCatalogSnapshot(RelationRelationId).
> > Note the missing RegisterSnapshot()...
>
> Oh, I pierced through the veil: it's likely because the removal of
> SnapshotNow happened concurrently with the development of logical
> decoding. Back before we used proper snapshots for catalog scans,
> passing in SnapshotNow was precisely the right thing to do...
>
> I think that's somewhat unlikely to actively cause problems in practice,
> as ScanPgRelation() requires that we already have a lock on the
> relation, we only look for a single row, and I don't think we rely on
> the result's tid being correct. I don't immediately see a case where
> this would trigger in a problematic way.

Pushed a fix for this.

> So, um. What happens is that doDeletion() does a catalog scan, which
> sets a snapshot. The results of that catalog scan are then used to
> perform modifications. But at that point there's no guarantee that we
> still hold *any* snapshot, as e.g. invalidations can trigger the catalog
> snapshot being released.
>
> I don't see how that's safe. Without ->xmin preventing that,
> intermediate row versions that we did look up could just get vacuumed
> away and replaced with a different row. That does seem like a serious
> issue?
>
> I think there are likely a lot of places that can hit this. What makes
> it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> ->xmin when there previously has been *any* catalog access? Because in
> contrast to normal table modifications, there's currently nothing at all
> forcing us to hold a snapshot between catalog lookups and their
> modifications.
>
> Am I missing something? Or is this a fairly significant hole in our
> arrangements?
>
> The easiest way to fix this would probably be to have inval.c call a
> version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> snapshot around, but sets things up so that GetCatalogSnapshot() will
> return a freshly taken snapshot. ISTM that pretty much every
> InvalidateCatalogSnapshot() call within a transaction needs that
> behaviour?

I'd still like to get some input here.

Greetings,

Andres Freund
Hi,

On 2020-03-28 12:30:40 -0700, Andres Freund wrote:
> On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> > On 2020-02-28 21:24:59 -0800, Andres Freund wrote:
> > So, um. What happens is that doDeletion() does a catalog scan, which
> > sets a snapshot. The results of that catalog scan are then used to
> > perform modifications. But at that point there's no guarantee that we
> > still hold *any* snapshot, as e.g. invalidations can trigger the
> > catalog snapshot being released.
> >
> > I don't see how that's safe. Without ->xmin preventing that,
> > intermediate row versions that we did look up could just get vacuumed
> > away and replaced with a different row. That does seem like a serious
> > issue?
> >
> > I think there are likely a lot of places that can hit this. What makes
> > it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> > ->xmin when there previously has been *any* catalog access? Because in
> > contrast to normal table modifications, there's currently nothing at
> > all forcing us to hold a snapshot between catalog lookups and their
> > modifications.
> >
> > Am I missing something? Or is this a fairly significant hole in our
> > arrangements?
> >
> > The easiest way to fix this would probably be to have inval.c call a
> > version of InvalidateCatalogSnapshot() that leaves the oldest catalog
> > snapshot around, but sets things up so that GetCatalogSnapshot() will
> > return a freshly taken snapshot. ISTM that pretty much every
> > InvalidateCatalogSnapshot() call within a transaction needs that
> > behaviour?
>
> I'd still like to get some input here.

Attached is one patch that adds assertions to detect this, and one that puts enough workarounds in place to make the tests pass. I don't like this much, but I thought it'd be useful for others to understand the problem.

Greetings,

Andres Freund
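(The attached patches aren't reproduced here. The assertion in question is of roughly this shape — a sketch based on the proposal earlier in the thread, not the actual patch:)

    /*
     * Sketch of the kind of assertion meant above (hypothetical): anything
     * about to fetch or modify a tuple via a TID obtained from an earlier
     * catalog lookup must still advertise an xmin, i.e. must have held some
     * snapshot continuously since that lookup.
     */
    Assert(TransactionIdIsValid(MyPgXact->xmin));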
Hi,

Robert, Tom, it'd be great if you could look through this thread. I think there's a problem here (and it has gotten worse after the introduction of catalog snapshots). Both of you have at least dabbled in related code.

On 2020-02-29 12:17:07 -0800, Andres Freund wrote:
> On 2020-02-28 22:10:52 -0800, Andres Freund wrote:
> > So, um. What happens is that doDeletion() does a catalog scan, which
> > sets a snapshot. The results of that catalog scan are then used to
> > perform modifications. But at that point there's no guarantee that we
> > still hold *any* snapshot, as e.g. invalidations can trigger the
> > catalog snapshot being released.
> >
> > I don't see how that's safe. Without ->xmin preventing that,
> > intermediate row versions that we did look up could just get vacuumed
> > away and replaced with a different row. That does seem like a serious
> > issue?
> >
> > I think there are likely a lot of places that can hit this. What makes
> > it safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release
> > ->xmin when there previously has been *any* catalog access? Because in
> > contrast to normal table modifications, there's currently nothing at
> > all forcing us to hold a snapshot between catalog lookups and their
> > modifications.
> >
> > Am I missing something? Or is this a fairly significant hole in our
> > arrangements?
>
> I still think that's true. In a first iteration I hacked around the
> problem by explicitly registering a catalog snapshot in
> RemoveTempRelations(). That *sometimes* allows getting through the
> regression tests without the assertions triggering.

The attached two patches (they're not meant to be applied) reliably get through the regression tests. But I suspect I'd have to at least do a CLOBBER_CACHE_ALWAYS run to find all the actually vulnerable places.

> But I don't think that's good enough (even if we fixed the other
> potential crashes similarly). The only reason that avoids the asserts is
> that in nearly all other cases there's also a user snapshot that's
> pushed. But that pushed snapshot can have an xmin that's newer than the
> catalog snapshot, which means we're still in danger of tids from catalog
> scans being outdated.
>
> My preliminary conclusion is that it's simply not safe to do
> SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> to defer the SnapshotResetXmin() call until at least
> CommitTransactionCommand(). Outside of that there ought (with the
> exception of multi-transaction commands, but they have to be careful
> anyway) to be no "in progress" sequences of related catalog
> lookups/modifications.
>
> Alternatively we could ensure that all catalog lookup/mod sequences
> ensure that the first catalog snapshot is registered. But that seems
> like a gargantuan task?

I also just noticed comments of this style in a few places:

    * Start a transaction so we can access pg_database, and get a snapshot.
    * We don't have a use for the snapshot itself, but we're interested in
    * the secondary effect that it sets RecentGlobalXmin. (This is critical
    * for anything that reads heap pages, because HOT may decide to prune
    * them even if the process doesn't attempt to modify any tuples.)

followed by code like

    StartTransactionCommand();
    (void) GetTransactionSnapshot();
    rel = table_open(DatabaseRelationId, AccessShareLock);
    scan = table_beginscan_catalog(rel, 0, NULL);

which is not safe at all, unfortunately. The snapshot is not pushed/active, therefore invalidations processed e.g. as part of the table_open() could execute an InvalidateCatalogSnapshot(), which in turn would remove the catalog snapshot from the pairing heap and SnapshotResetXmin(). And poof, the backend's xmin is gone.

Greetings,

Andres Freund
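A sketch of how such a call site could be made safe (one possible shape, assuming the surrounding loop is elided — not a committed fix):

    /*
     * Keep the snapshot registered for the whole scan, so
     * SnapshotResetXmin() cannot clear the backend's xmin while heap pages
     * are being read (and possibly pruned) underneath us.
     */
    Relation        rel;
    TableScanDesc   scan;
    Snapshot        snapshot;

    StartTransactionCommand();
    snapshot = RegisterSnapshot(GetTransactionSnapshot());

    rel = table_open(DatabaseRelationId, AccessShareLock);
    scan = table_beginscan_catalog(rel, 0, NULL);

    /* ... process tuples ... */

    table_endscan(scan);
    table_close(rel, AccessShareLock);

    UnregisterSnapshot(snapshot);
    CommitTransactionCommand();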
[ belatedly responding ]

On Sat, Feb 29, 2020 at 3:17 PM Andres Freund <andres@anarazel.de> wrote:
> My preliminary conclusion is that it's simply not safe to do
> SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> to defer the SnapshotResetXmin() call until at least
> CommitTransactionCommand()? Outside of that there ought (with the
> exception of multi-transaction commands, but they have to be careful
> anyway) to be no "in progress" sequences of related catalog
> lookups/modifications.
>
> Alternatively we could ensure that all catalog lookup/mod sequences
> ensure that the first catalog snapshot is registered. But that seems
> like a gargantuan task?

If I understand correctly, the scenario you're concerned about is something like this:

(1) Transaction #1 reads a catalog tuple and immediately releases its snapshot.
(2) Transaction #2 performs a DELETE or UPDATE on that catalog tuple.
(3) Transaction #3 completes a VACUUM on the table, so that the old tuple is pruned, thus marked dead, and then the TID is marked unused.
(4) Transaction #4 performs an INSERT which reuses the same TID.
(5) Transaction #1 now performs a DELETE or UPDATE using the previous TID and updates the unrelated tuple which reused the TID rather than the intended tuple.

It seems to me that what is supposed to prevent this from happening is that you aren't supposed to release your snapshot at the end of step #1. You're supposed to hold onto it until after step #5 is complete. I think that a fair number of places are already careful about that. I just picked a random source file that I knew Tom had written and found this bit in extension_config_remove:

    extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
                                 NULL, 1, key);

    extTup = systable_getnext(extScan);

    ...a lot more stuff...

    CatalogTupleUpdate(extRel, &extTup->t_self, extTup);

    systable_endscan(extScan);

Quite apart from this issue, there's a very good reason why it's like that: extTup might be pointing right into a disk buffer, and if we did systable_endscan() before the last access to it, our pointer could become invalid. A fair number of places are protected due to the scan being kept open like this, but it looks like most of the ones that use SearchSysCacheCopyX + CatalogTupleUpdate are problematic.

I would be inclined to fix this problem by adjusting those places to keep a snapshot open rather than by making some arbitrary rule about holding onto a catalog snapshot until the end of the command. That seems like a fairly magical coding rule that will happen to work in most practical cases but isn't really a principled approach to the problem. Besides being magical, it's also fragile: just deciding to use some other snapshot instead of the catalog snapshot causes your code to be subtly broken in a way you're surely not going to expect.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
> [ belatedly responding ]
>
> On Sat, Feb 29, 2020 at 3:17 PM Andres Freund <andres@anarazel.de> wrote:
> > My preliminary conclusion is that it's simply not safe to do
> > SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> > PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> > to defer the SnapshotResetXmin() call until at least
> > CommitTransactionCommand()? Outside of that there ought (with the
> > exception of multi-transaction commands, but they have to be careful
> > anyway) to be no "in progress" sequences of related catalog
> > lookups/modifications.
> >
> > Alternatively we could ensure that all catalog lookup/mod sequences
> > ensure that the first catalog snapshot is registered. But that seems
> > like a gargantuan task?
>
> If I understand correctly, the scenario you're concerned about is
> something like this:
>
> (1) Transaction #1 reads a catalog tuple and immediately releases its
> snapshot.
> (2) Transaction #2 performs a DELETE or UPDATE on that catalog tuple.
> (3) Transaction #3 completes a VACUUM on the table, so that the old
> tuple is pruned, thus marked dead, and then the TID is marked unused.
> (4) Transaction #4 performs an INSERT which reuses the same TID.
> (5) Transaction #1 now performs a DELETE or UPDATE using the previous
> TID and updates the unrelated tuple which reused the TID rather than
> the intended tuple.

Pretty much, though I don't think 3) and 4) even need to happen in quite that way: if the old tuple is just HOT pruned away, or if 3) happens but 4) doesn't, we'd still be in trouble. Currently heap_update()/heap_delete() have no non-assert check that the passed-in TID points to an existing tuple:

    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
    Assert(ItemIdIsNormal(lp));
    ...
    oldtup.t_tableOid = RelationGetRelid(relation);
    oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
    oldtup.t_len = ItemIdGetLength(lp);
    oldtup.t_self = *otid;
    ...
    modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
                                                  &oldtup, newtup);

so we'll treat the page header as a tuple. Not likely to end well.

> It seems to me that what is supposed to prevent this from happening is
> that you aren't supposed to release your snapshot at the end of step
> #1. You're supposed to hold onto it until after step #5 is complete. I
> think that a fair number of places are already careful about that. I
> just picked a random source file that I knew Tom had written and found
> this bit in extension_config_remove:
>
>     extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
>                                  NULL, 1, key);
>
>     extTup = systable_getnext(extScan);
>
>     ...a lot more stuff...
>
>     CatalogTupleUpdate(extRel, &extTup->t_self, extTup);
>
>     systable_endscan(extScan);
>
> Quite apart from this issue, there's a very good reason why it's like
> that: extTup might be pointing right into a disk buffer, and if we did
> systable_endscan() before the last access to it, our pointer could
> become invalid. A fair number of places are protected due to the scan
> being kept open like this, but it looks like most of the ones that use
> SearchSysCacheCopyX + CatalogTupleUpdate are problematic.

Indeed, there are unfortunately quite a few of those. There are also a few places, most prominently performMultipleDeletions(), that explicitly do searches and then afterwards perform deletions - without holding a snapshot.
> I would be inclined to fix this problem by adjusting those places to
> keep a snapshot open rather than by making some arbitrary rule about
> holding onto a catalog snapshot until the end of the command.

That's what my prototype patch did. It's doable, although we would need more complete assertions than I had added to ensure we're not introducing more broken places. While my patch did that, for correctness I don't think it can just be something like

    snap = RegisterSnapshot(GetLatestSnapshot());

or

    PushActiveSnapshot(GetTransactionSnapshot());

as neither will be the catalog snapshot, which could be older than GetLatestSnapshot()/GetTransactionSnapshot(). But IIRC we also can't just register the catalog snapshot, because some parts of the system will use a "normal" snapshot instead (which could be older).

> That seems like a fairly magical coding rule that will happen to work
> in most practical cases but isn't really a principled approach to the
> problem.

I'm not sure it'd be that magical to only release resources at CommitTransactionCommand() time. We kinda do that for a few other things already.

> Besides being magical, it's also fragile: just deciding to
> use some other snapshot instead of the catalog snapshot causes your
> code to be subtly broken in a way you're surely not going to expect.

That's actually kind of an argument the other way for me: because there can be multiple snapshots, and because it is hard to check that the same snapshot is held across lookup & update, it seems more robust to not reset xmin in the middle of a command.

Greetings,

Andres Freund
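As an aside, the heap_update() snippet quoted earlier could conceivably be hardened with a runtime check in place of the bare assertion (a hypothetical sketch, not anything in the tree):

    /*
     * Hypothetical hardening: fail cleanly on a stale TID instead of
     * interpreting whatever the line pointer references as a tuple.
     */
    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
    if (!ItemIdIsNormal(lp))
        elog(ERROR, "tuple to be updated is missing: (%u,%u)",
             ItemPointerGetBlockNumber(otid),
             ItemPointerGetOffsetNumber(otid));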
Andres Freund <andres@anarazel.de> writes:
> On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
>> That seems like a fairly magical coding rule that will happen to work
>> in most practical cases but isn't really a principled approach to the
>> problem.

> I'm not sure it'd be that magical to only release resources at
> CommitTransactionCommand() time. We kinda do that for a few other
> things already.

I'd be worried about consumption of resources during a long transaction. But maybe we could release at CommandCounterIncrement?

Still, I tend to agree with Robert that associating a snap with an open catalog scan is the right way. I have vague memories that a long time ago, all catalog modifications were done via the fetch-from-a-scan-and-update approach. Starting from a catcache tuple instead is a relative newcomer. If we're going to forbid using a catcache tuple as the starting point for updates, one way to enforce it would be to have the catcache not save the TID.

            regards, tom lane
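A minimal sketch of that enforcement idea (hypothetical; the catcache does no such thing today): when a looked-up tuple is copied into the cache, clear its TID, so that any caller starting an update from a cached tuple trips immediately instead of silently using a possibly-recycled TID. Variable names follow catcache.c's conventions (ntp = scanned tuple, dtp = copy being cached):

    /* Hypothetical change in the catcache's tuple-copying path: */
    dtp = heap_copytuple(ntp);
    ItemPointerSetInvalid(&dtp->t_self);    /* forbid TID reuse from the cache */

With that, CatalogTupleUpdate(rel, &tuple->t_self, tuple) on a catcache-sourced tuple would fail fast on the invalid TID.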
Hi,

On 2020-04-09 18:52:32 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
> >> That seems like a fairly magical coding rule that will happen to work
> >> in most practical cases but isn't really a principled approach to the
> >> problem.
>
> > I'm not sure it'd be that magical to only release resources at
> > CommitTransactionCommand() time. We kinda do that for a few other
> > things already.
>
> I'd be worried about consumption of resources during a long transaction.
> But maybe we could release at CommandCounterIncrement?

Which resources are you thinking of? SnapshotResetXmin() shouldn't take any directly. Obviously it can cause bloat - but where would we use a snapshot for only part of a command, yet not need its xmin preserved until the end of the command?

> Still, I tend to agree with Robert that associating a snap with an
> open catalog scan is the right way.

I'm wondering if we should do both. I think releasing xmin in the middle of a command, but only when invalidations arrive in the middle of it, is pretty likely to be involved in bugs in the future. But it also seems good to ensure that snapshots are held across relevant operations.

Any idea how to deal with different types of snapshots potentially being used within such a sequence?

> I have vague memories that a long time ago, all catalog modifications
> were done via the fetch-from-a-scan-and-update approach. Starting
> from a catcache tuple instead is a relative newcomer. If we're going to
> forbid using a catcache tuple as the starting point for updates, one
> way to enforce it would be to have the catcache not save the TID.

I suspect that that'd be fairly painful, code-churn-wise.

Greetings,

Andres Freund