Re: Catalog invalidations vs catalog scans vs ScanPgRelation() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Catalog invalidations vs catalog scans vs ScanPgRelation()
Date
Msg-id 20200409223249.pmezmufk24z7t5wu@alap3.anarazel.de
In response to Re: Catalog invalidations vs catalog scans vs ScanPgRelation()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Catalog invalidations vs catalog scans vs ScanPgRelation()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2020-04-09 16:56:03 -0400, Robert Haas wrote:
> [ belatedly responding ]
> 
> On Sat, Feb 29, 2020 at 3:17 PM Andres Freund <andres@anarazel.de> wrote:
> > My preliminary conclusion is that it's simply not safe to do
> > SnapshotResetXmin() from within InvalidateCatalogSnapshot(),
> > PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need
> > to defer the SnapshotResetXmin() call until at least
> > CommitTransactionCommand()? Outside of that there ought (with exception
> > of multi-transaction commands, but they have to be careful anyway) to be
> > no "in progress" sequences of related catalog lookups/modifications.
> >
> > Alternatively we could ensure that all catalog lookup/mod sequences
> > ensure that the first catalog snapshot is registered. But that seems
> > like a gargantuan task?
> 
> If I understand correctly, the scenario you're concerned about is
> something like this:
> 
> (1) Transaction #1 reads a catalog tuple and immediately releases its snapshot.
> (2) Transaction #2 performs a DELETE or UPDATE on that catalog tuple.
> (3) Transaction #3 completes a VACUUM on the table, so that the old
> tuple is pruned, thus marked dead, and then the TID is marked unused.
> (4) Transaction #4 performs an INSERT which reuses the same TID.
> (5) Transaction #1 now performs a DELETE or UPDATE using the previous
> TID and updates the unrelated tuple which reused the TID rather than
> the intended tuple.

Pretty much.

I don't think it's strictly necessary for 3) and 4) to happen in quite
that way. If the tuple in 3) is just HOT pruned away, or if 3) happens
but 4) doesn't, we'd still be in trouble: currently heap_update/delete
have no non-assert check that the passed-in TID points to an existing
tuple.

    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
    Assert(ItemIdIsNormal(lp));
..
    oldtup.t_tableOid = RelationGetRelid(relation);
    oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
    oldtup.t_len = ItemIdGetLength(lp);
    oldtup.t_self = *otid;
...
    modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
                                                  &oldtup, newtup);

so we'll treat the page header as a tuple. Not likely to end well.
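
Just to illustrate the kind of non-assert check that's missing (purely a
sketch, and obviously not a fix for the stale-TID problem itself),
heap_update could error out cleanly instead of Assert'ing:

    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));

    /* sketch: fail cleanly on a stale/reused TID instead of asserting */
    if (!ItemIdIsNormal(lp))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("attempted to modify non-existing tuple (%u,%u) in \"%s\"",
                        ItemPointerGetBlockNumber(otid),
                        ItemPointerGetOffsetNumber(otid),
                        RelationGetRelationName(relation))));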



> It seems to me that what is supposed to prevent this from happening is
> that you aren't supposed to release your snapshot at the end of step
> #1. You're supposed to hold onto it until after step #5 is complete. I
> think that there are a fair number of places that are already careful
> about that. I just picked a random source file that I knew Tom had
> written and found this bit in extension_config_remove:
> 
>         extScan = systable_beginscan(extRel, ExtensionOidIndexId, true,
>                                                                  NULL, 1, key);
> 
>         extTup = systable_getnext(extScan);
> ...a lot more stuff...
>         CatalogTupleUpdate(extRel, &extTup->t_self, extTup);
> 
>         systable_endscan(extScan);
> 
> Quite apart from this issue, there's a very good reason why it's like
> that: extTup might be pointing right into a disk buffer, and if we did
> systable_endscan() before the last access to it, our pointer could
> become invalid. A fair number of places are protected due to the scan
> being kept open like this, but it looks like most of the ones that use
> SearchSysCacheCopyX + CatalogTupleUpdate are problematic.

Indeed. There's unfortunately quite a few of those.  There's also a few
places, most prominently probably performMultipleDeletions(), that
explicitly do searches, and then afterwards perform deletions - without
holding a snapshot.
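
To make the problematic shape concrete, it's roughly the following (a
sketch; relid and the surrounding function are hypothetical, the calls
are the real APIs):

    Relation    rel;
    HeapTuple   tup;

    rel = table_open(RelationRelationId, RowExclusiveLock);

    /* the lookup returns a copy; once it returns, nothing keeps a
     * snapshot registered, so our advertised xmin can be reset */
    tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for relation %u", relid);

    /* ... modify the tuple copy ... */

    /* update via the remembered TID - by now the old version could have
     * been pruned and the line pointer reused */
    CatalogTupleUpdate(rel, &tup->t_self, tup);

    table_close(rel, RowExclusiveLock);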


> I would be inclined to fix this problem by adjusting those places to
> keep a snapshot open rather than by making some arbitrary rule about
> holding onto a catalog snapshot until the end of the command.

That's what my prototype patch did. It's doable, although we would need
more complete assertions than I had added to ensure we're not
introducing more broken places.

While my patch did that, for correctness I don't think it can just be
something like
    snap = RegisterSnapshot(GetLatestSnapshot());
or
    PushActiveSnapshot(GetTransactionSnapshot());

as neither will be the catalog snapshot, which could be older than
GetLatestSnapshot()/GetTransactionSnapshot(). But IIRC we also can't
just register the catalog snapshot, because some parts of the system
will use a "normal" snapshot instead (which could be older).


> That seems like a fairly magical coding rule that will happen to work
> in most practical cases but isn't really a principled approach to the
> problem.

I'm not sure it'd be that magical to only release resources at
CommitTransactionCommand() time. We kinda do that for a few other things
already.
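
Very roughly, the deferral could look like this (a hypothetical sketch,
not what the prototype patch does), with CommitTransactionCommand()
clearing the flag and doing one final reset:

    static bool defer_xmin_reset = false;   /* hypothetical flag */

    static void
    SnapshotResetXmin(void)
    {
        if (defer_xmin_reset)
            return;             /* deferred until CommitTransactionCommand() */

        /* ... existing logic: recompute the advertised xmin from the
         * remaining registered/active snapshots ... */
    }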


> Besides being magical, it's also fragile: just deciding to
> use some other snapshot instead of the catalog snapshot causes your
> code to be subtly broken in a way you're surely not going to expect.

That's actually kind of an argument the other way for me: Because there
can be multiple snapshots, and because it is hard to check that the same
snapshot is held across lookup & update, it seems more robust to not
reset the xmin in the middle of a command.

Greetings,

Andres Freund


