More snapshot-management fun - Mailing list pgsql-hackers

From Tom Lane
Subject More snapshot-management fun
Date
Msg-id 1445.1478978635@sss.pgh.pa.us
Whole thread Raw
Responses Re: More snapshot-management fun
List pgsql-hackers
While pursuing the unprotected-use-of-CatalogSnapshot problem I noted
earlier, I came across a couple of related issues:

* ProcArrayInstallImportedXmin and ProcArrayInstallRestoredXmin will
forcibly overwrite MyPgXact->xmin with some random xmin or other.
While they take pains to ensure that doesn't cause the global xmin
horizon to go backwards, there's no obvious interlock ensuring that
it doesn't cause MyPgXact->xmin to go *forwards* enough to render
some locally-stored snapshot unprotected.  I think we need to add a
call that will check for that and throw an error.  (In the case of
CatalogSnapshot, it could just invalidate CatalogSnapshot, but there
isn't any way for snapmgr.c to unilaterally invalidate a registered
or stacked-active snapshot.)  I think this may be a live bug with
respect to CatalogSnapshot, even if callers are defending against
there being any other active snapshots.

* I tried addingAssert(MyPgXact->xmin != InvalidTransactionId);
to SnapshotResetXmin() just after it's determined that RegisteredSnapshots
is nonempty, reasoning that the advertised xmin had better be nonzero if
we have any registered snapshots.  It promptly blew up on me.  The reason
is that we clear MyPgXact->xmin during ProcArrayEndTransaction, long
before AtEOXact_Snapshot.  AtEOXact_Snapshot itself didn't have a problem,
but in some cases we may call InvalidateCatalogSnapshot during the cleanup
sequence, and the modified version of that that I'm testing invokes
SnapshotResetXmin which detected the inconsistency.  I find this very
scary: it means that all through transaction cleanup, we have unprotected
snapshots sitting around.  Yeah, they should probably not get used for
anything, but there's no very strong mechanism guaranteeing that.

An easy attempt at a fix for the second problem would be to not clear
MyPgXact->xmin during ProcArrayEndTransaction, but leave it to be done
when we flush our snapshots.  But I seem to recall that it's intentional
and necessary that we clear that synchronously with clearing MyPgXact->xid.

An idea I'm playing with is to add logic so that just before
ProcArrayEndTransaction, we run through all registered and stacked
snapshots and replace their "satisfies" function pointers with pointers
to a function that just throws an error, thus catching any attempt to
use the snapshots for anything during transaction cleanup.  This is sort
of an annoying use of cycles, though I suppose we could do it only in
assert-enabled builds.

Thoughts?
        regards, tom lane



pgsql-hackers by date:

Previous
From: "Karl O. Pinc"
Date:
Subject: Re: Patch to implement pg_current_logfile() function
Next
From: Tomas Vondra
Date:
Subject: Re: xlogreader.c fails with FATAL on a cluster with 4kB block size