Re: TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified - Mailing list pgsql-hackers

From Tom Lane
Subject Re: TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified
Date
Msg-id 27384.1319827063@sss.pgh.pa.us
Whole thread Raw
In response to Re: TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified
Re: TOAST versus VACUUM, or "missing chunk number 0 for toast value" identified
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> Wait a minute: I'm confused.  What's at issue here, at least AIUI, is
> taking a TOAST pointer and fetching the corresponding value.  But that
> must involve reading from the TOAST table, and our usual paradigm for
> reading from system catalogs is (1) take AccessShareLock, (2) read the
> relevant rows, (3) release AccessShareLock.  If we're doing that here,
> then AcceptInvalidationMessages() is already getting called.  If we're
> not, this seems horribly unsafe; aside from the current bug, somebody
> could rewrite the table in the interim.

Yes, the TOAST fetch will call AcceptInvalidationMessages().  But
(1) we are starting from a TOAST pointer that is within a now-stale
syscache entry, and even if we get an sinval message telling us it's
stale when we open the toast table, we're still going to try to fetch
that toast OID.  Worse, (2) there is no guarantee that the inval message
has even been sent yet, because there is no lock keeping us from trying
to use the syscache entry before the inval has been sent.

After further reflection I believe that pg_statistic entries invalidated
by ANALYZE are not the only problem.  Consider a pg_proc row whose
prosrc field is toasted (not too unlikely), and suppose that somebody
executes CREATE OR REPLACE FUNCTION to update it to a different toasted
value.  Meanwhile, somebody else is about to use the function, and they
have a copy of its pg_proc row in syscache.  There is no lock that will
block that second backend from fetching the toast tuples before it's
seen the sinval message from the CREATE OR REPLACE FUNCTION.  So the
exact same type of race can occur, if the first backend gets delayed
between committing and broadcasting its sinval messages, and an
overeager vacuum comes along to clean up the dead toast tuples.

I am currently thinking that we will have to go with Heikki's solution
of detoasting any out-of-line fields before we put a tuple into
syscache.  That does fix the problem, so long as we hold a transaction
snapshot at the instant we fetch any catalog tuple that needs this
treatment, because at the time we fetch the tuple we make a SnapshotNow
test that shows the tuple is good (and its dependent toast tuples are
too).  Even if somebody has outdated the tuple and commits immediately
afterward, vacuum cannot remove the toast tuples before we fetch them,
because the outdater is beyond our advertised MyProc->xmin.  The risk
factor comes in when we hold a syscache entry across transactions; then
this guarantee is lost.  (In both the original example and the pg_proc
case above, the second backend is only at risk when it starts its
transaction after the first one's commit, and in between VACUUM was able
to compute an OldestXmin greater than the first one's XID.)

I guess an alternative approach would be to discard syscache entries
at transaction end if HeapTupleHasExternal is true, or equivalently
mark them as to which transaction read them and not use them in later
transactions.  But that seems more fragile, because it would have to
be tied into SnapshotResetXmin --- any time we discard our last snapshot
intra-transaction, toasted syscache entries would have to be invalidated
since our advertised xmin is no longer protecting them.

The main concern I had about detoast before caching is the risk of
circularity, ie, needing detoastable cache entries in order to figure
out how to detoast.  But I think it's probably okay.  The current list
of catalogs with toast tables is

pg_attrdefpg_constraintpg_databasepg_db_role_settingpg_descriptionpg_procpg_rewritepg_seclabelpg_shdescriptionpg_statisticpg_trigger

Of these, only pg_proc is even conceivably consulted during a toast
table fetch, and we can be sure that functions needed for such a fetch
don't have toasted entries.  But we will have to be very wary of any
future proposal for adding a toast table to pg_class, pg_index, etc.

Detoast-before-caching also seems sufficiently safe and localized to
be a back-patchable fix, whereas I'd be very scared of making any
significant overhaul of the sinval mechanism in the back branches.

Barring objections, I'll see about making this happen.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: ecpg-related build failure with make 3.82
Next
From: Robert Haas
Date:
Subject: Re: Include commit identifier in version() function