So I've gotten things fixed to the point where the regression tests seem
to not fall over when contending with concurrent "vacuum full pg_class",
and now expanded the scope of the testing to all the system catalogs.
What's failing for me now is this chunk in opr_sanity:
*** 209,219 **** NOT p1.proisagg AND NOT p2.proisagg AND (p1.proargtypes[3] < p2.proargtypes[3]) ORDER BY 1,
2;
! proargtypes | proargtypes
! -------------+-------------
! 1114 | 1184
! (1 row)
! SELECT DISTINCT p1.proargtypes[4], p2.proargtypes[4] FROM pg_proc AS p1, pg_proc AS p2 WHERE p1.oid != p2.oid AND
--- 209,215 ---- NOT p1.proisagg AND NOT p2.proisagg AND (p1.proargtypes[3] < p2.proargtypes[3]) ORDER BY 1,
2;
! ERROR: missing chunk number 0 for toast value 23902886 in pg_toast_2619 SELECT DISTINCT p1.proargtypes[4],
p2.proargtypes[4]FROM pg_proc AS p1, pg_proc AS p2 WHERE p1.oid != p2.oid AND
On investigation, this turns out to occur when the planner is trying to
fetch the value of a toasted attribute in a cached pg_statistic tuple,
and a concurrent "vacuum full pg_statistic" has just finished. The
problem of course is that vacuum full reassigned all the toast item OIDs
in pg_statistic, so the one we have our hands on is no longer correct.
In general, *any* access to a potentially toasted attribute value in a
catcache entry is at risk here. I don't think it's going to be
feasible, either from a notational or efficiency standpoint, to insist
that callers always re-lock the source catalog before fetching a
catcache entry from which we might wish to extract a potentially toasted
attribute.
I am thinking that the most reasonable solution is instead to fix VACUUM
FULL/CLUSTER so that they don't change existing toast item OIDs when
vacuuming a system catalog. They already do some pretty ugly things to
avoid changing the toast table's OID in this case, and locking down the
item OIDs too doesn't seem that much harder. (Though I've not actually
looked at the code yet...)
The main potential drawback here is that if any varlena items that had
not previously been toasted got toasted, they would require additional
OIDs to be assigned, possibly leading to a duplicate-OID failure. This
should not happen unless somebody decides to play with the attstorage
properties of a system catalog, and I don't feel too bad about a small
possibility of VAC FULL failing after that. (Note it should eventually
succeed if you keep trying, since the generated OIDs would keep
changing.)
Thoughts?
regards, tom lane