Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> Vacuum's always had a race condition: it makes a list of rel OIDs and
>> then tries to vacuum each one. It narrows the window for failure by
>> doing a SearchSysCacheExists test before relation_open, but there's
>> still a window for failure.
> hmm yeah - missed the VACUUM; part of the regression diff.
> Still this means we will have to live with (rare) failures once in a
> while during that test ?
I thought of what seems a pretty simple solution for this: make VACUUM
lock the relation before doing the SearchSysCacheExists, ie instead
of the existing code
if (!SearchSysCacheExists(RELOID, ObjectIdGetDatum(relid),
0,0, 0)) // give up
lmode = vacstmt->full ? AccessExclusiveLock : ShareUpdateExclusiveLock;
onerel = relation_open(relid, lmode);
do
lmode = vacstmt->full ? AccessExclusiveLock : ShareUpdateExclusiveLock;
LockRelationOid(relid, lmode);
if (!SearchSysCacheExists(RELOID, ObjectIdGetDatum(relid),
0,0, 0)) // give up
onerel = relation_open(relid, NoLock);
Once we're holding lock, we can be sure there's not a DROP TABLE in
progress, so there's no race condition anymore. It's OK to take a
lock on the OID of a relation that no longer exists, AFAICS; we'll
just drop it again immediately (the "give up" path includes transaction
exit, so there's not even any extra code needed).
This wasn't possible before the recent adjustments to the relation
locking protocol, but now it looks trivial ... am I missing anything?
Perhaps it is worth folding this test into a "conditional_relation_open"
function that returns NULL instead of failing if the rel no longer
exists. I think there are potential uses in CLUSTER and perhaps REINDEX
as well as VACUUM.
regards, tom lane