VACUUM FULL versus unsafe order-of-operations in DDL commands - Mailing list pgsql-hackers

From Tom Lane
Subject VACUUM FULL versus unsafe order-of-operations in DDL commands
Msg-id 29144.1313346116@sss.pgh.pa.us
List pgsql-hackers

So, as the testing rolls on, I started to see some failures in various
ALTER-FOREIGN-thingy commands.  The cause proved to be that numerous
places in foreigncmds.c do this:

    tuple = SearchSysCacheCopy(...);
    ... alter the tuple as needed ...
    rel = heap_open(target-catalog, RowExclusiveLock);
    simple_heap_update(rel, &tuple->t_self, tuple);
    heap_close(rel, RowExclusiveLock);

rather than the more common pattern in which the catalog is opened
first.  I confess to not having realized this myself (or if I ever did
know it, I'd forgotten), but *the above coding pattern is not safe*.
You must get your lock on the catalog *before* looking up the target
tuple, else its TID may be obsoleted by a concurrent VACUUM FULL before
you've obtained your lock on the catalog.  Both update and delete
operations are at risk in this way.
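
To make the ordering hazard concrete, here is a small standalone toy
model in C.  It is only a sketch: ToyCatalog and the toy_* functions are
invented for illustration and are not PostgreSQL APIs.  A "TID" is just
a slot index, the "vacuum" compacts live rows to the front, and the
locked flag stands in for the RowExclusiveLock in the snippet above,
which conflicts with the exclusive lock that VACUUM FULL needs.

```c
#include <stdbool.h>

/* Toy model only: none of these names are PostgreSQL APIs. */
#define NSLOTS 4

typedef struct {
    int  keys[NSLOTS];   /* 0 = dead/empty slot, otherwise the row's key */
    bool locked;         /* stands in for our lock on the catalog */
} ToyCatalog;

/* Dead slots first, then one live row with key 42 in slot 2. */
static ToyCatalog toy_init(void)
{
    ToyCatalog cat = {{0, 0, 42, 0}, false};
    return cat;
}

/* Return the slot index (the toy "TID") holding key, or -1 if absent. */
static int toy_lookup(const ToyCatalog *cat, int key)
{
    for (int i = 0; i < NSLOTS; i++)
        if (cat->keys[i] == key)
            return i;
    return -1;
}

/* Compact live rows to the front, changing their slot indexes.
 * Refuses to run while the catalog is locked by us. */
static bool toy_vacuum_full(ToyCatalog *cat)
{
    if (cat->locked)
        return false;
    int out = 0;
    for (int i = 0; i < NSLOTS; i++)
        if (cat->keys[i] != 0)
            cat->keys[out++] = cat->keys[i];
    while (out < NSLOTS)
        cat->keys[out++] = 0;
    return true;
}

/* The update works only if the slot still holds the row we looked up. */
static bool toy_update(const ToyCatalog *cat, int tid, int key)
{
    return tid >= 0 && tid < NSLOTS && cat->keys[tid] == key;
}

/* Unsafe order: look up the row, and only then take the lock. */
bool unsafe_order_succeeds(void)
{
    ToyCatalog cat = toy_init();
    int tid = toy_lookup(&cat, 42);      /* no lock held yet */
    toy_vacuum_full(&cat);               /* concurrent vacuum moves the row */
    cat.locked = true;                   /* lock arrives too late */
    bool ok = toy_update(&cat, tid, 42); /* stale slot index */
    cat.locked = false;
    return ok;
}

/* Safe order: take the lock, and only then look up the row. */
bool safe_order_succeeds(void)
{
    ToyCatalog cat = toy_init();
    cat.locked = true;                   /* lock first */
    int tid = toy_lookup(&cat, 42);
    toy_vacuum_full(&cat);               /* refused while we hold the lock */
    bool ok = toy_update(&cat, tid, 42); /* slot index is still valid */
    cat.locked = false;
    return ok;
}
```

In this model unsafe_order_succeeds() returns false and
safe_order_succeeds() returns true.  The corresponding change to the
real pattern is simply to move the heap_open() call ahead of the
SearchSysCacheCopy() call, so the lock is already held when the tuple's
TID is read.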

foreigncmds.c is not hard to fix, but the scary aspect of this is the
possibility that we've made the same mistake elsewhere, or might do so
again in future.  Some desultory examination of simple_heap_update and
simple_heap_delete calls didn't find any other instances, but I am not
sure I didn't miss anything.  And this seems like an easy trap to fall
into when refactoring (the current work to try to unify operations like
ALTER OWNER could easily get into this kind of problem, for instance).

I tried to think of some practical way to mechanically test for this
type of error, but came up with nothing.  Any ideas?
        regards, tom lane

