Chris Travers <chris.travers@adjust.com> writes:
> I wonder about a different solution. Would it be possible to special case
> vacuum to check for and remove (or just move to where they can be removed)
> files when vacuuming pg_class? At the point we are vacuuming pg_class, we
> ought to be able to know that a relfilenode shouldn't be used anymore,
> right?
I don't think so. It's not clear to me whether you have in mind "scan
pg_class, collect relfilenodes from all live tuples, then zap all files
not in that set" or "when removing a dead tuple, zap the relfilenode
it mentions". But neither one works. The first case has a race condition
against new pg_class entries. As for the second, the existence of a dead
tuple bearing relfilenode N isn't evidence that some other live tuple
can't have relfilenode N.
Another problem for the second solution is that in the case you're worried
about (ie, PANIC due to out-of-WAL-space during relation's creating
transaction), there's no very good reason to expect that the relation's
pg_class tuple ever made it to disk at all.
A traditional low-tech answer to this has been to keep the WAL on a
separate volume from the main data store, so that it's protected from
out-of-space conditions in the main store and temp areas. The space
needs for WAL proper are generally much more predictable than the main
store, so it's easier to keep the dedicated space from overflowing.
(Stalled replication/archiving processes can be hazardous to your
health in this scenario, though, if they prevent prompt recycling of
WAL files.)
regards, tom lane