Tom Lane wrote:
> I had an idea this morning that might be useful: back off the strength
> of what we try to guarantee. Specifically, does it matter if we leak a
> file on crash, as long as it isn't occupying a lot of disk space?
> (I suppose if you had enough crashes to accumulate many thousands of
> leaked files, the directory entries would start to be a performance drag,
> but if your DB crashes that much you have other problems.) This leads
> to the idea that we don't really need to protect the open(O_CREAT) per
> se. Rather, we can emit a WAL entry *after* successful creation of a
> file, while it's still empty. This eliminates all the issues about
> logging an action that might fail. The WAL entry would need to include
> the relfilenode and the creating XID. Crash recovery would track these
> until it saw the commit or abort or prepare record for the XID, and if
> it didn't find any, would remove the file.
That idea, like all other approaches based on tracking WAL records, fails
if there's a checkpoint after the WAL record (and that's quite likely to
happen if the file is large). Replay starts from the latest checkpoint, so
it wouldn't see the file creation WAL entry, and wouldn't know to track
the xid. We'd need a way to carry the information over checkpoints.
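
To make the failure concrete, here is a toy model in Python. It is only a
sketch, not PostgreSQL code: the record kinds, the field names, and the two
helper functions are invented for illustration, and it assumes replay begins
exactly at the last checkpoint record, which simplifies the real redo
pointer. It tracks file creations per xid the way the proposal describes,
then shows the tracking being lost once a checkpoint follows the creation
record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical WAL record; not the real PostgreSQL layout."""
    kind: str              # "create", "commit", "abort", "prepare", "checkpoint"
    xid: int = 0
    relfilenode: int = 0


def redo_start(wal):
    """Index of the last checkpoint record, or 0 if there is none.

    Assumption: replay begins at the checkpoint record itself.
    """
    for i in range(len(wal) - 1, -1, -1):
        if wal[i].kind == "checkpoint":
            return i
    return 0


def orphans_after_replay(wal, start):
    """Replay wal[start:] and return the relfilenodes recovery would remove.

    A file stays pending until a commit, abort, or prepare record for its
    creating xid is seen; whatever is still pending at end of WAL is
    treated as an orphan.
    """
    pending = {}  # xid -> set of relfilenodes created by that xid
    for rec in wal[start:]:
        if rec.kind == "create":
            pending.setdefault(rec.xid, set()).add(rec.relfilenode)
        elif rec.kind in ("commit", "abort", "prepare"):
            pending.pop(rec.xid, None)
    return sorted(f for files in pending.values() for f in files)


# Crash right after the creation record: the orphan is found.
wal_a = [Record("create", xid=100, relfilenode=16384)]
found_a = orphans_after_replay(wal_a, redo_start(wal_a))

# Same crash, but a checkpoint happened after the creation record:
# replay never sees the record, so the file is leaked.
wal_b = [Record("create", xid=100, relfilenode=16384), Record("checkpoint")]
found_b = orphans_after_replay(wal_b, redo_start(wal_b))
```

In the first case found_a contains the relfilenode; in the second, found_b
is empty even though xid 100 never finished, because the pending map is
rebuilt from scratch at the checkpoint. Carrying the information over would
mean saving that map somewhere replay can read it back.
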
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com