I notice that RelationCreateStorage() creates the main fork on disk
before writing (let alone flushing) WAL. So if PG gets killed at that
point, we end up with an orphaned file on disk. I think we could even
extend the relation a few times before any WAL gets written, so the
orphaned file isn't necessarily zero-length. We could perhaps
avoid this by writing and flushing a WAL record that includes the
creating XID before touching the disk; when we replay the record, we
create the file but then delete it if the XID fails to commit before
recovery ends. But I guess our feeling is that it's just not worth
paying for an extra WAL flush on every relation creation to fix this?
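
To make the ordering concrete, here is a toy, in-memory sketch of the idea.
Nothing in it is real PostgreSQL code: Cluster, CreateRecord, log_create(),
create_storage(), commit_xact() and recover() are all names I made up for
illustration, and it ignores things like relfilenode reuse or a committed
transaction later dropping the relation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_RECORDS 16

typedef uint32_t Xid;

/* Hypothetical WAL record: "file `relfile` is being created by `xid`". */
typedef struct {
    Xid xid;
    int relfile;
} CreateRecord;

/* Toy durable state: flushed WAL, committed XIDs, and files on disk. */
typedef struct {
    CreateRecord wal[MAX_RECORDS];
    int          nwal;
    Xid          committed[MAX_RECORDS];
    int          ncommitted;
    bool         file_exists[MAX_RECORDS];  /* indexed by relfile */
} Cluster;

static void cluster_init(Cluster *c)
{
    memset(c, 0, sizeof *c);
}

/* Step 1: write and "flush" the creation record before touching disk. */
static void log_create(Cluster *c, Xid xid, int relfile)
{
    assert(c->nwal < MAX_RECORDS);
    assert(relfile >= 0 && relfile < MAX_RECORDS);
    c->wal[c->nwal++] = (CreateRecord){ xid, relfile };
}

/* Step 2 happens only after step 1, so no file exists without a record. */
static void create_storage(Cluster *c, Xid xid, int relfile)
{
    log_create(c, xid, relfile);
    c->file_exists[relfile] = true;
}

static void commit_xact(Cluster *c, Xid xid)
{
    assert(c->ncommitted < MAX_RECORDS);
    c->committed[c->ncommitted++] = xid;
}

static bool xid_committed(const Cluster *c, Xid xid)
{
    for (int i = 0; i < c->ncommitted; i++)
        if (c->committed[i] == xid)
            return true;
    return false;
}

/* Crash recovery: replay every creation, then clean up the losers. */
static void recover(Cluster *c)
{
    /* Replay: make sure each logged file exists. */
    for (int i = 0; i < c->nwal; i++)
        c->file_exists[c->wal[i].relfile] = true;

    /* End of recovery: delete files whose creating XID never committed. */
    for (int i = 0; i < c->nwal; i++)
        if (!xid_committed(c, c->wal[i].xid))
            c->file_exists[c->wal[i].relfile] = false;
}
```

If you call create_storage() for an XID that never commits and then
recover(), the file is gone afterwards. The same holds if only log_create()
ran before the "crash". A file whose creating XID did commit survives. The
cost is the flush inside log_create(), which is the performance hit I'm
wondering about.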
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company