Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Consider the variant with extra marker files. In that case, backend B
> doesn't have to know about the .notcommitted status to flush the buffers.
[ shrug ] It's still broken, and the reason is that there's no
equivalent of fsync for directory operations. Consider
	A creates 1234 and 1234.notcommitted
	A commits
	B performs a checkpoint
	crash
all before A manages to delete 1234.notcommitted, or at least before
that deletion has made its way to disk. Upon restart, only WAL
events after the checkpoint will be replayed, so 1234.notcommitted
doesn't go away, and then you've got a problem.
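
The sequence above can be sketched as a toy model. This is only an illustration under stated assumptions, not PostgreSQL code: the function name, the record layout, and the "commit-remove-marker" action are all made up for the example, and the model assumes the file creations and the WAL records themselves are durable while the marker's unlink may or may not have reached disk before the crash.

```python
# Toy model of the crash sequence. Every name here is hypothetical;
# nothing below is taken from the PostgreSQL source tree.

def simulate(delete_reaches_disk):
    """Return the directory contents that exist after crash recovery."""
    disk = set()  # directory entries that are durable on disk
    wal = []      # (lsn, action, filename) records, assumed durable once appended

    # A creates 1234 and 1234.notcommitted (assumed to reach disk).
    disk |= {"1234", "1234.notcommitted"}
    wal.append((1, "create", "1234"))

    # A commits; in this model, replaying the commit record would remove the marker.
    wal.append((2, "commit-remove-marker", "1234.notcommitted"))

    # B performs a checkpoint: after a crash, replay starts after this LSN.
    checkpoint_lsn = 2

    # A unlinks the marker, but the directory update may still be sitting
    # in the kernel's cache when the machine goes down.
    if delete_reaches_disk:
        disk.discard("1234.notcommitted")

    # Crash. Only `disk` and `wal` survive. Recovery replays only the
    # records that come after the checkpoint, so the commit is skipped.
    for lsn, action, name in wal:
        if lsn > checkpoint_lsn and action == "commit-remove-marker":
            disk.discard(name)

    return disk


if __name__ == "__main__":
    print(sorted(simulate(delete_reaches_disk=False)))
    print(sorted(simulate(delete_reaches_disk=True)))
```

In the first call the marker file survives recovery even though A committed, which is the problem case; in the second the deletion happened to reach disk in time, so nothing is left behind. The model has no step at which B could learn of the pending deletion or wait for it to become durable.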
To fix this there would need to be a way (1) for B to be aware of the
pending file deletion and (2) for B to delay committing the checkpoint
until the directory update is surely down on disk. Your proposal
doesn't provide for (1), and even if we fixed that, I know of no
portable kernel API for (2). fsync isn't applicable.
While your original patch is buggy, it's at least fixable and has
localized, limited impact. I don't think these schemes are safe
at all --- they put a great deal more weight on the semantics of
the filesystem than I care to do.
regards, tom lane