On Mon, Apr 09, 2018 at 04:29:36PM +0100, Greg Stark wrote:
> Honestly I don't think there's *any* way to use the current interface
> to implement reliable operation. Even that embedded database using a
> single process and keeping every file open all the time (which means
> file descriptor limits limit its scalability) can be having silent
> corruption whenever some other process like a backup program comes
> along and calls fsync (or even sync?).
That is indeed true (sync would induce fsync on open inodes and clear
the error), and that's a nasty bug that apparently went unnoticed for
a very long time. Hopefully the errseq_t changes in Linux 4.13 deal with
at least this issue, but similar fixes would need to be adopted by many
other kernels (all those that mark failed pages as clean).
I honestly do not expect that keeping the failed pages around will be
an acceptable change for most kernels, so the recommendation will
probably be to coordinate fsync() calls in userspace.
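As a rough illustration of what "coordinating in userspace" could mean
for a single process, here is a minimal sketch assuming a hypothetical
per-file wrapper (struct durable_file and checked_fsync() are made-up
names, not PG or kernel APIs). Once an fsync() has failed, it never
again reports success for that file, because a retried fsync() may
return 0 after the kernel has already marked the failed pages clean.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <unistd.h>

/* Hypothetical userspace state for one file (illustration only). */
struct durable_file {
    int fd;
    int failed;      /* latched once any fsync() on this file has failed */
    int saved_errno; /* errno from the first failure */
};

/* Returns 0 only if every fsync() so far on this file has succeeded. */
static int checked_fsync(struct durable_file *f)
{
    if (f->failed) {
        /* Do not retry: a second fsync() could return 0 even though the
           dirty pages were dropped and the data never reached disk. */
        errno = f->saved_errno;
        return -1;
    }
    if (fsync(f->fd) != 0) {
        f->failed = 1;
        f->saved_errno = errno;
        return -1;
    }
    return 0;
}
```

Note that this does nothing about the case Greg describes, where some
other process (e.g. a backup tool) calls fsync() or sync() first and
consumes the error; that still depends on kernel-side reporting such
as the errseq_t changes.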
What about using buffered I/O with O_SYNC, so that each write()
implicitly performs the equivalent of an fsync() and any error is
reported to the writer itself? This would probably require some helper
threads to mask the latency and present an asynchronous interface to
the rest of PG, but it sounds less intrusive than switching to direct
I/O (DIO).
Best regards,
Anthony