Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers
From | Anthony Iliopoulos |
---|---|
Subject | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS |
Date | |
Msg-id | 20180403133647.GQ11627@technoir |
In response to | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Greg Stark <stark@mit.edu>) |
Responses | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS |
List | pgsql-hackers |
On Tue, Apr 03, 2018 at 12:26:05PM +0100, Greg Stark wrote:
> On 3 April 2018 at 11:35, Anthony Iliopoulos <ailiop@altatus.com> wrote:
> > Hi Robert,
> >
> > Fully agree, and the errseq_t fixes have dealt exactly with the issue
> > of making sure that the error is reported to all file descriptors that
> > *happen to be open at the time of error*. But I think one would have a
> > hard time defending a modification to the kernel where this is further
> > extended to cover cases where:
> >
> > process A does write() on some file offset which fails writeback,
> > fsync() gets EIO and exit()s.
> >
> > process B does write() on some other offset which succeeds writeback,
> > but fsync() gets EIO due to (uncleared) failures of earlier process.
>
>
> Surely that's exactly what process B would want? If it calls fsync and
> gets a success and later finds out that the file is corrupt and didn't
> match what was in memory it's not going to be happy.

You can't possibly make this assumption. Process B may be reading
and writing to completely disjoint regions from those of process A,
and as such not really caring about earlier failures, only wanting
to ensure its own writes go all the way through. But even if it did
care, the file interfaces make no transactional guarantees. Even
without fsync() there is nothing preventing process B from reading
dirty pages from process A, and based on their content proceed
to do its own business and write/persist new data, while process A
further modifies the not-yet-flushed pages in-memory before flushing.
In this case you'd need explicit synchronization/locking between the
processes anyway, so why would fsync() be an exception?

> This seems like an attempt to co-opt fsync for a new and different
> purpose for which it's poorly designed. It's not an async error
> reporting mechanism for writes. It would be useless as that as any
> process could come along and open your file and eat the errors for
> writes you performed. An async error reporting mechanism would have to
> document which writes it was giving errors for and give you ways to
> control that.

The errseq_t fixes deal with that; errors will be reported to any
process that has an open fd, irrespective of who is the actual caller
of the fsync() that may have induced errors. This is anyway required
as the kernel may evict dirty pages on its own by doing writeback and
as such there needs to be a way to report errors on all open fds.

> The semantics described here are useless for everyone. For a program
> needing to know the error status of the writes it executed, it doesn't
> know which writes are included in which fsync call. For a program

If EIO persists between invocations until explicitly cleared, a process
cannot possibly make any decision as to if it should clear the error
and proceed or some other process will need to leverage that without
coordination, or which writes actually failed for that matter.
We would be back to the case of requiring explicit synchronization
between processes that care about this, in which case the processes
may as well synchronize over calling fsync() in the first place.

Having an opt-in persisting EIO per-fd would practically be a form
of "contract" between "cooperating" processes anyway.

But instead of deconstructing and debating the semantics of the
current mechanism, why not come up with the ideal desired form of
error reporting/tracking granularity etc., and see how this may be
fitted into kernels as a new interface.

Best regards,
Anthony
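
As a rough illustration of the per-fd reporting described above for the errseq_t fixes, the following toy Python model may help. It is a sketch under stated assumptions, not kernel code: the names `FileErrorState`, `OpenFile`, and `demo` are hypothetical, and the kernel's actual errseq_t is more involved than a plain counter. The model only captures the behaviour discussed in this message: a writeback failure is reported once to every descriptor that was open when it happened, regardless of which one calls fsync() first, and a descriptor opened afterwards does not see it.

```python
import errno


class FileErrorState:
    """Toy stand-in for a file's writeback error state (hypothetical name)."""

    def __init__(self):
        self.seq = 0  # bumped each time a writeback failure is recorded
        self.err = 0  # errno of the most recent failure

    def record_writeback_error(self, err=errno.EIO):
        # Models the kernel noticing a failed writeback, whoever triggered it.
        self.seq += 1
        self.err = err


class OpenFile:
    """Toy stand-in for an open descriptor that samples the counter at open time."""

    def __init__(self, state):
        self.state = state
        self.seen = state.seq  # failures recorded before open are not reported here

    def fsync(self):
        """Return 0, or the errno if a failure was recorded since the last check."""
        if self.seen == self.state.seq:
            return 0
        self.seen = self.state.seq  # report once per descriptor, then mark as seen
        return self.state.err


def demo():
    state = FileErrorState()
    a = OpenFile(state)  # "process A"
    b = OpenFile(state)  # "process B", open at the time of the error
    state.record_writeback_error()
    results = {
        "a_first": a.fsync(),   # A is told about the failure
        "b_first": b.fsync(),   # B is told too, although A called fsync() first
        "a_second": a.fsync(),  # already reported to A, so not repeated
    }
    c = OpenFile(state)         # opened after the failure was recorded
    results["c_late"] = c.fsync()
    return results
```

In this model `demo()` reports EIO once to each of the two descriptors that were open at the time of the failure and nothing to the late opener, which is the "happen to be open at the time of error" behaviour. It says nothing about which writes failed, which is the limitation the message goes on to discuss.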