Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From Anthony Iliopoulos
Subject Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Msg-id 20180402185320.GM11627@technoir
In response to Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-04-01 03:14:46 +0200, Anthony Iliopoulos wrote:
> > On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote:
> > > Craig Ringer <craig@2ndquadrant.com> writes:
> > > > So we should just use the big hammer here.
> > >
> > > And bitch, loudly and publicly, about how broken this kernel behavior is.
> > > If we make enough of a stink maybe it'll get fixed.
> > 
> > It is not likely to be fixed (beyond what has been done already with the
> > manpage patches and errseq_t fixes on the reporting level). The issue is,
> > the kernel needs to deal with hard IO errors at that level somehow, and
> > since those errors typically persist, re-dirtying the pages would not
> > really solve the problem (unless some filesystem remaps the request to a
> > different block, assuming the device is alive).
> 
> Throwing away the dirty pages *and* persisting the error seems a lot
> more reasonable. Then provide a fcntl (or whatever) extension that can
> clear the error status in the few cases that the application that wants
> to gracefully deal with the case.

Precisely because the dirty pages that cannot be written out are
effectively thrown away, the semantics of fsync() (after the 4.13 fixes)
are essentially correct: the first call indicates that a writeback error
did occur, while subsequent calls have no reason to report an error
(assuming no other errors occurred in the meantime).

The error reporting is thus consistent with the intended semantics (which
are sadly not properly documented). Repeated calls to fsync() simply do not
imply that the kernel will retry writing back the previously failed pages,
so the application needs to be aware of that. Persisting the error at the
fsync() level would essentially mean moving application policy into the
kernel.
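
To make the application-side consequence concrete, here is a minimal
sketch in C of a caller that treats the first fsync() failure as final
and never retries it. The function name write_and_flush_once() and its
return convention are hypothetical, chosen only for illustration; this
is not PostgreSQL code and not a recommendation of any particular
recovery policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer, then call fsync() once.
 * Returns 0 if fsync() reported success, -1 otherwise.
 *
 * On -1 the caller must assume the data never reached stable storage.
 * Calling fsync() again is deliberately not attempted: a later call may
 * return 0 even though the failed pages were dropped from the page cache.
 */
int write_and_flush_once(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL);

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing; try again */
            return -1;              /* hard write error */
        }
        p += n;
        len -= (size_t) n;
    }

    if (fsync(fd) != 0) {
        /* Report the error once; recovery policy belongs to the caller. */
        fprintf(stderr, "fsync failed: %s; treating data as not durable\n",
                strerror(errno));
        return -1;
    }
    return 0;
}
```

What the caller does on -1 is application policy: for example, it could
rewrite the data from a copy it still holds, or stop and recover from
its own log. The only thing the sketch insists on is that a second
fsync() returning 0 is not taken as proof that the earlier data is safe.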

Best regards,
Anthony

