Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From	Anthony Iliopoulos
Subject	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date	April 2, 2018 21:53:20
Msg-id	20180402185320.GM11627@technoir
In response to	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Andres Freund <andres@anarazel.de>)
Responses	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
List	pgsql-hackers

On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-04-01 03:14:46 +0200, Anthony Iliopoulos wrote:
> > On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote:
> > > Craig Ringer <craig@2ndquadrant.com> writes:
> > > > So we should just use the big hammer here.
> > >
> > > And bitch, loudly and publicly, about how broken this kernel behavior is.
> > > If we make enough of a stink maybe it'll get fixed.
> > 
> > It is not likely to be fixed (beyond what has been done already with the
> > manpage patches and errseq_t fixes on the reporting level). The issue is,
> > the kernel needs to deal with hard IO errors at that level somehow, and
> > since those errors typically persist, re-dirtying the pages would not
> > really solve the problem (unless some filesystem remaps the request to a
> > different block, assuming the device is alive).
> 
> Throwing away the dirty pages *and* persisting the error seems a lot
> more reasonable. Then provide a fcntl (or whatever) extension that can
> clear the error status in the few cases that the application that wants
> to gracefully deal with the case.

Precisely because the dirty pages which cannot be written out are
effectively thrown away, the semantics of fsync() (after the 4.13 fixes)
are essentially correct: the first call reports that a writeback error
occurred, while subsequent calls have no reason to report an error
(assuming no other errors occurred in the meantime).

The error reporting is thus consistent with the intended semantics (which
are sadly not properly documented). Repeated calls to fsync() simply do not
imply that the kernel will retry writing back the previously failed pages,
so the application needs to be aware of that. Persisting the error at the
fsync() level would essentially mean moving application policy into the
kernel.
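
To make the consequence for applications concrete, here is a minimal sketch
in C of a caller that acts on the first fsync() result only. It is an
illustration under the semantics described above, not PostgreSQL code; the
helper name write_durably() is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not PostgreSQL code): write buf to path, fsync once.
 * Returns 0 if the kernel reported the data as written back, or -1 with
 * errno set. A failed fsync() is deliberately not retried: with the
 * semantics described above, a second call may return 0 even though the
 * failed pages were dropped and never reached stable storage. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing; try again */
            int saved = errno;
            close(fd);
            errno = saved;          /* report the write() error, not close()'s */
            return -1;
        }
        p += n;
        left -= (size_t) n;
    }

    if (fsync(fd) != 0) {
        /* First and only report of the writeback error: treat it as final. */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

On a -1 return the caller has to assume the buffered data is gone and
recover from its own copy (rewrite from a log, or abort), rather than loop
on fsync() until it returns 0.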

Best regards,
Anthony
