Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date	April 2, 2018 21:13:46
Msg-id	20180402181345.h6j4z5agkjccr2vh@alap3.anarazel.de
In response to	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Anthony Iliopoulos <ailiop@altatus.com>)
Responses	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
List	pgsql-hackers

Hi,

On 2018-04-01 03:14:46 +0200, Anthony Iliopoulos wrote:
> On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote:
> > Craig Ringer <craig@2ndquadrant.com> writes:
> > > So we should just use the big hammer here.
> >
> > And bitch, loudly and publicly, about how broken this kernel behavior is.
> > If we make enough of a stink maybe it'll get fixed.
> 
> It is not likely to be fixed (beyond what has been done already with the
> manpage patches and errseq_t fixes on the reporting level). The issue is,
> the kernel needs to deal with hard IO errors at that level somehow, and
> since those errors typically persist, re-dirtying the pages would not
> really solve the problem (unless some filesystem remaps the request to a
> different block, assuming the device is alive).

Throwing away the dirty pages *and* persisting the error seems a lot
more reasonable. Then provide a fcntl (or whatever) extension that can
clear the error status in the few cases where the application wants
to deal with it gracefully.

> Keeping around dirty
> pages that cannot possibly be written out is essentially a memory leak,
> as those pages would stay around even after the application has exited.

Why do dirty pages need to be kept around in the case of persistent
errors? I don't think the lack of automatic recovery in that case is
what anybody is complaining about. It's that the error goes away and
there's no reasonable way to separate out such an error from some
potential transient errors.
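
To make the application-side consequence concrete, here is a minimal
sketch of the "big hammer" approach mentioned upthread: treat any
fsync() failure as final instead of retrying, because a later fsync()
on the same file may report success even though the dirty pages were
dropped. The names write_durably and DurabilityError are hypothetical
and exist only for this illustration; this is not PostgreSQL's code,
and the injectable fsync parameter is there solely so the failure path
can be exercised.

```python
import errno
import os


class DurabilityError(Exception):
    """Raised when fsync() fails; the data must be treated as lost."""


def write_durably(path, data, fsync=os.fsync):
    # Hypothetical helper: write `data` to `path` and flush it to disk.
    # `fsync` is injectable only so a failure can be simulated.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        try:
            fsync(fd)
        except OSError as exc:
            # Deliberately no retry: the kernel may already have marked
            # the pages clean, so a second fsync() could succeed without
            # the data ever reaching stable storage.
            raise DurabilityError(
                f"fsync failed on {path}: {exc.strerror}"
            ) from exc
    finally:
        os.close(fd)
```

A caller would catch DurabilityError and recover from a known-good
source (for a database, replaying the WAL after a crash restart) rather
than calling fsync() again on the same descriptor.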

Greetings,

Andres Freund
