Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From Andres Freund
Subject Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date
Msg-id 20180409200420.2shb4xygozkl3zr2@alap3.anarazel.de
In response to Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2018-04-09 21:54:05 +0200, Tomas Vondra wrote:
> Isn't the expectation that when a fsync call fails, the next one will
> retry writing the pages in the hope that it succeeds?

Some people expect that; I personally don't think it's a useful
expectation.

We should just deal with this through crash recovery.  The big problem I
see is that, to be guaranteed to see errors, you always need to keep a
file descriptor open for pretty much any file written to, inside and
outside of postgres.  Crash recovery would solve that.  Even if retrying
worked, I'd advocate for crash recovery (I've done so in the past, and
I've written code in pg that panics on fsync failure...).
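
To make the panic-on-failure idea concrete, here is a minimal sketch.
It is not PostgreSQL's actual code: fsync_or_panic() is a hypothetical
helper name, and abort() merely stands in for whatever mechanism would
trigger crash recovery.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not PostgreSQL's implementation):
 * flush fd to stable storage and treat any fsync() failure as fatal
 * instead of retrying, on the assumption that a later fsync() may not
 * report the same error again.
 */
static void
fsync_or_panic(int fd, const char *path)
{
    assert(fd >= 0);

    if (fsync(fd) != 0)
    {
        /* Report the error, then die so that recovery replays the log. */
        fprintf(stderr, "PANIC: could not fsync \"%s\": %s\n",
                path, strerror(errno));
        abort();
    }
}
```

The caller never sees a failed fsync(): either the data is reported as
durable or the process goes down and restarts from its log.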

What we'd need to do, however, is clear that bit during crash
recovery... which is interesting from a policy perspective. It could be
that other apps wouldn't want that.

I also wonder if we couldn't just read each relevant mounted
filesystem's errseq value somewhere. Whenever the checkpointer notices,
before finishing a checkpoint, that the value has changed, do a crash
restart.
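
The checkpoint-time check could be sketched roughly as follows. All
names here (FsErrState, errseq_reader, checkpoint_begin,
checkpoint_must_crash_restart) are hypothetical, and how a per-filesystem
errseq value would actually be read is left open: the sketch takes a
reader callback and uses an in-memory counter as a stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: returns the current error sequence value of a mount. */
typedef uint32_t (*errseq_reader)(const char *mount_point);

/* One entry per relevant mounted filesystem. */
typedef struct
{
    const char *mount_point;
    uint32_t    errseq_at_start;   /* value sampled when the checkpoint began */
} FsErrState;

/* Stand-in reader for demonstration only; a real one is an open question. */
static uint32_t fake_errseq = 0;

static uint32_t
read_fake_errseq(const char *mount_point)
{
    (void) mount_point;
    return fake_errseq;
}

/* Sample every filesystem's value at the start of a checkpoint. */
static void
checkpoint_begin(FsErrState *fs, size_t n, errseq_reader read_errseq)
{
    assert(fs != NULL && read_errseq != NULL);
    for (size_t i = 0; i < n; i++)
        fs[i].errseq_at_start = read_errseq(fs[i].mount_point);
}

/* Before finishing: has any value changed since the checkpoint began? */
static bool
checkpoint_must_crash_restart(const FsErrState *fs, size_t n,
                              errseq_reader read_errseq)
{
    assert(fs != NULL && read_errseq != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (read_errseq(fs[i].mount_point) != fs[i].errseq_at_start)
            return true;
    }
    return false;
}
```

The checkpointer would call checkpoint_begin() first and do a crash
restart instead of completing the checkpoint if
checkpoint_must_crash_restart() returns true.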


Greetings,

Andres Freund

