Home > mailing lists

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From	Anthony Iliopoulos
Subject	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date	April 9, 2018 22:26:21
Msg-id	20180409192621.GC18969@technoir Whole thread
In response to	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Greg Stark <stark@mit.edu>)
Responses	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
List	pgsql-hackers

Tree view

On Mon, Apr 09, 2018 at 04:29:36PM +0100, Greg Stark wrote:
> Honestly I don't think there's *any* way to use the current interface
> to implement reliable operation. Even that embedded database using a
> single process and keeping every file open all the time (which means
> file descriptor limits limit its scalability) can be having silent
> corruption whenever some other process like a backup program comes
> along and calls fsync (or even sync?).

That is indeed true (sync would induce fsync on open inodes and clear
the error), and that's a nasty bug that apparently went unnoticed for
a very long time. Hopefully the errseq_t linux 4.13 fixes deal with at
least this issue, but similar fixes need to be adopted by many other
kernels (all those that mark failed pages as clean).

I honestly do not expect that keeping around the failed pages will
be an acceptable change for most kernels, and as such the recommendation
will probably be to coordinate in userspace for the fsync().

What about having buffered IO with implied fsync() atomicity via O_SYNC?
This would probably necessitate some helper threads that mask the
latency and present an async interface to the rest of PG, but sounds
less intrusive than going for DIO.

Best regards,
Anthony

pgsql-hackers by date:

From: Peter Geoghegan
Date: 09 April 2018, 22:25:33
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Andres Freund
Date: 09 April 2018, 22:29:16
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

Previous

Next