Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From Andres Freund
Subject Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date
Msg-id 20180409195934.o4cnrnt3hhw4o2xi@alap3.anarazel.de
In response to Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On 2018-04-09 14:41:19 -0500, Justin Pryzby wrote:
> On Mon, Apr 09, 2018 at 09:31:56AM +0800, Craig Ringer wrote:
> > You could make the argument that it's OK to forget if the entire file
> > system goes away. But actually, why is that ok?
> 
> I was going to say that it'd be okay to clear error flag on umount, since any
> opened files would prevent unmounting; but, then I realized we need to consider
> the case of close()ing all FDs then opening them later..in another process.

> On Mon, Apr 09, 2018 at 02:54:16PM +0200, Anthony Iliopoulos wrote:
> > notification descriptor open, where the kernel would inject events
> > related to writeback failures of files under watch (potentially
> > enriched to contain info regarding the exact failed pages and
> > the file offset they map to).
> 
> For postgres that'd require backend processes to open() a file such that,
> following its close(), any writeback errors are "signalled" to the checkpointer
> process...

I don't think that's as hard as some people argued in this thread.  We
could very well open a pipe in postmaster with the write end open in
each subprocess, and the read end open only in checkpointer (and
postmaster, but unused there).  Whenever closing a file descriptor that
was dirtied in the current process, send it over the pipe to the
checkpointer. The checkpointer then can receive all those file
descriptors (making sure it's not above the limit, fsync()ing and close()ing
to make room if necessary).  The biggest complication would presumably
be to deduplicate the received file descriptors for the same file,
without losing track of any errors.
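
A rough sketch of that hand-off, under some loudly stated assumptions: on
Linux an open descriptor cannot be sent through a plain pipe, so this uses
an AF_UNIX socket pair with SCM_RIGHTS instead. The names send_fd, recv_fd
and PendingSyncs are made up for illustration and are not PostgreSQL code.
Duplicates are detected by (st_dev, st_ino), and a duplicate is fsync()ed
before it is closed so that an error visible only through that descriptor
is recorded rather than dropped.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Pass one open descriptor to the peer as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    # Ancillary data has to travel with at least one byte of normal payload.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one descriptor; the returned number is valid in this process."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


class PendingSyncs:
    """Hypothetical checkpointer-side bookkeeping for received descriptors."""

    def __init__(self):
        self.by_file = {}  # (st_dev, st_ino) -> descriptor kept until sync_all()
        self.errors = {}   # (st_dev, st_ino) -> first OSError seen for that file

    def _sync(self, key, fd):
        try:
            os.fsync(fd)
        except OSError as exc:
            # Remember the first failure per file; never silently discard it.
            self.errors.setdefault(key, exc)

    def absorb(self, fd):
        """Take ownership of a received descriptor, deduplicating by file."""
        st = os.fstat(fd)
        key = (st.st_dev, st.st_ino)
        if key in self.by_file:
            # Already tracking this file: sync through the duplicate first,
            # then close it to stay under the descriptor limit.
            self._sync(key, fd)
            os.close(fd)
        else:
            self.by_file[key] = fd
        return key

    def sync_all(self):
        """fsync and close everything held; return the collected errors."""
        for key, fd in self.by_file.items():
            self._sync(key, fd)
            os.close(fd)
        self.by_file.clear()
        errors, self.errors = self.errors, {}
        return errors
```

In this sketch a backend would call send_fd() on a dirtied descriptor right
before its own close(), and the receiving process would loop on recv_fd()
and absorb(), calling sync_all() at checkpoint time and treating a non-empty
result as a failed checkpoint.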

Even better, we could do so via a dedicated worker. That'd quite
possibly end up as a performance benefit.


> I was going to say that's fine for postgres, since it chdir()s into its
> basedir, but actually not fine for nondefault tablespaces..

I think it'd be fair to open PG_VERSION of all created
tablespaces. Would require some hangups to signal checkpointer (or
whichever process) to do so when creating one, but it shouldn't be too
hard.  Some people would complain because they can't do some nasty hacks
anymore, but it'd also save people's butts by preventing them from
accidentally unmounting.
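
To illustrate the idea only: a long-lived process that keeps one descriptor
open per tablespace makes a plain umount of that filesystem fail with EBUSY
(a lazy or forced unmount can still detach it). The function names and the
assumption that PG_VERSION sits directly in each listed directory are
hypothetical, chosen to mirror the wording above rather than the actual
on-disk layout.

```python
import os


def pin_tablespaces(tablespace_dirs):
    """Open PG_VERSION in each directory and keep the descriptors.

    Returns a dict mapping directory -> open descriptor. While the
    descriptors stay open, the filesystems holding them are busy.
    """
    pins = {}
    try:
        for directory in tablespace_dirs:
            path = os.path.join(directory, "PG_VERSION")
            pins[directory] = os.open(path, os.O_RDONLY)
    except OSError:
        # Do not leak descriptors if one tablespace cannot be pinned.
        unpin_tablespaces(pins)
        raise
    return pins


def unpin_tablespaces(pins):
    """Close all pinning descriptors, e.g. at shutdown or DROP TABLESPACE."""
    for fd in pins.values():
        os.close(fd)
    pins.clear()
```

Whichever process holds the pins would need to be told about each newly
created tablespace so it can add a descriptor for it.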

Greetings,

Andres Freund

