Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date
Msg-id 20180418114615.GB20040@momjian.us
In response to Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
List pgsql-hackers
On Wed, Apr 18, 2018 at 06:04:30PM +0800, Craig Ringer wrote:
> On 18 April 2018 at 05:19, Bruce Momjian <bruce@momjian.us> wrote:
> > On Tue, Apr 10, 2018 at 05:54:40PM +0100, Greg Stark wrote:
> >> On 10 April 2018 at 02:59, Craig Ringer <craig@2ndquadrant.com> wrote:
> >>
> >> > Nitpick: In most cases the kernel reserves disk space immediately,
> >> > before returning from write(). NFS seems to be the main exception
> >> > here.
> >>
> >> I'm kind of puzzled by this. Surely NFS servers store the data in the
> >> filesystem using write(2) or the in-kernel equivalent? So if the
> >> server is backed by a filesystem where write(2) preallocates space
> >> surely the NFS server must behave as if it's preallocating as well? I
> >> would expect NFS to provide basically the same set of possible
> >> failures as the underlying filesystem (as long as you don't enable
> >> nosync of course).
> >
> > I don't think the write is _sent_ to the NFS at the time of the write,
> > so while the NFS side would reserve the space, it might not get the write
> > request until after we return write success to the process.
> 
> It should be sent if you're using sync mode.
> 
> From my reading of the docs, if you're using async mode you're already
> open to so many potential corruptions you might as well not bother.
> 
> I need to look into this more re NFS and expand the tests I have to
> cover that properly.

So, if sync mode passes the write to NFS, and NFS pre-reserves write
space, and throws an error on reservation failure, that means that NFS
will not corrupt a cluster on out-of-space errors.
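
As a rough illustration of where an out-of-space error can surface, here
is a minimal Python sketch. The helper name durable_write and its
"step:ERRNO" return strings are made up for this example; it is not
PostgreSQL code. The assumption is that a filesystem which reserves
space eagerly fails in write(), while one that defers allocation may
only report the failure at fsync(), so both calls have to be checked.

```python
import errno
import os


def durable_write(path, data):
    """Write data to path and fsync it.

    Returns None on success, or a string such as "write:ENOSPC" or
    "fsync:EIO" naming the step that failed and the errno reported.
    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        try:
            while len(view) > 0:
                # write() may accept fewer bytes than requested, so loop.
                written = os.write(fd, view)
                view = view[written:]
        except OSError as exc:
            # Filesystems that reserve space inside write() fail here.
            return "write:" + errno.errorcode.get(exc.errno, str(exc.errno))
        try:
            os.fsync(fd)
        except OSError as exc:
            # Deferred allocation (the async NFS case discussed above)
            # may only report the problem at this point.
            return "fsync:" + errno.errorcode.get(exc.errno, str(exc.errno))
        return None
    finally:
        os.close(fd)
```

A caller would treat any non-None result as "the data is not known to be
on stable storage", regardless of which step reported it.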

So, what about thin provisioning?  I can understand sharing _free_ space
among file systems, but once a write arrives I assume it reserves the
space.  Is the problem that many thin provisioning systems don't have a
sync mode, so you can't force the write to appear on the device before
an fsync?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

