Re: silent data loss with ext4 / all current versions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: silent data loss with ext4 / all current versions
Date
Msg-id 565E186A.1070608@2ndquadrant.com
In response to Re: silent data loss with ext4 / all current versions  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers

On 12/01/2015 10:44 PM, Peter Eisentraut wrote:
> On 11/27/15 8:18 AM, Michael Paquier wrote:
>> On Fri, Nov 27, 2015 at 8:17 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>>> So, what's going on? The problem is that while the rename() is atomic, it's
>>>> not guaranteed to be durable without an explicit fsync on the parent
>>>> directory. And by default we only do fdatasync on the recycled segments,
>>>> which may not force fsync on the directory (and ext4 does not do that,
>>>> apparently).
>> Yeah, that seems to be the way the POSIX spec clears things.
>> "If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall
>> force all currently queued I/O operations associated with the file
>> indicated by file descriptor fildes to the synchronized I/O completion
>> state. All I/O operations shall be completed as defined for
>> synchronized I/O file integrity completion."
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
>> If I understand that right, it is guaranteed that the rename() will be
>> atomic, meaning that there will be only one file even if there is a
>> crash, but that we need to fsync() the parent directory as mentioned.
>
> I don't see anywhere in the spec that a rename needs an fsync of the
> directory to be durable. I can see why that would be needed in
> practice, though. File system developers would probably be able to
> give a more definite answer.

Yeah, POSIX is the lowest common denominator. In this case the spec
seems not to require this durability guarantee (rename without fsync on
the directory), which allows a POSIX-compliant filesystem to lose the
rename after a crash unless the parent directory is fsynced.

At least that's my conclusion from reading https://lwn.net/Articles/322823/

However, as I explained in the original post, it's more complicated, as
this only seems to be a problem with fdatasync. I've been unable to
reproduce the issue with wal_sync_method=fsync.
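
To make the pattern under discussion concrete, here is a rough sketch of
"rename, then fsync the parent directory". It is in Python for brevity,
it is not PostgreSQL code and not a proposed patch, and the helper name
durable_rename is hypothetical. It assumes a POSIX system where a
directory can be opened read-only and passed to fsync(), and that both
paths live in the same directory.

```python
import os
import tempfile


def durable_rename(old_path: str, new_path: str) -> None:
    """Rename old_path to new_path, then fsync the parent directory.

    Hypothetical helper for illustration only. Assumes both paths are in
    the same directory and that fsync() on a directory fd is supported.
    """
    # Flush the file's own data and metadata before renaming it.
    fd = os.open(old_path, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # rename() is atomic, but that alone says nothing about durability.
    os.rename(old_path, new_path)

    # fsync the parent directory so the new directory entry is flushed too.
    parent = os.path.dirname(os.path.abspath(new_path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    # Small demonstration in a scratch directory.
    with tempfile.TemporaryDirectory() as scratch:
        old = os.path.join(scratch, "segment.tmp")
        new = os.path.join(scratch, "segment")
        with open(old, "wb") as f:
            f.write(b"\0" * 8192)
        durable_rename(old, new)
        print(os.listdir(scratch))
```

Whether the final directory fsync is strictly required depends on the
filesystem and mount options, which is exactly the open question in this
thread. The sketch only shows the conservative ordering.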

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


