Re: fsync reliability - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: fsync reliability
Msg-id 201105091822.p49IMOP21362@momjian.us
In response to Re: fsync reliability  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
FYI, does wal.c need updated comments to explain the file system
semantics we expect, and how our code triggers them?

---------------------------------------------------------------------------

Greg Smith wrote:
> On 04/23/2011 09:58 AM, Matthew Woodcraft wrote:
> > As far as I can make out, the current situation is that this fix (the
> > auto_da_alloc mount option) doesn't work as advertised, and the ext4
> > maintainers are not treating this as a bug.
> >
> > See https://bugzilla.kernel.org/show_bug.cgi?id=15910
> >    
> 
> I agree with the resolution that this isn't a bug.  As pointed out 
> there, XFS does the same thing, and this behavior isn't going away any
> time soon.  When developers try to optimize away a necessary fsync,
> zero-length files get left behind.
> 
> Here's the part where the submitter goes wrong:
> 
> "We first added a fsync() call for each extracted file. But scattered 
> fsyncs resulted in a massive performance degradation during package 
> installation (factor 10 or more, some reported that it took over an hour 
> to unpack a linux-headers-* package!) In order to reduce the I/O 
> performance degradation, fsync calls were deferred..."
> 
> Stop right there; the slow path was the only one that had any hope of
> being correct.  Calling fsync on every file can actually slow things down
> by a factor of 100 or more, worst-case.  "So, we currently have the choice
> between filesystem corruption or major performance loss":  yes, you do.
> Writing files is tricky, and it can be either slow or safe.  If you're
> going to avoid even trying to enforce the right thing here, you're going
> to get badly burned.
> 
> It's unfortunate that so many people have grown used to the speed of what
> has been the common setup for a while now, ext3 on cheap hard drives:
> all writes are cached unsafely, but the filesystem resists a few bad 
> behaviors.  Much of the struggle where people say "this is so much 
> slower, I won't put up with it" and try to code around it is futile, and 
> it's hard to separate out the attempts to find such optimizations from 
> the legitimate complaints.
> 
> Anyway, you're right to point out that the filesystem is not necessarily 
> going to save anyone from some of the tricky rename situations even with 
> the improvements made to delayed allocation.  They've fixed some of the
> worst behavior of the earlier implementation, but it seems there are
> still potential issues in that area.
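
For reference, here is a minimal sketch of the slow-but-safe way to replace
a file that the discussion above is about: write a temporary file, fsync
it, rename it over the target, then fsync the containing directory. It
assumes a POSIX system. The names replace_file_durably, write_all and
fsync_dir are hypothetical and are not taken from the PostgreSQL sources,
and fsync on a directory descriptor is an assumption that holds on Linux
but may not hold everywhere.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* fsync a directory so that a rename inside it is itself made durable.
 * Assumption: the platform allows fsync on a read-only directory fd. */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Hypothetical helper: replace dir/name with the given bytes.
 * Returns 0 on success, -1 on any failure. */
int replace_file_durably(const char *dir, const char *name,
                         const char *data, size_t len)
{
    char tmp[4096], dst[4096];
    int n;

    assert(dir != NULL && name != NULL && data != NULL);

    n = snprintf(tmp, sizeof tmp, "%s/%s.tmp", dir, name);
    if (n < 0 || (size_t) n >= sizeof tmp)
        return -1;
    n = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof dst)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* The data must reach stable storage BEFORE the rename; skipping
     * this fsync is what can leave a zero-length file after a crash. */
    if (write_all(fd, data, len) != 0 || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* rename() swaps the name atomically, but not necessarily durably. */
    if (rename(tmp, dst) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Persist the directory entry that now points at the new file. */
    return fsync_dir(dir);
}
```

Each call pays for two fsyncs, which is exactly the cost the deferred-fsync
approach quoted above was trying to avoid. Dropping either one reopens the
window in which a crash can leave an empty or missing file.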
> 
> -- 
> Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
> 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

