Re: fallocate / posix_fallocate for new WAL file creation (etc...) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: fallocate / posix_fallocate for new WAL file creation (etc...)
Date
Msg-id CA+TgmobutWHS9T0MxNJF-ZrNG7wBJYhR-HtiqmPtoKaqn7=HAQ@mail.gmail.com
In response to Re: fallocate / posix_fallocate for new WAL file creation (etc...)  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: fallocate / posix_fallocate for new WAL file creation (etc...)  (Jon Nelson <jnelson+pgsql@jamponi.net>)
List pgsql-hackers
On Tue, May 28, 2013 at 10:15 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-05-28 10:03:58 -0400, Robert Haas wrote:
>> On Sat, May 25, 2013 at 2:55 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
>> >> The biggest thing missing from this submission is information about what
>> >> performance testing you did.  Ideally performance patches are submitted with
>> >> enough information for a reviewer to duplicate the same test the author did,
>> >> as well as hard before/after performance numbers from your test system.  It
>> >> often turns tricky to duplicate a performance gain, and being able to run
>> >> the same test used for initial development eliminates a lot of the problems.
>> >
>> > This has been a bit of a struggle. While it's true that WAL file
>> > creation doesn't happen with great frequency, and while it's also true
>> > that - with strace and other tests - it can be proven that
>> > fallocate(16MB) is much quicker than writing zeroes by hand,
>> > proving that in the larger context of a running install has been
>> > challenging.
>>
>> It's nice to be able to test things in the context of a running
>> install, but sometimes a microbenchmark is just as good.  I mean, if
>> posix_fallocate() is faster, then it's just faster, right?
>
> Well, it's a bit more complex than that. Fallocate doesn't actually
> initialize the disk space in most filesystems, it just marks it as
> allocated and zeroed, which is one of the reasons it can be noticeably
> faster. But that can make the runtime overhead of writing to those pages
> higher.

Maybe it would be good to measure that impact.  Something like this:

1. Write 16MB of zeroes to an empty file in the same size chunks we're
currently using (8kB?).  Time that.  Rewrite the file with real data.
Time that.
2. posix_fallocate() an empty file out to 16MB.  Time that.  Rewrite
the file with real data.  Time that.
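
As a rough sketch of what such a microbenchmark might look like (not
taken from any patch in this thread): the helper names bench_segment()
and fill_file(), the 0xAB fill pattern standing in for "real data", and
the fsync() after each phase are all assumptions made for illustration.
The 8kB chunk size is likewise just the guess from step 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define SEG_SIZE   ((off_t) 16 * 1024 * 1024)  /* 16MB segment, as discussed */
#define CHUNK_SIZE 8192                        /* assumed 8kB write size */

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Overwrite the first SEG_SIZE bytes of fd with 'byte', then fsync. */
static int fill_file(int fd, int byte)
{
    char  buf[CHUNK_SIZE];
    off_t done = 0;

    memset(buf, byte, sizeof(buf));
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    while (done < SEG_SIZE)
    {
        off_t   left = SEG_SIZE - done;
        size_t  want = left < CHUNK_SIZE ? (size_t) left : CHUNK_SIZE;
        ssize_t n = write(fd, buf, want);

        if (n <= 0)
            return -1;
        done += n;              /* tolerate short writes */
    }
    assert(done == SEG_SIZE);
    return fsync(fd);           /* assumption: include flush cost in timing */
}

/*
 * Hypothetical helper: create a 16MB file at 'path' either by writing
 * zeroes (use_fallocate = 0) or with posix_fallocate() (use_fallocate = 1),
 * then rewrite it with a non-zero pattern.  Both phases are timed.
 * Returns 0 on success, -1 on failure.  The file is removed afterwards.
 */
static int bench_segment(const char *path, int use_fallocate,
                         double *create_s, double *rewrite_s)
{
    int    fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    double t0, t1, t2;
    int    rc;

    if (fd < 0)
        return -1;

    t0 = now_sec();
    if (use_fallocate)
    {
        /* posix_fallocate() returns an error number, not -1/errno */
        rc = posix_fallocate(fd, 0, SEG_SIZE);
        if (rc == 0)
            rc = fsync(fd);
    }
    else
        rc = fill_file(fd, 0);
    t1 = now_sec();

    if (rc == 0)
        rc = fill_file(fd, 0xAB);   /* stand-in for "real data" */
    t2 = now_sec();

    close(fd);
    unlink(path);
    if (rc != 0)
        return -1;

    *create_s = t1 - t0;
    *rewrite_s = t2 - t1;
    return 0;
}
```

Calling bench_segment() once with use_fallocate = 0 and once with
use_fallocate = 1, on the filesystem of interest, gives the four
timings from steps 1 and 2.  A single run is noisy, so repeating each
case several times would presumably be needed before drawing any
conclusion.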

Personally, I have trouble believing that writing 16MB of zeroes by
hand is "better" than telling the OS to do it for us.  If that's so,
the OS is just stupid, because it ought to be able to optimize the
crap out of that compared to anything we can do.  Of course, it is
more than possible that the OS is in fact stupid.  But I'd like to
hope not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


