Re: fallocate / posix_fallocate for new WAL file creation (etc...) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: fallocate / posix_fallocate for new WAL file creation (etc...)
Msg-id 20130528152105.GB16637@awork2.anarazel.de
In response to Re: fallocate / posix_fallocate for new WAL file creation (etc...)  (Jon Nelson <jnelson+pgsql@jamponi.net>)
List pgsql-hackers
On 2013-05-28 10:12:05 -0500, Jon Nelson wrote:
> On Tue, May 28, 2013 at 9:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Tue, May 28, 2013 at 10:15 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> On 2013-05-28 10:03:58 -0400, Robert Haas wrote:
> >>> On Sat, May 25, 2013 at 2:55 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> >>> >> The biggest thing missing from this submission is information about what
> >>> >> performance testing you did.  Ideally performance patches are submitted with
> >>> >> enough information for a reviewer to duplicate the same test the author did,
> >>> >> as well as hard before/after performance numbers from your test system.  It
> >>> >> often turns tricky to duplicate a performance gain, and being able to run
> >>> >> the same test used for initial development eliminates a lot of the problems.
> >>> >
> >>> > This has been a bit of a struggle. While it's true that WAL file
> >>> > creation doesn't happen with great frequency, and while it's also true
> >>> > that - with strace and other tests - it can be proven that
> >>> > fallocate(16MB) is much quicker than writing zeroes by hand,
> >>> > proving that in the larger context of a running install has been
> >>> > challenging.
> >>>
> >>> It's nice to be able to test things in the context of a running
> >>> install, but sometimes a microbenchmark is just as good.  I mean, if
> >>> posix_fallocate() is faster, then it's just faster, right?
> >>
> >> Well, it's a bit more complex than that. Fallocate doesn't actually
> >> initialize the disk space in most filesystems, just marks it as
> >> allocated and zeroed which is one of the reasons it can be noticeably
> >> faster. But that can make the runtime overhead of writing to those pages
> >> higher.
> >
> > Maybe it would be good to measure that impact.  Something like this:
> >
> > 1. Write 16MB of zeroes to an empty file in the same size chunks we're
> > currently using (8kB?).  Time that.  Rewrite the file with real data.
> > Time that.
> > 2. posix_fallocate() an empty file out to 16MB.  Time that.  Rewrite
> > the file with real data.  Time that.
> >
> > Personally, I have trouble believing that writing 16MB of zeroes by
> > hand is "better" than telling the OS to do it for us.  If that's so,
> > the OS is just stupid, because it ought to be able to optimize the
> > crap out of that compared to anything we can do.  Of course, it is
> > more than possible that the OS is in fact stupid.  But I'd like to
> > hope not.
>
> I wrote a little C program to do something very similar to that (which
> I'll hope to post later today).
> It opens a new file, fallocates 16MB, calls fdatasync.  Then it loops
> 10 times:  seek to the start of the file, writes 16MB of ones, calls
> fdatasync.

You need to call fsync(), not fdatasync(), the first time round.
fdatasync() doesn't guarantee that metadata is synced.

> Then it closes and removes the file, re-opens it, and this time writes
> out 16MB of zeroes, calls fdatasync, and then does the same loop as
> above. The program times the process from file open to file unlink,
> inclusive.
>
> The results - for me - are pretty consistent: using fallocate is
> 12-13% quicker than writing out zeroes.

Cool!

> I used fdatasync twice to (attempt to) mimic what the WAL writer does.

Not sure what you mean by that though?
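
For reference, the test described above could be sketched roughly like this.
It is only an approximation of the program Jon describes, not his actual
code: the names time_segment() and fill_file(), the 8kB chunk size, and
reading "ones" as the byte value 1 are all assumptions made for
illustration. It uses fsync() after the initial allocation and fdatasync()
after each rewrite, and times everything from open to unlink.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define SEG_SIZE (16 * 1024 * 1024) /* 16MB, the size discussed in the thread */
#define CHUNK    8192               /* assumed write size (the "8kB?" above) */

/* Overwrite the first SEG_SIZE bytes of fd with `byte`, CHUNK bytes at a time. */
int fill_file(int fd, unsigned char byte)
{
    unsigned char buf[CHUNK];

    memset(buf, byte, sizeof buf);
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    for (off_t done = 0; done < SEG_SIZE; done += CHUNK)
        if (write(fd, buf, CHUNK) != CHUNK) /* short writes count as failure */
            return -1;
    return 0;
}

/*
 * Hypothetical helper: create `path`, allocate 16MB (posix_fallocate() if
 * use_fallocate is nonzero, otherwise by writing zeroes), fsync(), then
 * rewrite the file `rewrites` times with fdatasync() after each pass.
 * Returns elapsed seconds from open to unlink, or -1.0 on any error.
 */
double time_segment(const char *path, int use_fallocate, int rewrites)
{
    struct timespec t0, t1;
    int fd, ok;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;

    /* posix_fallocate() returns 0 on success, an error number otherwise. */
    ok = use_fallocate ? posix_fallocate(fd, 0, SEG_SIZE) == 0
                       : fill_file(fd, 0) == 0;

    /* fsync(), not fdatasync(), so the new file's metadata is flushed too. */
    ok = ok && fsync(fd) == 0;

    for (int i = 0; ok && i < rewrites; i++)
        ok = fill_file(fd, 1) == 0 && fdatasync(fd) == 0;

    close(fd);
    unlink(path);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (!ok)
        return -1.0;
    return (double) (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_segment(path, 1, 10) and time_segment(path, 0, 10) on the
filesystem of interest, several times each, would give the two numbers to
compare. The result depends heavily on the filesystem and the storage
underneath, so a single run shouldn't be trusted.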

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


