Re: fallocate / posix_fallocate for new WAL file creation (etc...) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: fallocate / posix_fallocate for new WAL file creation (etc...)
Date
Msg-id 20130517211826.GA19654@awork2.anarazel.de
In response to Re: fallocate / posix_fallocate for new WAL file creation (etc...)  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: fallocate / posix_fallocate for new WAL file creation (etc...)  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On 2013-05-17 15:48:38 -0500, Merlin Moncure wrote:
> On Fri, May 17, 2013 at 8:29 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> > On Fri, May 17, 2013 at 4:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> On 2013-05-15 16:46:33 -0500, Jon Nelson wrote:
> >>> > * Is wal file creation performance actually relevant? Is the performance
> >>> >   of a system running on fallocate()d wal files any different?
> >>>
> >>> In my limited testing, I noticed a drop of approx. 100ms per WAL file.
> >>> I do not have a good idea for how to really stress the WAL-file
> >>> creation area without calling pg_start_backup and pg_stop_backup over
> >>> and over (with archiving enabled).
> >>
> >> My point is that wal file creation usually isn't all that performance
> >> sensitive. Once the cluster has enough WAL files it will usually recycle
> >> them and thus never allocate new ones. So for this to be really
> >> beneficial it would be interesting to show different performance during
> >> normal running. You could also check out of how many extents a wal file
> >> is made out of with fallocate in comparison to the old style method
> >> (filefrag will give you that for most filesystems).
> >
> > But why does it have to be *really* beneficial?  We're already making
> > optional posix_fxxx calls and fallocate seems to do exactly what we
> > would want in this context.  Even if the 100ms drop doesn't show up
> > all that often, I'd still take it just for the defragmentation
> > benefits and the patch is fairly tiny.

Well, it needs to be tested etc. And it's a fairly critical code
path. I seem to remember that there were older glibc versions that
didn't do such a great job at emulating fallocate, for example.

> Here is sample output of filefrag on a somewhat busy database from our
> testing environment that exactly duplicates our production workloads..
>  It does a lot of batch processing at night and a mix of 80%oltp 20%
> olap during the day.  This is on ext3.  Interestingly, on ext4 servers
> I never saw more than 2 extents per file (but those servers are mostly
> not as busy).

Ok, that's pretty bad. 490 extents in one file? Really? I'd consider
shutting down the cluster and copying the WAL files at a moment when there
is enough free space. Just don't forget to sync afterwards.
Ext4 is notably better at allocating space in growing files than ext3
due to delayed allocation (and other things), so it wouldn't surprise me
to see similar differences in fragmentation even if the load were comparable.
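
To compare extent counts across many WAL files, something like the following could be used. It is only a rough sketch: it assumes filefrag is installed and prints a summary line of the form "<file>: N extents found", and the helper names (parse_extent_count, count_extents) are made up for illustration.

```python
import re
import subprocess

# Matches the summary filefrag prints per file, e.g.
# "somefile: 490 extents found" or "somefile: 1 extent found".
# The exact output format is an assumption about filefrag.
_EXTENTS_RE = re.compile(r"(\d+) extents? found\s*$")


def parse_extent_count(line):
    """Return the extent count from one filefrag summary line, or None."""
    match = _EXTENTS_RE.search(line)
    return int(match.group(1)) if match else None


def count_extents(paths):
    """Run filefrag on the given files and map each path to its extent count.

    Files whose output cannot be parsed are mapped to None.
    """
    counts = {}
    for path in paths:
        result = subprocess.run(
            ["filefrag", path],
            capture_output=True,
            text=True,
            check=False,
        )
        lines = result.stdout.strip().splitlines()
        counts[path] = parse_extent_count(lines[-1]) if lines else None
    return counts
```

Running count_extents over the files in the WAL directory before and after the copy would show whether the copy actually reduced fragmentation.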

Ext3 doesn't have fallocate btw, so it wouldn't benefit from such a
patch anyway.
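
To make the fallback concern concrete, here is a minimal sketch of preallocating a segment-sized file and dropping back to the old-style zero-fill when the call is unavailable or fails. It is not the patch under discussion; the names preallocate and zero_fill are hypothetical, and the 16MB size is just the default WAL segment size.

```python
import os

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size (assumed here)
ZERO_CHUNK = b"\0" * 8192


def zero_fill(fd, size):
    """Old-style method: physically write zeros over the whole file."""
    os.lseek(fd, 0, os.SEEK_SET)
    remaining = size
    while remaining > 0:
        written = os.write(fd, ZERO_CHUNK[:min(len(ZERO_CHUNK), remaining)])
        remaining -= written


def preallocate(path, size=WAL_SEGMENT_SIZE):
    """Create path with size bytes reserved and return the method used.

    Tries posix_fallocate first; if the platform lacks it or the call
    fails, falls back to writing zeros.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        method = "posix_fallocate"
        try:
            os.posix_fallocate(fd, 0, size)
        except (OSError, AttributeError):
            # AttributeError: os.posix_fallocate is missing on this platform.
            # OSError: the call itself failed on this filesystem.
            zero_fill(fd, size)
            method = "zero_fill"
        os.fsync(fd)  # make sure the allocation has reached disk
    finally:
        os.close(fd)
    return method
```

Note that on a filesystem without native support the C library may emulate the call by writing to every block itself, in which case "posix_fallocate" is reported even though nothing was gained over the zero-fill.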

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


