Re: fallocate / posix_fallocate for new WAL file creation (etc...) - Mailing list pgsql-hackers

From	Merlin Moncure
Subject	Re: fallocate / posix_fallocate for new WAL file creation (etc...)
Date	May 17, 2013 21:52:54
Msg-id	CAHyXU0zwic2=qA9GFv6S4upUGXVWvRLwZNRqJ_v6U+ydBBc_Tg@mail.gmail.com
In response to	Re: fallocate / posix_fallocate for new WAL file creation (etc...) (Andres Freund <andres@2ndquadrant.com>)
List	pgsql-hackers

On Fri, May 17, 2013 at 4:18 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-05-17 15:48:38 -0500, Merlin Moncure wrote:
>> On Fri, May 17, 2013 at 8:29 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> > On Fri, May 17, 2013 at 4:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> >> On 2013-05-15 16:46:33 -0500, Jon Nelson wrote:
>> >>> > * Is wal file creation performance actually relevant? Is the performance
>> >>> >   of a system running on fallocate()d wal files any different?
>> >>>
>> >>> In my limited testing, I noticed a drop of approx. 100ms per WAL file.
>> >>> I do not have a good idea for how to really stress the WAL-file
>> >>> creation area without calling pg_start_backup and pg_stop_backup over
>> >>> and over (with archiving enabled).
>> >>
>> >> My point is that wal file creation usually isn't all that performance
>> >> sensitive. Once the cluster has enough WAL files it will usually recycle
>> >> them and thus never allocate new ones. So for this to be really
>> >> beneficial it would be interesting to show different performance during
>> >> normal running. You could also check how many extents a wal file
>> >> consists of with fallocate in comparison to the old style method
>> >> (filefrag will give you that for most filesystems).
>> >
>> > But why does it have to be *really* beneficial?  We're already making
>> > optional posix_fxxx calls and fallocate seems to do exactly what we
>> > would want in this context.  Even if the 100ms drop doesn't show up
>> > all that often, I'd still take it just for the defragmentation
>> > benefits and the patch is fairly tiny.
>
> Well, it needs to be tested etc. And it's a fairly critical code
> path. I seem to remember that there were older glibc versions that
> didn't do such a great job at emulating fallocate for example.
>
>> Here is sample output of filefrag on a somewhat busy database from our
>> testing environment that exactly duplicates our production workloads.
>> It does a lot of batch processing at night and a mix of 80% OLTP and 20%
>> OLAP during the day.  This is on ext3.  Interestingly, on ext4 servers
>> I never saw more than 2 extents per file (but those servers are mostly
>> not as busy).
>
> Ok, that's pretty bad. 490 extents in one file? Really? I'd consider
> shutting down the cluster, copying the wal files in a moment where there
> is enough free space. Just don't forget to sync afterwards.
> EXT4 is notably better at allocating space in growing files than ext3
> due to delayed allocation (and other things), so it wouldn't surprise me
> to see similar differences in fragmentation even if the load were comparable.
>
> Ext3 doesn't have fallocate btw, so it wouldn't benefit from such a
> patch anyway.

yeah -- I see your point.  The object lesson isn't so much 'improve
postgres' as it is to 'use a modern filesystem'.

merlin
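
For readers following the thread, the two file-creation strategies being compared can be sketched roughly as below. This is only an illustration, not the patch under discussion: the function names (create_zero_filled, create_preallocated, finish_file, file_size) are hypothetical, and the 16 MB segment size and 8 kB block size are assumed to match PostgreSQL's defaults. The "old style" variant writes zero blocks one at a time; the other asks the filesystem to reserve the whole range in one posix_fallocate() call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values for illustration: default WAL segment and block sizes. */
#define SEG_SIZE   ((off_t) 16 * 1024 * 1024)
#define BLOCK_SIZE 8192

/* Hypothetical helper: fsync and close; remove the file if anything failed. */
int finish_file(int fd, const char *path, int ok)
{
    if (ok && fsync(fd) != 0)
        ok = 0;
    if (close(fd) != 0)
        ok = 0;
    if (!ok)
        unlink(path);
    return ok ? 0 : -1;
}

/* "Old style" creation: fill the new file by writing zero blocks. */
int create_zero_filled(const char *path, off_t len)
{
    char  zeros[BLOCK_SIZE];
    int   fd, ok = 1;
    off_t done;

    assert(len % BLOCK_SIZE == 0);
    memset(zeros, 0, sizeof zeros);

    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    for (done = 0; ok && done < len; done += BLOCK_SIZE)
        if (write(fd, zeros, BLOCK_SIZE) != BLOCK_SIZE)
            ok = 0;
    return finish_file(fd, path, ok);
}

/* Preallocating creation: reserve the whole range in one call. */
int create_preallocated(const char *path, off_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;
    /* posix_fallocate() returns an error number directly, not -1/errno. */
    return finish_file(fd, path, posix_fallocate(fd, 0, len) == 0);
}

/* Size of the file in bytes, or -1 if it cannot be stat'ed. */
off_t file_size(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? st.st_size : (off_t) -1;
}
```

Creating one file with each function on the same filesystem and running filefrag on both is one way to compare extent counts, as suggested upthread. On filesystems without native support, glibc may emulate posix_fallocate() by writing to each block, so any difference depends on the filesystem in use.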
