Re: fallocate / posix_fallocate for new WAL file creation (etc...) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: fallocate / posix_fallocate for new WAL file creation (etc...) |
Date | |
Msg-id | 20130517211826.GA19654@awork2.anarazel.de |
In response to | Re: fallocate / posix_fallocate for new WAL file creation (etc...) (Merlin Moncure <mmoncure@gmail.com>) |
Responses | Re: fallocate / posix_fallocate for new WAL file creation (etc...) |
List | pgsql-hackers |
On 2013-05-17 15:48:38 -0500, Merlin Moncure wrote:
> On Fri, May 17, 2013 at 8:29 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> > On Fri, May 17, 2013 at 4:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> On 2013-05-15 16:46:33 -0500, Jon Nelson wrote:
> >>> > * Is wal file creation performance actually relevant? Is the performance
> >>> > of a system running on fallocate()d wal files any different?
> >>>
> >>> In my limited testing, I noticed a drop of approx. 100ms per WAL file.
> >>> I do not have a good idea for how to really stress the WAL-file
> >>> creation area without calling pg_start_backup and pg_stop_backup over
> >>> and over (with archiving enabled).
> >>
> >> My point is that wal file creation usually isn't all that performance
> >> sensitive. Once the cluster has enough WAL files it will usually recycle
> >> them and thus never allocate new ones. So for this to be really
> >> beneficial it would be interesting to show different performance during
> >> normal running. You could also check out of how many extents a wal file
> >> is made out of with fallocate in comparison to the old style method
> >> (filefrag will give you that for most filesystems).
> >
> > But why does it have to be *really* beneficial? We're already making
> > optional posix_fxxx calls and fallocate seems to do exactly what we
> > would want in this context. Even if the 100ms drop doesn't show up
> > all that often, I'd still take it just for the defragmentation
> > benefits and the patch is fairly tiny.

Well, it needs to be tested et al. And it's a fairly critical code path. I
seem to remember that there were older glibc versions that didn't do
such a great job at emulating fallocate, for example.

> Here is sample output of filefrag on a somewhat busy database from our
> testing environment that exactly duplicates our production workloads..
> It does a lot of batch processing at night and a mix of 80%oltp 20%
> olap during the day. This is on ext3. Interestingly, on ext4 servers
> I never saw more than 2 extents per file (but those servers are mostly
> not as busy).

Ok, that's pretty bad. 490 extents in one file? Really? I'd consider
shutting down the cluster and copying the WAL files at a moment when
there is enough free space. Just don't forget to sync afterwards.

ext4 is notably better than ext3 at allocating space in growing files,
due to delayed allocation (and other things), so it wouldn't surprise me
to see similar differences in fragmentation even if the load were
comparable.

ext3 doesn't have fallocate btw, so it wouldn't benefit from such a
patch anyway.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
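For readers who want to try the comparison discussed above, here is a rough sketch of the two ways to create a fixed-size file: the old-style zero-fill, and preallocation through posix_fallocate. It is not the patch under discussion and not PostgreSQL's code. The function names, the 8 kB write size and the fallback behaviour are assumptions made for illustration; the 16 MB size is PostgreSQL's default WAL segment size. It uses Python's `os.posix_fallocate`, which is only available on Unix-like systems.

```python
import os

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL's default WAL segment size
BLOCK_SIZE = 8192                    # write size chosen for illustration (assumption)


def _zero_fill(fd, size):
    """Write `size` zero bytes to fd, tolerating short writes."""
    zeros = bytes(BLOCK_SIZE)
    written = 0
    while written < size:
        chunk = zeros[: min(BLOCK_SIZE, size - written)]
        written += os.write(fd, chunk)


def create_zero_filled(path, size=WAL_SEGMENT_SIZE):
    """Old-style method: grow the file by writing zeros, then fsync."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        _zero_fill(fd, size)
        os.fsync(fd)
    finally:
        os.close(fd)


def create_preallocated(path, size=WAL_SEGMENT_SIZE):
    """Ask the filesystem to reserve the space in one call.

    Returns True if posix_fallocate succeeded, False if the sketch had
    to fall back to zero-filling (hypothetical fallback policy).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        try:
            os.posix_fallocate(fd, 0, size)
            used_fallocate = True
        except OSError:
            # Filesystem or libc could not preallocate; write zeros instead.
            _zero_fill(fd, size)
            used_fallocate = False
        os.fsync(fd)
        return used_fallocate
    finally:
        os.close(fd)
```

Creating one file with each function on the same filesystem and running filefrag on both gives the extent counts to compare. Note that on filesystems without native support, the C library may emulate posix_fallocate by writing to the file, in which case no fragmentation benefit should be expected.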