Re: 9.4 regression - Mailing list pgsql-hackers

From Jon Nelson
Subject Re: 9.4 regression
Msg-id CAKuK5J3amFQijNqFNR+QDuMjuHweLT2YSeuW5VMQh+jgbwUbHg@mail.gmail.com
In response to Re: 9.4 regression  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Wed, Aug 7, 2013 at 10:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-08-07 20:23:55 +0100, Thom Brown wrote:
>> >>> 269e78 was the commit immediately after 8800d8, so it appears that
>> >>> introduced the regression.
>> >>>
>> >>> "Use posix_fallocate() for new WAL files, where available."
>> >>
>> >> This is curious. Could you either run a longer test before/after the
>> >> commit or reduce checkpoint_timeout to something like 3min?
>> >
>> > Okay, I'll rerun the test for both those commits at 1 hour each with
>> > checkpoint_timeout set at 3mins, but with all other configuration
>> > settings the same
>>
>> Results
>> (checkpoint_timeout = 3min)
>>
>> pgbench -j 80 -c 80 -T 3600
>>
>> 269e78: 606.268013
>> 8800d8: 779.583129
>
> Ok, so the performance difference is lower. But, it seems to be still too
> high to be explainable by performance differences during
> initialization/fallocate.
> In a very quick test, I don't see the same on my laptop. I am currently
> travelling and I don't have convenient access to anything else.
>
> Could you:
> - run filefrag on a couple of wal segments of both clusters after the
>   test and post the results here?

For me, there is no meaningful difference, but I have a relatively
fresh filesystem (ext4) with lots of free space.

> - enable log_checkpoints, post the lines spat out together with the results
> - run pgbench without reinitializing the cluster/pgbench tables
>   in between?

When I do this (as I note below), the performance difference (for me)
disappears.

> Basically, I'd like to know whether it's the initialization that's slow
> (measurable because we're regularly creating new files because of a too
> low checkpoint_segments) or whether it's the actual writes.

What I've found so far is very confusing.
I start by using initdb to initialize a temporary cluster, copy in a
postgresql.conf (with the modifications from Thom Brown tweaked for my
small laptop), start the cluster, create a test database, initialize
it with pgbench, and then run. I'm also only running for two minutes
at this time.

Every time I test, the non-fallocate version is faster. I modified
xlog.c to use posix_fallocate (or not) based on an environment
variable.
Once the WAL files have been rewritten at least once, it doesn't
seem to matter which method is used to allocate them (although
posix_fallocate seems to have a slight edge). I even went so far as to
modify the code to posix_fallocate the file *and then zero it out
anyway*, and it was almost as fast as zeroing alone, and faster than
using posix_fallocate alone.
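
For readers following along, here is a minimal standalone sketch of the
kind of switch described above. It is not the actual xlog.c change: the
environment variable name WAL_USE_FALLOCATE, the helper names, and the
fixed 16MB size are all assumptions made for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define SEG_SIZE (16 * 1024 * 1024)   /* assumed segment size: 16MB */
#define BLCKSZ   8192                 /* assumed write block size */

/* "Classic" method: write zeros over the whole file. 0 on success. */
static int
zero_fill(int fd)
{
    char  buf[BLCKSZ];
    off_t written;

    assert(SEG_SIZE % BLCKSZ == 0);
    memset(buf, 0, sizeof(buf));
    for (written = 0; written < SEG_SIZE; written += BLCKSZ)
        if (write(fd, buf, BLCKSZ) != BLCKSZ)
            return -1;
    return 0;
}

/*
 * Create a new segment file, choosing the allocation method from a
 * hypothetical environment variable: WAL_USE_FALLOCATE=1 selects
 * posix_fallocate(), anything else selects zero-filling.
 */
static int
allocate_segment(const char *path)
{
    const char *mode = getenv("WAL_USE_FALLOCATE");
    int         use_fallocate = (mode != NULL && strcmp(mode, "1") == 0);
    int         fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    int         rc;

    if (fd < 0)
        return -1;
    if (use_fallocate)
        /* posix_fallocate returns an error number, not -1/errno */
        rc = (posix_fallocate(fd, 0, SEG_SIZE) == 0) ? 0 : -1;
    else
        rc = zero_fill(fd);
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Both paths leave a file of the same length; the difference is whether
the blocks have actually been written once, which is the distinction
the measurements above turn on.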

>> Jon, here are the test results you asked for:
>>
>> $ for i in 1 2 5 10 100; do ./test_fallocate foo $i 1; done
>> method: classic. 1 open/close iterations, 1 rewrite in 0.9157s
>> method: posix_fallocate. 1 open/close iterations, 1 rewrite in 0.5314s
>> method: glibc emulation. 1 open/close iterations, 1 rewrite in 0.6018s
>> method: classic. 2 open/close iterations, 1 rewrite in 1.1417s
>> method: posix_fallocate. 2 open/close iterations, 1 rewrite in 0.6505s
>> method: glibc emulation. 2 open/close iterations, 1 rewrite in 1.8900s
>> method: classic. 5 open/close iterations, 1 rewrite in 3.6490s
>> method: posix_fallocate. 5 open/close iterations, 1 rewrite in 1.9841s
>> method: glibc emulation. 5 open/close iterations, 1 rewrite in 3.1053s
>> method: classic. 10 open/close iterations, 1 rewrite in 5.7562s
>> method: posix_fallocate. 10 open/close iterations, 1 rewrite in 3.2015s
>> method: glibc emulation. 10 open/close iterations, 1 rewrite in 7.1426s
>> method: classic. 100 open/close iterations, 1 rewrite in 64.9483s
>> method: posix_fallocate. 100 open/close iterations, 1 rewrite in 36.3748s
>> method: glibc emulation. 100 open/close iterations, 1 rewrite in 64.8386s
>
> Ok, this seems to indicate that fallocate works nicely. Jon, wasn't
> there a version of the test that rewrote the file afterwards?

Yes. If you pass a number other than '1' as the third argument
in the command line above, it will rewrite the file that many times.
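
As a rough illustration of what "rewrite the file that many times"
could look like, here is a small sketch of a timed rewrite loop. This
is not the source of test_fallocate; the function name time_rewrites,
the 16MB file size, and the 8kB chunk size are assumptions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define FILE_SIZE (16 * 1024 * 1024)  /* assumed file size: 16MB */
#define CHUNK     8192                /* assumed write size */

/*
 * Overwrite an existing file from start to end `rewrites` times,
 * syncing after each pass. Returns elapsed seconds, or -1.0 on error.
 */
static double
time_rewrites(const char *path, int rewrites)
{
    char            buf[CHUNK];
    struct timespec start, end;
    int             fd;
    int             i;
    off_t           off;

    assert(rewrites >= 0);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1.0;
    memset(buf, 'x', sizeof(buf));

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < rewrites; i++)
    {
        if (lseek(fd, 0, SEEK_SET) != 0)
            goto fail;
        for (off = 0; off < FILE_SIZE; off += CHUNK)
            if (write(fd, buf, CHUNK) != CHUNK)
                goto fail;
        if (fsync(fd) != 0)
            goto fail;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    close(fd);
    return (double) (end.tv_sec - start.tv_sec) +
           (double) (end.tv_nsec - start.tv_nsec) / 1e9;

fail:
    close(fd);
    return -1.0;
}
```

Timing the rewrite passes separately from the initial allocation is
what would distinguish a slow first write from slow file creation.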

-- 
Jon


