Re: 9.4 regression - Mailing list pgsql-hackers

From Thom Brown
Subject Re: 9.4 regression
Date
Msg-id CAA-aLv5pG738sWfBEa5+Hh_c23UU50Rjpzi+AbXwOXnaf3__ZQ@mail.gmail.com
Whole thread Raw
In response to Re: 9.4 regression  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 8 August 2013 04:05, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-08-07 20:23:55 +0100, Thom Brown wrote:
>> >>> 269e78 was the commit immediately after 8800d8, so it appears that
>> >>> introduced the regression.
>> >>>
>> >>> "Use posix_fallocate() for new WAL files, where available."
>> >>
>> >> This is curious. Could you either run a longer test before/after the
>> >> commit or reduce checkpoint_timeout to something like 3min?
>> >
>> > Okay, I'll rerun the test for both those commits at 1 hour each with
>> > checkpoint_timeout set at 3mins, but with all other configuration
>> > settings the same
>>
>> Results
>> (checkpoint_timeout = 3min)
>>
>> pgbench -j 80 -c 80 -T 3600
>>
>> 269e78: 606.268013
>> 8800d8: 779.583129
>
> Ok, so the performance difference is lower. But, it seems to be still to
> high to be explaineable by performance differences during
> initialization/fallocate.
> In a very quick test, I don't see the same on my laptop. I am currently
> travelling and I don't have convenient access to anything else.
>
> Could you:
> - run filefrag on a couple of wal segments of both clusters after the
>   test and post the results here?

I ran a benchmark for 10 minutes.  I ran filefrag against all files in pg_xlog and they all say "1 extent found"

> - enable log_checkpoints, post the lines spat out together with the results

LOG:  database system was shut down at 2013-08-08 16:38:28 BST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  checkpoint starting: immediate force wait
LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.041 s, sync=0.034 s, total=0.132 s; sync files=3, longest=0.017 s, average=0.011 s
LOG:  checkpoint starting: immediate force wait
LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.015 s, sync=0.000 s, total=0.071 s; sync files=0, longest=0.000 s, average=0.000 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 39137 buffers (7.5%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=143.757 s, sync=5.659 s, total=149.620 s; sync files=44, longest=3.814 s, average=0.128 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 21141 buffers (4.0%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=143.082 s, sync=13.144 s, total=156.421 s; sync files=13, longest=10.336 s, average=1.011 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 27702 buffers (5.3%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=143.739 s, sync=4.470 s, total=148.337 s; sync files=12, longest=2.545 s, average=0.372 s
LOG:  checkpoint starting: time

> - run pgbench without reinitializing the cluster/pgbench tables
>   inbetween?

Set 1 (10 mins each):
First run after initialisation: 289.410795
Re-run without re-initialisation: 519.694545

These additional lines were added to the log for the re-run:

LOG:  checkpoint complete: wrote 14793 buffers (2.8%); 0 transaction log file(s) added, 0 removed, 16 recycled; write=143.299 s, sync=2.318 s, total=145.691 s; sync files=13, longest=1.910 s, average=0.178 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 37236 buffers (7.1%); 0 transaction log file(s) added, 0 removed, 13 recycled; write=143.410 s, sync=7.363 s, total=150.981 s; sync files=16, longest=4.871 s, average=0.460 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 28271 buffers (5.4%); 0 transaction log file(s) added, 0 removed, 20 recycled; write=143.597 s, sync=11.761 s, total=155.472 s; sync files=13, longest=8.494 s, average=0.904 s
LOG:  checkpoint starting: time

Set 2 (10 mins each):
First run after initialisation: 295.925985
Re-run without re-initialisation: 503.239516

Here's the log for set 2 after the first run:

LOG:  database system was shut down at 2013-08-08 17:06:41 BST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  checkpoint starting: immediate force wait
LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.041 s, sync=0.034 s, total=0.153 s; sync files=3, longest=0.017 s, average=0.011 s
LOG:  checkpoint starting: immediate force wait
LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.014 s, sync=0.000 s, total=0.057 s; sync files=0, longest=0.000 s, average=0.000 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 39148 buffers (7.5%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=143.322 s, sync=5.519 s, total=149.256 s; sync files=44, longest=3.691 s, average=0.125 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 21447 buffers (4.1%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=143.161 s, sync=13.459 s, total=156.843 s; sync files=13, longest=10.915 s, average=1.035 s
LOG:  checkpoint starting: time

After the re-run in set 2, these lines were added:

LOG:  checkpoint complete: wrote 26609 buffers (5.1%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=143.253 s, sync=6.588 s, total=150.036 s; sync files=15, longest=2.606 s, average=0.439 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 16931 buffers (3.2%); 0 transaction log file(s) added, 0 removed, 16 recycled; write=143.171 s, sync=16.489 s, total=160.014 s; sync files=14, longest=12.660 s, average=1.177 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 35345 buffers (6.7%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=143.170 s, sync=10.596 s, total=153.977 s; sync files=16, longest=7.996 s, average=0.662 s
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 27907 buffers (5.3%); 0 transaction log file(s) added, 0 removed, 19 recycled; write=143.184 s, sync=12.562 s, total=155.969 s; sync files=14, longest=7.844 s, average=0.897 s
LOG:  checkpoint starting: time

I rechecked WAL file fragmentation after each run, and it always says "1 extent found". 

-- 
Thom

pgsql-hackers by date:

Previous
From: Jon Nelson
Date:
Subject: Re: 9.4 regression
Next
From: "Kodamasimham Pridhvi (MT2012066)"
Date:
Subject: Re: Proposal for XML Schema Validation