Re: Initial 9.2 pgbench write results - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Initial 9.2 pgbench write results
Msg-id 4F3AC324.9060007@2ndQuadrant.com
In response to Initial 9.2 pgbench write results  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: Initial 9.2 pgbench write results  (Robert Haas <robertmhaas@gmail.com>)
Re: Initial 9.2 pgbench write results  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On 02/14/2012 01:45 PM, Greg Smith wrote:
> scale=1000, db is 94% of RAM; clients=4
> Version  TPS
> 9.0  535
> 9.1  491 (-8.4% relative to 9.0)
> 9.2  338 (-31.2% relative to 9.1)

A second pass through this data noted that the maximum number of buffers 
cleaned by the background writer is <=2785 in 9.0/9.1, while it goes as 
high as 17345 in 9.2.  The background writer is so busy now that it hits 
the max_clean limit around 147 times in the slower[1] of the 9.2 runs.  
That's an average of once every 4 seconds, whereas that limit is rarely 
reached in the comparable 9.0/9.1 results.  This is starting to point my 
finger more toward this being an unintended consequence of the 
background writer/checkpointer split.
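
For anyone reproducing that kind of check, both counters are exposed by 
the standard pg_stat_bgwriter view; a minimal sketch is to snapshot it 
before and after each run and diff the values (this is just a sketch, 
not the exact instrumentation used for the numbers above):

  -- buffers_clean: buffers written by the cleaning scan
  -- maxwritten_clean: times the scan stopped because it hit the
  -- bgwriter_lru_maxpages (max_clean) limit
  SELECT buffers_clean, maxwritten_clean, now() AS sampled_at
    FROM pg_stat_bgwriter;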

Thinking out loud about solutions before the problem is even nailed 
down:  I wonder if we should consider lowering bgwriter_lru_maxpages in 
the default config now?  In older versions, the page cleaning work had 
at most a 50% duty cycle; it only ran when a checkpoint wasn't in 
progress.  If we wanted to keep the ceiling on background writer 
cleaning at the same level in the default configuration, that would mean 
dropping bgwriter_lru_maxpages from 100 to 50, which allows roughly the 
same amount of maximum churn.  It's obviously more complicated than 
that, but I think there's a defensible position along those lines to 
consider.
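
To put numbers on that ceiling:  with the default bgwriter_delay of 
200ms, 100 pages per round works out to roughly 500 8kB pages per 
second, about 4MB/s of cleaning at most, so halving the maxpages value 
halves that cap.  A quick way to see the relevant knobs and their 
compiled-in defaults (the 50 is only the idea floated above, not a 
change that exists anywhere yet):

  -- boot_val shows the shipped defaults (currently 100 pages, 200ms);
  -- the idea above would ship bgwriter_lru_maxpages = 50 instead
  SELECT name, setting, unit, boot_val
    FROM pg_settings
   WHERE name IN ('bgwriter_lru_maxpages', 'bgwriter_delay');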

As a historical aside, I wonder how much this behavior might have been 
to blame for my failing to get spread checkpoints to show a positive 
outcome during 9.1 development.  The way that was written also kept the 
cleaner running during checkpoints.  I didn't measure those two changes 
individually as much as I did the combination.

[1] I normally do 3 runs of every scale/client combination, and find 
that more useful than a single run lasting 3X as long.  The first of the 
3 runs I do at any scale is usually a bit faster than the later two, 
presumably due to table and/or disk fragmentation.  I've tried to make 
this less of a factor in pgbench-tools by iterating through all 
requested client counts first, before beginning a second run of each 
scale/client combination.  So if the two client counts were 4 and 8, the 
order is 4/8/4/8/4/8, which works much better than 4/4/4/8/8/8 in terms 
of fragmentation impacting the average result.  Whether it would be 
better or worse to eliminate this difference by rebuilding the whole 
database multiple times for each scale is complicated.  I happen to like 
seeing the results with a bit more fragmentation mixed in, to see how 
they compare with the fresh database.  Since more rebuilds would also 
make these tests take much longer than they already do, that's the 
tie-breaker that's led to the current testing schedule being preferred.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


