Re: Initial 9.2 pgbench write results - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Initial 9.2 pgbench write results
Msg-id CA+TgmoYyPszAHXTzMn_cjqk=qY7O0Cs5_QxfbZ+ETTKhsrOpqQ@mail.gmail.com
In response to Re: Initial 9.2 pgbench write results  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: Initial 9.2 pgbench write results  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Initial 9.2 pgbench write results  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On Tue, Feb 14, 2012 at 3:25 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 02/14/2012 01:45 PM, Greg Smith wrote:
>>
>> scale=1000, db is 94% of RAM; clients=4
>> Version TPS
>> 9.0  535
>> 9.1  491 (-8.4% relative to 9.0)
>> 9.2  338 (-31.2% relative to 9.1)
>
> A second pass through this data noted that the maximum number of buffers
> cleaned by the background writer is <=2785 in 9.0/9.1, while it goes as high
> as 17345 in 9.2.  The background writer is so busy now it hits the
> max_clean limit around 147 times in the slower[1] of the 9.2 runs.  That's
> an average of once every 4 seconds, quite frequent, whereas max_clean
> rarely happens in the comparable 9.0/9.1 results.  This is starting to point
> my finger more toward this being an unintended consequence of the background
> writer/checkpointer split.
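
For anyone wanting to reproduce the "once every 4 seconds" figure from their own runs, here is a minimal sketch of the arithmetic. It assumes that "max_clean" corresponds to the maxwritten_clean counter in pg_stat_bgwriter, sampled at the start and end of a run. The function name is hypothetical, and the 588-second duration is only inferred from the quoted numbers (147 hits at one per 4 seconds); it was not reported in the thread.

```python
def seconds_between_hits(hits_before, hits_after, elapsed_seconds):
    """Average seconds between max_clean hits over a sampling interval.

    hits_before / hits_after are two readings of a monotonically
    increasing counter (assumed: pg_stat_bgwriter.maxwritten_clean).
    """
    hits = hits_after - hits_before
    if hits <= 0:
        return None  # no hits in the window, so there is no interval to report
    return elapsed_seconds / hits


# Quoted figures: about 147 hits; the run length is an inferred assumption.
print(seconds_between_hits(0, 147, 588))
```

Comparing the same ratio across 9.0, 9.1, and 9.2 runs of equal length would show directly how much more often the limit is reached.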

I guess the question that occurs to me is: why is it busier?

It may be that the changes we've made to reduce lock contention are
allowing foreground processes to get work done faster.  When they get
work done faster, they dirty more buffers, and therefore the
background writer gets busier.  Also, if the background writer is more
reliably cleaning pages even during checkpoints, that could have the
same effect.  Backends write fewer of their own pages, therefore they
get more real work done, which of course means dirtying more pages.
But I'm just speculating here.

> Thinking out loud about solutions before the problem is even nailed down, I
> wonder if we should consider lowering bgwriter_lru_maxpages now in the
> default config?  In older versions, the page cleaning work had at most a 50%
> duty cycle; it was only running when checkpoints were not.

Is this really true?  I see CheckpointWriteDelay calling BgBufferSync
in 9.1.  Background writing would stop during the sync phase and
perhaps slow down a bit during checkpoint writing, but I don't think
it was stopped completely.

I'm curious what vmstat output looks like during your test.  I've
found that's a good way to know whether the system is being limited by
I/O, CPU, or locks.  It'd also be interesting to know what the %
utilization figures for the disks looked like.
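
As a rough illustration of how vmstat output can be read this way, the sketch below averages the CPU columns of Linux procps vmstat samples and applies a simple rule of thumb. The function name, the thresholds, and the sample numbers are all made up for illustration, and the "idle" label is only a hint: a mostly idle system with little I/O wait might be waiting on locks, or might simply not have enough clients to keep it busy.

```python
def classify_vmstat(text, wa_threshold=20, busy_threshold=80, idle_threshold=50):
    """Roughly classify vmstat samples as "io", "cpu", "idle", or "mixed".

    Thresholds are arbitrary illustrative percentages, not tuned values.
    """
    header = None
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if "us" in fields and "wa" in fields:
            header = fields  # the column-name line (r b swpd ... us sy id wa st)
            continue
        if header and len(fields) >= len(header) and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    if not rows:
        raise ValueError("no vmstat samples found")

    def avg(col):
        return sum(r[col] for r in rows) / len(rows)

    cpu = avg("us") + avg("sy")
    if avg("wa") >= wa_threshold:
        return "io"      # much time spent waiting on I/O
    if cpu >= busy_threshold:
        return "cpu"     # CPUs mostly busy in user + system time
    if avg("id") >= idle_threshold:
        return "idle"    # little I/O wait, mostly idle: possibly lock waits
    return "mixed"


# Fabricated sample in the usual Linux vmstat layout, for illustration only.
SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  3      0 102400  20480 900000    0    0  5200  8100 2100 4300  6  3 41 50  0
 0  4      0 101900  20480 900500    0    0  4900  7900 2000 4100  5  3 40 52  0
"""
print(classify_vmstat(SAMPLE))
```

Per-disk utilization is not in vmstat; that would have to come from a separate tool such as iostat.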

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

