Thread: tuning bgwriter in 8.4.2

tuning bgwriter in 8.4.2

From

Ben Chobot

Date:

14 February 2010, 17:25:52

We recently upgraded to 8.4.2 and I'm trying to make sure our bgwriter is working as well as it can. Based on:

# select * from pg_stat_bgwriter ;

-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------

804 | 2 | 39171885 | 22562 | 211 | 24759656 | 4488627

(1 row)

...I'm not sure that it is, because as I understand things, assuming I don't keep updating the same pages then buffers_backend should be a small percentage of buffers_alloc, and buffers_clean should be larger than it is compared to buffers_checkpoint. Is my understanding correct?

My checkpoints are spread exactly 5 minutes apart, and a typical checkpoint log entry looks like:

checkpoint complete: wrote 48289 buffers (3.7%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=149.872 s, sync=0.378 s, total=150.256 s

The only bgwriter tunable we've adjusted so far is bgwriter_lru_maxpages = 500, though we've also set checkpoint_segments = 768 (not that we need it that high, but we have the space on the wal volume.)

Re: tuning bgwriter in 8.4.2

From

Ben Chobot

Date:

14 February 2010, 17:48:57

On Feb 14, 2010, at 10:25 AM, Ben Chobot wrote:

We recently upgraded to 8.4.2 and I'm trying to make sure our bgwriter is working as well as it can. Based on:

# select * from pg_stat_bgwriter ;
checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
804 | 2 | 39171885 | 22562 | 211 | 24759656 | 4488627
(1 row)

...I'm not sure that it is, because as I understand things, assuming I don't keep updating the same pages then buffers_backend should be a small percentage of buffers_alloc, and buffers_clean should be larger than it is compared to buffers_checkpoint. Is my understanding correct?

My checkpoints are spread exactly 5 minutes apart, and a typical checkpoint log entry looks like:

checkpoint complete: wrote 48289 buffers (3.7%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=149.872 s, sync=0.378 s, total=150.256 s

The only bgwriter tunable we've adjusted so far is bgwriter_lru_maxpages = 500, though we've also set checkpoint_segments = 768 (not that we need it that high, but we have the space on the wal volume.)

I should have added that those 211 maxwritten_clean entries came about before we set bgwriter_lru_maxpages to 500. And the 2 requested checkpoints came with the initial slony load.

Re: tuning bgwriter in 8.4.2

From

Greg Smith

Date:

17 February 2010, 22:42:22

Ben Chobot wrote:
> As I understand things, assuming I don't keep updating the same pages
> then buffers_backend should be a small percentage of buffers_alloc,
> and buffers_clean should be larger than it is compared to
> buffers_checkpoint. Is my understanding correct?

Sure; your buffers_clean is really low relative to the totals.  You
should take a snapshot now that you've fixed bgwriter_lru_maxpages, with
a timestamp, and then another sometime later to get really useful
numbers.  A diff only considering the current setup and with a time
interval to go along with it is much more useful than the aggregate
numbers here (patch to make that easier already in 9.0:
http://blog.2ndquadrant.com/en/2010/01/measuring-postgresql-checkpoin.html
)  Keep taking regular snapshots with a timestamp:

select current_timestamp,* from pg_stat_bgwriter;

So you can compute a diff to measure what's changing as you go.

The only explanation I can offer is that your workload might be really
bursty.  The method used for estimating how much the cleaner should do
is most likely to break down when the load comes in narrow spikes.  The
main way to improve response in that situation is by decreasing the
interval, so it kicks in and does the "what's happened during the last
<n> ms?" computations more often.  Right now, a burst that lasts less
than 200ms can be completely missed, if the system was mostly idle
before that.

You can try lowering bgwriter_delay and proportionally decreasing
bgwriter_lru_maxpages to make response time to burst workloads better.
In your situation, I'd try make the writer wake up 4X as often, only do
1/4 as much maximum work as it currently does each time, and doubling
the multiplier too; see if things move in the right direction, and maybe
keep going from there afterwards.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: tuning bgwriter in 8.4.2

From

Ben Chobot

Date:

18 February 2010, 00:07:22

On Feb 17, 2010, at 3:41 PM, Greg Smith wrote:

> Ben Chobot wrote:
>> As I understand things, assuming I don't keep updating the same pages then buffers_backend should be a small
percentageof buffers_alloc, and buffers_clean should be larger than it is compared to buffers_checkpoint. Is my
understandingcorrect? 
>
> Sure; your buffers_clean is really low relative to the totals.  You should take a snapshot now that you've fixed
bgwriter_lru_maxpages,with a timestamp, and then another sometime later to get really useful numbers.  A diff only
consideringthe current setup and with a time interval to go along with it is much more useful than the aggregate
numbershere (patch to make that easier already in 9.0:
http://blog.2ndquadrant.com/en/2010/01/measuring-postgresql-checkpoin.html)  Keep taking regular snapshots with a
timestamp:
>
> select current_timestamp,* from pg_stat_bgwriter;
>
> So you can compute a diff to measure what's changing as you go.
>
> The only explanation I can offer is that your workload might be really bursty.  The method used for estimating how
muchthe cleaner should do is most likely to break down when the load comes in narrow spikes.  The main way to improve
responsein that situation is by decreasing the interval, so it kicks in and does the "what's happened during the last
<n>ms?" computations more often.  Right now, a burst that lasts less than 200ms can be completely missed, if the system
wasmostly idle before that. 
>
> You can try lowering bgwriter_delay and proportionally decreasing bgwriter_lru_maxpages to make response time to
burstworkloads better.  In your situation, I'd try make the writer wake up 4X as often, only do 1/4 as much maximum
workas it currently does each time, and doubling the multiplier too; see if things move in the right direction, and
maybekeep going from there afterwards. 

Thanks for the suggestions Greg. I'll monitor it closely over the next few days, but it doesn't really seem to have
changedmuch so far. Is there a way to tell if I really am just keeping the same few pages dirty throughout every
checkpoint?I wouldn't have expected that, but given our application I suppose it is possible.

Re: tuning bgwriter in 8.4.2

From

Greg Smith

Date:

18 February 2010, 00:38:37

Ben Chobot wrote:
> Is there a way to tell if I really am just keeping the same few pages dirty throughout every checkpoint? I wouldn't
haveexpected that, but given our application I suppose it is possible. 

You can install pg_buffercache and look at what's in the cache to check
your theory.  I have some sample queries that show neat things at
http://www.westnet.com/~gsmith/content/postgresql/bufcache.sh

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: tuning bgwriter in 8.4.2

From

Ben Chobot

Date:

22 February 2010, 22:22:05

On Feb 17, 2010, at 6:38 PM, Greg Smith wrote:

Ben Chobot wrote:
Is there a way to tell if I really am just keeping the same few pages dirty throughout every checkpoint? I wouldn't have expected that, but given our application I suppose it is possible.

You can install pg_buffercache and look at what's in the cache to check your theory. I have some sample queries that show neat things at http://www.westnet.com/~gsmith/content/postgresql/bufcache.sh

This appears to be fairly typical:

# select count(*),isdirty,usagecount from pg_buffercache group by isdirty,usagecount order by usagecount desc,isdirty;

count | isdirty | usagecount

--------+---------+------------

670629 | f | 5

75766 | t | 5

237311 | f | 4

5372 | t | 4

74682 | f | 3

31 | t | 3

73786 | f | 2

18 | t | 2

104112 | f | 1

62 | t | 1

68951 | f | 0

(11 rows)

Is it reading it correctly to say that the bgwriter probably wouldn't help much, because a majority of the dirty pages appear to be popular?

Re: tuning bgwriter in 8.4.2

From

Greg Smith

Date:

23 February 2010, 00:47:52

Ben Chobot wrote:
> Is it reading it correctly to say that the bgwriter probably wouldn't
> help much, because a majority of the dirty pages appear to be popular?

Yes.  The background writer cleaner process only does something useful
if there are pages with low usage counts it can evict.  You would need
to increase shared_buffers significantly before it's likely that would
happen.  Right now, 87% of your buffer cache has a usage count of 2 or
higher, which basically means it's filled with almost nothing but the
working set of data it never wants to evict unless it's for a checkpoint.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: tuning bgwriter in 8.4.2

From

Ben Chobot

Date:

23 February 2010, 15:24:11

On Feb 22, 2010, at 6:47 PM, Greg Smith wrote:

> Ben Chobot wrote:
>> Is it reading it correctly to say that the bgwriter probably wouldn't help much, because a majority of the dirty
pagesappear to be popular? 
>
> Yes.  The background writer cleaner process only does something useful if there are pages with low usage counts it
canevict.  You would need to increase shared_buffers significantly before it's likely that would happen.  Right now,
87%of your buffer cache has a usage count of 2 or higher, which basically means it's filled with almost nothing but the
workingset of data it never wants to evict unless it's for a checkpoint. 

Hm, my shared_buffers is already 10GB, but I'm using about 80GB for filesystem cache. Would a larger shared_buffers
makesense? I thought I read somewhere that 10GB is on the high end of the useful size for shared_buffers.

Re: tuning bgwriter in 8.4.2

From

Greg Smith

Date:

23 February 2010, 16:21:10

Ben Chobot wrote:
>
> Hm, my shared_buffers is already 10GB, but I'm using about 80GB for filesystem cache. Would a larger shared_buffers
makesense? I thought I read somewhere that 10GB is on the high end of the useful size for shared_buffers. 

Yeah, I figured that out when I was analyzing your figures and thought
I'd missed a decimal place when I first saw it.  The problem with huge
increases in shared_buffer is that they can increase the amount of time
it takes to allocate a new buffer.  If all you've got in there are lots
of records with high usage counts, it can end up taking multiple "sweeps
of the clock hand" over things to drop usage counts.  With >75% of your
buffers already having a usage count of 4 or 5, you've already gone
pretty far in the direction where that could happen.  With still around
13% only have a 0 or 1 usage count I don't think it's too bad yet.

You're certainly not in well explored territory though.  If you were
seeing large amounts of backend writes or buffers being allocated, maybe
a larger shared_buffers would make sense.  From the snapshots of data
you've provided, that doesn't seem to be the case though, so I wouldn't
be too worried about it.  The only thing you could really do here is
increase checkpoint_timeout - with this much data, having a checkpoint
every 5 minutes is on the fast side, and I'd bet you could tolerate the
disk space and additional crash recovery time in return for better
average performance the rest of the time.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us