Re: [Testperf-general] BufferSync and bgwriter - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [Testperf-general] BufferSync and bgwriter
Msg-id 1102889288.4037.2806.camel@localhost.localdomain
In response to Re: [Testperf-general] BufferSync and bgwriter  (Neil Conway <neilc@samurai.com>)
List pgsql-hackers
On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
> Simon Riggs wrote:
> > If the bgwriter_percent = 100, then we should actually do the sensible
> > thing and prepare the list that we need, i.e. limit
> > StrategyDirtyBufferList to finding at most bgwriter_maxpages.
>
> Is the plan to make bgwriter_percent = 100 the default setting?

Hmm...must confess that my only plan is:
i) discover dynamic behaviour of bgwriter
ii) fix any bugs or weirdness as quickly as possible
iii) try to find a way to set the bgwriter defaults

I'm worried that we're late in the day for changes, but I'm equally
worried that a) the bgwriter is very tuning sensitive b) we don't really
have much info on how to set the defaults in a meaningful way for the
majority of cases c) there are some issues that greatly reduce the
effectiveness of the bgwriter in many circumstances.

The 100pct.patch was my first attempt at getting something acceptable in
the next few days that gives sufficient room for the DBA to perform
tuning.

On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
> I wonder if we even need to retain the bgwriter_percent GUC var. Is
> there actually a situation in which the combination of bgwriter_maxpages
> and bgwriter_delay does not give the DBA sufficient flexibility in
> tuning bgwriter behavior?

Yes, I do now think that only two GUCs are required to tune the
behaviour; but you make me think - which two? Right now, bgwriter_delay
is useless because the O(N) behaviour makes it impossible to set any
lower when you have a large shared_buffers. (I see that as a bug)

Your question has made me rethink the exact objective of the bgwriter's
actions: The way it is coded now the bgwriter looks for dirty blocks, no
matter where they are in the list. What we are bothered about is the
number of clean buffers at the LRU, which has a direct influence on the
probability that BufferAlloc() will need to call FlushBuffer(), since
StrategyGetBuffer() returns the first unpinned buffer, dirty or not.
After further thought, I would prefer a subtle change in behaviour so
that the bgwriter checks that clean blocks are available at the LRUs for
when buffer replacement occurs. With that slight change, I'd keep the
bgwriter_percent GUC but make it mean something different.
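
To make the concern above concrete, here is a minimal toy model in C. It is not the PostgreSQL source: ToyBuffer and toy_get_victim are hypothetical names, and the real StrategyGetBuffer()/BufferAlloc() code is considerably more involved. It only illustrates that if the first unpinned buffer at the LRU end is dirty, the backend requesting a buffer has to pay for the write itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer descriptor (not PostgreSQL's BufferDesc). */
typedef struct ToyBuffer {
    bool pinned;  /* in use by some backend, so it cannot be replaced */
    bool dirty;   /* modified in memory and not yet written to disk */
} ToyBuffer;

/*
 * Walk the list from the LRU end (index 0) and return the index of the
 * first unpinned buffer, dirty or not, or -1 if every buffer is pinned.
 * *needs_flush is set to true when the chosen victim is dirty, i.e. when
 * the caller would have to write it out before reusing it.
 */
int toy_get_victim(const ToyBuffer *lru, size_t n, bool *needs_flush)
{
    assert(lru != NULL && needs_flush != NULL);

    *needs_flush = false;
    for (size_t i = 0; i < n; i++) {
        if (!lru[i].pinned) {
            *needs_flush = lru[i].dirty;
            return (int) i;
        }
    }
    return -1;
}
```

In this model, the more clean buffers there are near index 0, the less often needs_flush comes back true, which is the quantity the paragraph above says we should care about.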

bgwriter_percent would be the % of shared_buffers that are searched
(from the LRU end) to see if they contain dirty buffers, which are then
written to disk.  That means the number of dirty blocks written to disk
is between 0 and the number of buffers searched, but we're not hugely
bothered what that number is... [This change to StrategyDirtyBufferList
resolves the unusability of the bgwriter with large shared_buffers]
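
A rough sketch of the bounded scan described above, written as standalone C rather than as a patch to StrategyDirtyBufferList. The function name, the plain bool array standing in for the buffer list, and the treatment of a negative maxpages as "no extra cap" are all assumptions made for illustration; the attached patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * dirty[0] is the buffer at the LRU end, dirty[n-1] the one at the MRU end.
 * Examine only the first `percent` % of the n buffers and store the indexes
 * of the dirty ones in out[], which must have room for n * percent / 100
 * entries. A negative maxpages is treated here as "no extra cap" (an
 * assumption); otherwise at most maxpages indexes are stored.
 * Returns the number of indexes stored.
 */
size_t collect_dirty_near_lru(const bool *dirty, size_t n,
                              int percent, int maxpages, size_t *out)
{
    assert(dirty != NULL && out != NULL);
    assert(percent >= 0 && percent <= 100);

    /* Integer division: the scan window rounds down. */
    size_t scan_limit = n * (size_t) percent / 100;
    size_t found = 0;

    for (size_t i = 0; i < scan_limit; i++) {
        if (maxpages >= 0 && found >= (size_t) maxpages)
            break;
        if (dirty[i])
            out[found++] = i;
    }
    return found;
}
```

The cost of one call is proportional to scan_limit rather than to n, and the number of buffers returned lies anywhere between 0 and scan_limit, matching the description above.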

Writing away dirty blocks towards the MRU end is more likely to be
wasted effort. If a block stays near the MRU then it will be dirty again
in the wink of an eye, so you gain nothing at checkpoint time by
cleaning it. Also, since it isn't near the LRU, cleaning it has no
effect on buffer replacement I/O. If a block is at the LRU, then it is
by definition the least likely to be reused, and is a candidate for
replacement anyway. So concentrating on the LRU, rather than on the number
of dirty buffers, seems to be the better thing to do.

That would then be a much simpler way of setting the defaults. With that
definition, we would set the defaults:

bgwriter_percent = 2 (according to my new suggestion here)
bgwriter_delay = 200
bgwriter_maxpages = -1 (i.e. mostly ignore it, but keep it for fine
tuning)

Thus, for the default shared_buffers=1000 the bgwriter would clear a
space of up to 20 blocks each cycle.
For a config with shared_buffers=60000, the bgwriter default would clear
space for 1200 blocks (max) each cycle - a reasonable setting.

Overall that would need very little specific tuning, because it would
scale upwards as you raised shared_buffers.

So, that interpretation of bgwriter_percent gives these advantages:
- we bound the StrategyDirtyBufferList scan to a small % of the whole
list, rather than the whole list...so we could realistically set the
bgwriter_delay lower if required
- we can set a default that scales, so would not often need to change it
- the parameter is defined in terms of the thing we really care about:
sufficient clean blocks at the LRU of the buffer lists
- these changes are very isolated and actually minor - just a different
way of specifying which buffers the bgwriter will clean

Patch attached...again for discussion and to help understanding of this
proposal. Will submit to patches if we agree it seems like the best way
to allow the bgwriter defaults to be sensibly set.

[...and yes, everybody, I do know where we are in the release cycle]

--
Best Regards, Simon Riggs
