Re: [Testperf-general] BufferSync and bgwriter - Mailing list pgsql-hackers

From Neil Conway
Subject Re: [Testperf-general] BufferSync and bgwriter
Date
Msg-id 1102905813.23208.32.camel@localhost.localdomain
In response to Re: [Testperf-general] BufferSync and bgwriter  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Sun, 2004-12-12 at 22:08 +0000, Simon Riggs wrote:
> > On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
> > Is the plan to make bgwriter_percent = 100 the default setting?
> 
> Hmm...must confess that my only plan is:
> i) discover dynamic behaviour of bgwriter
> ii) fix any bugs or weirdness as quickly as possible
> iii) try to find a way to set the bgwriter defaults

I was just curious why you were bothering to special-case
bgwriter_percent = 100 if it's not going to be the default setting (in
which case I would be surprised if more than 1 in 10 users would take
advantage of the patch).

> Right now, bgwriter_delay
> is useless because the O(N) behaviour makes it impossible to set any
> lower when you have a large shared_buffers.

BTW, I wouldn't be _too_ worried about O(N) behavior, except that we do
this scan while holding the BufMgrLock, which is a well-known source of
contention. So reducing the time we hold that lock would be good.

> Your question has made me rethink the exact objective of the bgwriter's
> actions: The way it is coded now the bgwriter looks for dirty blocks, no
> matter where they are in the list.

Not sure what you mean. StrategyDirtyBufferList() returns the specified
number of dirty buffers in order, starting with the T1/T2 LRUs and going
back to the MRUs of both lists. bgwriter_percent effectively ignores
some portion of the tail of that list, so we end up just flushing the
buffers closest to the T1/T2 LRUs. How is this different from what
you're describing?
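
To make the ordering concrete, here is a minimal C sketch of the behavior described above. It is a toy model, not the freelist.c code: the ToyBuf struct, the collect_dirty() and apply_percent() names, the strict alternation between the two lists, and the round-up when applying the percentage are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer header; not PostgreSQL's BufferDesc. */
typedef struct ToyBuf
{
    int  id;
    bool dirty;
} ToyBuf;

/*
 * Walk two lists from their LRU ends (index 0) towards their MRU ends,
 * alternating between them, and record the ids of dirty buffers in out[].
 * Returns the number of ids stored, never more than max_out.
 */
int
collect_dirty(const ToyBuf *t1, int n1, const ToyBuf *t2, int n2,
              int *out, int max_out)
{
    int found = 0;
    int i1 = 0;
    int i2 = 0;

    assert(out != NULL && max_out >= 0);

    while (found < max_out && (i1 < n1 || i2 < n2))
    {
        if (i1 < n1)
        {
            if (t1[i1].dirty)
                out[found++] = t1[i1].id;
            i1++;
        }
        if (found < max_out && i2 < n2)
        {
            if (t2[i2].dirty)
                out[found++] = t2[i2].id;
            i2++;
        }
    }
    return found;
}

/*
 * Assumed model of the percent knob: keep only the first `percent`
 * percent of the dirty list (rounded up), dropping the tail nearest
 * the MRU ends.
 */
int
apply_percent(int num_dirty, int percent)
{
    assert(num_dirty >= 0);
    assert(percent >= 0 && percent <= 100);
    return (num_dirty * percent + 99) / 100;
}
```

In this model the whole pool is still walked to build the list (with max_out set to the pool size), and only afterwards is the tail cut off, which is where the O(N) cost under the lock comes from.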

> bgwriter_percent would be the % of shared_buffers that are searched
> (from the LRU end) to see if they contain dirty buffers, which are
> then written to disk.

By definition, buffers closest to the LRU end of the lists are not
frequently accessed. If we only search the N% of the lists closest to
LRU, we will probably end up flushing just those pages to disk -- and
then not flushing anything else to disk in the subsequent bgwriter calls
because all the buffers close to the LRU will be non-dirty. That's okay
if all we're concerned about is avoiding write() by a real backend, but
we also want to smooth out checkpoint load, which I don't think this
approach would do well.
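
The concern can be illustrated with a small C sketch. This is a toy model under loud assumptions: the pool is a flat array ordered from LRU to MRU, buffers do not move or get re-dirtied between calls, and flush_lru_window() and count_dirty() are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: dirty[0] is the LRU end, dirty[n-1] the MRU end.
 * Scan only the first `percent` percent of the pool, "write" (clean)
 * every dirty entry found there, and return how many were written.
 */
int
flush_lru_window(bool *dirty, int n, int percent)
{
    int window;
    int written = 0;

    assert(dirty != NULL && n >= 0);
    assert(percent >= 0 && percent <= 100);

    window = n * percent / 100;
    for (int i = 0; i < window; i++)
    {
        if (dirty[i])
        {
            dirty[i] = false;   /* stands in for writing the page out */
            written++;
        }
    }
    return written;
}

/* Count the dirty entries a checkpoint would still have to write. */
int
count_dirty(const bool *dirty, int n)
{
    int total = 0;

    for (int i = 0; i < n; i++)
        if (dirty[i])
            total++;
    return total;
}
```

Under these assumptions, the first call cleans the window and every later call writes nothing, while the dirty entries beyond the window are left for the checkpoint.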

I suggest just getting rid of bgwriter_percent: AFAICS bgwriter_maxpages
is all the tuning we need, and I think "max # of pages to write" is a
simpler and more logical tuning knob than "% of the buffer pool to scan
looking for dirty buffers." So at each bgwriter invocation, we pick at
most bgwriter_maxpages dirty pages from the pool, using the pages
closest to the LRUs of T1 and T2. I'd be happy to supply a patch to
implement that if you think it sounds okay.
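
A rough C sketch of what such a round might look like follows. It is not the proposed patch: bgwriter_round() is a hypothetical name, the lists are modeled as plain arrays of dirty flags ordered from LRU to MRU, and alternating between the two lists is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * One toy bgwriter round: walk t1 and t2 from their LRU ends (index 0),
 * alternating between the lists, and clean dirty entries until
 * max_pages have been written or both lists are exhausted.
 * Returns the number of pages written in this round.
 */
int
bgwriter_round(bool *t1, int n1, bool *t2, int n2, int max_pages)
{
    int written = 0;
    int i1 = 0;
    int i2 = 0;

    assert(t1 != NULL && t2 != NULL && max_pages >= 0);

    while (written < max_pages && (i1 < n1 || i2 < n2))
    {
        if (i1 < n1)
        {
            if (t1[i1])
            {
                t1[i1] = false;     /* stands in for the write */
                written++;
            }
            i1++;
        }
        if (written < max_pages && i2 < n2)
        {
            if (t2[i2])
            {
                t2[i2] = false;     /* stands in for the write */
                written++;
            }
            i2++;
        }
    }
    return written;
}
```

In this model, successive rounds keep making progress towards the MRU ends until nothing dirty is left, and the scan stops as soon as max_pages pages have been found rather than always covering the whole pool.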

-Neil



