Re: [Testperf-general] BufferSync and bgwriter - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [Testperf-general] BufferSync and bgwriter
Date
Msg-id 1102929083.4037.3560.camel@localhost.localdomain
In response to Re: [Testperf-general] BufferSync and bgwriter  (Neil Conway <neilc@samurai.com>)
Responses Re: [Testperf-general] BufferSync and bgwriter
List pgsql-hackers
On Mon, 2004-12-13 at 02:43, Neil Conway wrote:
> On Sun, 2004-12-12 at 22:08 +0000, Simon Riggs wrote:
> > > On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
> > > Is the plan to make bgwriter_percent = 100 the default setting?
> >
> > Hmm...must confess that my only plan is:
> > i) discover dynamic behaviour of bgwriter
> > ii) fix any bugs or weirdness as quickly as possible
> > iii) try to find a way to set the bgwriter defaults
>
> I was just curious why you were bothering to special-case
> bgwriter_percent = 100 if it's not going to be the default setting (in
> which case I would be surprised if more than 1 in 10 users would take
> advantage of the patch).
>
> > Right now, bgwriter_delay
> > is useless because the O(N) behaviour makes it impossible to set any
> > lower when you have a large shared_buffers.
>
> BTW, I wouldn't be _too_ worried about O(N) behavior, except that we do
> this scan while holding the BufMgrLock, which is a well known source of
> contention. So reducing the time we hold that lock would be good.

Yes, my concern is how long the BufMgrLock is held during
StrategyDirtyBufferList, and the effect that has on system performance.
Reducing that is one of the primary objectives here (point (ii)).

> > bgwriter_percent would be the % of shared_buffers that are searched
> > (from the LRU end) to see if they contain dirty buffers, which are
> > then written to disk.
>
> By definition, buffers closest to the LRU end of the lists are not
> frequently accessed. If we only search the N% of the lists closest to
> LRU, we will probably end up flushing just those pages to disk -- and
> then not flushing anything else to disk in the subsequent bgwriter calls
> because all the buffers close to the LRU will be non-dirty. That's okay
> if all we're concerned about is avoiding write() by a real backend, but
> we also want to smooth out checkpoint load, which I don't think this
> approach would do well.

My argument for searching only the N% of the lists closest to the LRU
was that it gives:
- constant search time (searching for N dirty buffers causes a variable
number of buffers to be searched, so lock time varies...)
- if blocks are no longer used, they eventually migrate to the LRU, so
they then get written away by bgwriter rather than at checkpoint time.
- the blocks near the MRU get dirtied again fairly quickly, so they
would still need to be flushed again at checkpoint anyway
So, overall, I think this would smooth out the checkpoint load.
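
To make the lock-time point concrete, here is a toy model in Python; it
is not backend code. It treats the buffer list as a list of booleans
(True meaning dirty) with index 0 at the LRU end. The function names
scan_percent and scan_for_dirty are made up for illustration, and the
number of buffers examined stands in for time spent holding the lock.

```python
def scan_percent(buffers, percent):
    """Examine a fixed share of the list, starting at the LRU end (index 0).

    Returns (indexes of the dirty buffers found, number of buffers examined).
    The amount of work depends only on len(buffers) and percent.
    """
    limit = len(buffers) * percent // 100
    dirty = [i for i in range(limit) if buffers[i]]
    return dirty, limit


def scan_for_dirty(buffers, wanted):
    """Walk from the LRU end until `wanted` dirty buffers have been found.

    Returns (indexes of the dirty buffers found, number of buffers examined).
    The amount of work depends on where the dirty buffers happen to sit.
    """
    dirty = []
    examined = 0
    for i, is_dirty in enumerate(buffers):
        if len(dirty) >= wanted:
            break
        examined += 1
        if is_dirty:
            dirty.append(i)
    return dirty, examined


# 1000 buffers whose only dirty pages sit at the MRU end.
clean_lru = [False] * 900 + [True] * 100
print(scan_percent(clean_lru, 10)[1])    # examines 100 buffers
print(scan_for_dirty(clean_lru, 10)[1])  # examines 910 buffers
```

In this model the percent scan always looks at the same number of
buffers, while the search for N dirty buffers can walk most of the list
when the LRU end is already clean. It says nothing about whether the
percent scan smooths checkpoints; only a real performance test can
show that.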

We've little time left: if we cannot run a performance test that shows
this argument is valid, then I'd agree that we drop the idea (for now),
because of the risk that it does have the side-effect you mention.

Longer term, I think possibly having two types of bgwriter activity
would be worthwhile:
1) short and frequent LRU cleaning
2) longer but less frequent mini-checkpoints that reach up towards the
MRU
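
As a rough sketch of how those two kinds of activity might share one
loop, the Python below picks a scan depth for each bgwriter round. None
of it exists in the backend: scan_depth, lru_percent, deep_percent and
deep_every are hypothetical names, and the default values are
placeholders rather than recommendations.

```python
def scan_depth(round_no, nbuffers, lru_percent=5, deep_percent=50,
               deep_every=20):
    """Return how many buffers, counted from the LRU end, to examine.

    Most rounds do a short LRU-cleaning pass; every `deep_every`-th
    round does a longer pass that reaches further towards the MRU.
    All parameters are hypothetical.
    """
    if round_no % deep_every == 0:
        percent = deep_percent   # the "mini-checkpoint" pass
    else:
        percent = lru_percent    # the short, frequent pass
    return nbuffers * percent // 100


# With 10000 buffers: round 20 is a deep pass, rounds 1-19 are short ones.
for round_no in (1, 19, 20, 21):
    print(round_no, scan_depth(round_no, 10000))
```

The point is only that both passes could be driven by the same scan,
with the depth varying from round to round.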

> I suggest just getting rid of bgwriter_percent: AFAICS bgwriter_maxpages
> is all the tuning we need, and I think "max # of pages to write" is a
> simpler and more logical tuning knob than "% of the buffer pool to scan
> looking for dirty buffers." So at each bufmgr invocation, we pick the at
> most bgwriter_maxpages dirty pages from the pool, using the pages
> closest to the LRUs of T1 and T2. I'd be happy to supply a patch to
> implement that if you think it sounds okay.

Whichever way we do it, we agree that bgwriter_maxpages is all the
tuning that you and I need.
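
For reference, one possible reading of "at most bgwriter_maxpages dirty
pages, closest to the LRUs of T1 and T2" is sketched below in Python.
The quoted text does not say how the two lists are interleaved, so
alternating between them is an assumption, as are the function name
pick_dirty and the (page_id, is_dirty) representation.

```python
from itertools import chain, zip_longest


def pick_dirty(t1, t2, maxpages):
    """Pick at most `maxpages` dirty pages nearest the LRU ends of T1 and T2.

    t1 and t2 are lists of (page_id, is_dirty) with index 0 at the LRU
    end. Alternating T1, T2, T1, ... is an assumption of this sketch.
    """
    picked = []
    # zip_longest pads the shorter list with None; skip those entries.
    for entry in chain.from_iterable(zip_longest(t1, t2)):
        if len(picked) >= maxpages:
            break
        if entry is not None and entry[1]:
            picked.append(entry[0])
    return picked


t1 = [("a", True), ("b", False), ("c", True)]
t2 = [("x", False), ("y", True)]
print(pick_dirty(t1, t2, 2))   # ['a', 'y']
print(pick_dirty(t1, t2, 10))  # ['a', 'y', 'c']
```

Note that the number of entries visited still depends on where the
dirty pages sit in the lists; only the number of pages written is
bounded by maxpages.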

My suggestion was both to provide the tuning knob AND to remove the need
for the knob completely for the (as you say) 9 out of 10 people that
never will perform any tuning, by using bgwriter_percent to set a value
that is approximately correct all of the time.

Anyway, thanks for taking the time to read all of these postings. We're
clearly agreed on the main aspect of this, AFAICS.

> I'd be happy to supply a patch to
> implement that if you think it sounds okay.

...my understanding is that you'd only be touching BufferSync() to
simplify it, and to remove all of the bgwriter_percent GUC stuff and its
call path to BufferSync()?

I've hacked my patch down to show what I think you mean for the
BufferSync() changes.... to allow perf comparisons if time allows.
Clearly your own patch will more accurately portray those...

--
Best Regards, Simon Riggs

Attachment
