Re: Load distributed checkpoint V3 - Mailing list pgsql-patches

From Greg Smith
Subject Re: Load distributed checkpoint V3
Msg-id Pine.GSO.4.64.0704050906540.6384@westnet.com
In response to Re: Load distributed checkpoint V3  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Load distributed checkpoint V3  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-patches
On Thu, 5 Apr 2007, Heikki Linnakangas wrote:

> Bgwriter has two goals:
> 1. keep enough buffers clean that normal backends never need to do a write
> 2. smooth checkpoints by writing buffers ahead of time
> Load distributed checkpoints will do 2. in a much better way than the
> bgwriter_all_* guc options. I think we should remove that aspect of bgwriter
> in favor of this patch.

My first question about the LDC patch was whether I could turn it off and
return to the existing mechanism.  I would like to see a large pile of
data proving this new approach is better before the old one goes away.  I
think everyone needs to do some more research and measurement here before
assuming the problem can be knocked out so easily.

I've been busy working on patches to gather statistics on this area of
code because I've tried most of the simple answers for getting the
background writer to work better and made little progress, and I'd like
anyone else attempting the same to at least be collecting the right data.

Let me suggest a different way of looking at this problem.  At any moment,
some percentage of your buffer pool is dirty.  Whether it's 0% or 100%
dramatically changes what the background writer should be doing.  Whether
most of the buffers have usage_count>0 or not also makes a difference.
None of the current code has any idea what type of buffer pool it's
working with, and therefore it doesn't have enough information to make a
well-informed prediction about what is going to happen in the near future.
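
To make the two measurements above concrete, here is a minimal sketch of
summarizing a pool in one pass.  The SketchBuffer and PoolProfile types and
the profile_pool() function are hypothetical names invented for
illustration; they are not PostgreSQL's BufferDesc or anything in the
current bufmgr.c.

```c
#include <stddef.h>

/* Hypothetical, simplified buffer descriptor (not PostgreSQL's BufferDesc). */
typedef struct
{
    int is_dirty;       /* nonzero if the page still needs to be written */
    int usage_count;    /* recent-use counter, as used by the clock sweep */
} SketchBuffer;

/* Summary of what kind of pool the background writer is facing. */
typedef struct
{
    double dirty_fraction;   /* dirty buffers / all buffers */
    double reused_fraction;  /* buffers with usage_count > 0 / all buffers */
} PoolProfile;

/* Walk the pool once and compute both fractions. */
PoolProfile
profile_pool(const SketchBuffer *bufs, size_t n)
{
    PoolProfile p = {0.0, 0.0};
    size_t dirty = 0;
    size_t reused = 0;

    if (n == 0)
        return p;           /* empty pool: report zeros, avoid dividing by 0 */

    for (size_t i = 0; i < n; i++)
    {
        if (bufs[i].is_dirty)
            dirty++;
        if (bufs[i].usage_count > 0)
            reused++;
    }

    p.dirty_fraction = (double) dirty / (double) n;
    p.reused_fraction = (double) reused / (double) n;
    return p;
}
```

A writer with this profile in hand could, for example, skip the all-scan
entirely when dirty_fraction is near zero; that policy is a suggestion, not
something the existing code does.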

I'll tell you what I did to the all-scan.  I ran a few hundred hours worth
of background writer tests to collect data on what it does wrong, then
wrote a prototype automatic background writer that resets the all-scan
parameters based on what I found.  It keeps a running estimate of how
dirty the pool at large is using a weighted average of the most recent
scan with the past history.  From there, I have a simple model that
predicts how much of the buffer pool we can scan in any interval, and is
intended to enforce a maximum bound on the amount of physical I/O you're
willing to stream out.  The beta code is sitting at
http://www.westnet.com/~gsmith/content/postgresql/bufmgr.c if you want to
see what I've done so far.  The parts that are done work fine--as long as
you give it a reasonable % to scan by default, it will correct
all_max_pages and the interval in real-time to meet the scan rate you
requested, given how much is currently dirty; the I/O rate is computed but
doesn't limit properly yet.
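
The general shape of that estimate-then-tune loop can be sketched roughly as
follows.  This is an illustration under stated assumptions, not the code at
the URL above: the DirtyTracker struct, the function names, the smoothing
weight, and the io_cap_pages parameter are all made up for the example.

```c
/* Hypothetical tuning state; names are illustrative, not from bufmgr.c. */
typedef struct
{
    double dirty_estimate;  /* smoothed fraction of the pool that is dirty */
    double smoothing;       /* weight given to the most recent scan, 0..1 */
} DirtyTracker;

/* Blend the latest scan's observation into the running estimate. */
void
tracker_update(DirtyTracker *t, int scanned, int found_dirty)
{
    double sample;

    if (scanned <= 0)
        return;             /* nothing observed, keep the old estimate */

    sample = (double) found_dirty / (double) scanned;
    t->dirty_estimate = t->smoothing * sample
        + (1.0 - t->smoothing) * t->dirty_estimate;
}

/*
 * Pages to scan in each round so that scan_percent of the pool is covered
 * once every cycle_ms, when a round runs every delay_ms.
 */
int
pages_to_scan(int nbuffers, double scan_percent, int delay_ms, int cycle_ms)
{
    double rounds;
    double pages;

    if (delay_ms <= 0 || cycle_ms <= 0)
        return 1;           /* assumed fallback for nonsensical settings */

    rounds = (double) cycle_ms / (double) delay_ms;
    pages = nbuffers * scan_percent / 100.0 / rounds;
    return pages < 1.0 ? 1 : (int) (pages + 0.5);
}

/*
 * Expected writes for a round given the current dirty estimate, clamped to
 * an assumed per-round physical I/O cap.
 */
int
max_pages_to_write(const DirtyTracker *t, int scan_pages, int io_cap_pages)
{
    int expected = (int) (t->dirty_estimate * scan_pages + 0.5);

    return expected > io_cap_pages ? io_cap_pages : expected;
}
```

For instance, with a 1000-buffer pool, a 10% scan target, a 200ms delay and
a 1000ms cycle, pages_to_scan() asks for 20 pages per round; if the tracker
believes 20% of the pool is dirty, that is about 4 writes per round before
any cap is applied.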

Why haven't I brought this all up yet?  Two reasons.  The first is that
it doesn't work on my system; checkpoints and overall throughput get worse
when you try to shorten them by running the background writer at optimal
aggressiveness.  Under really heavy load, the writes slow down as all the
disk caches fill, the background writer fights with reads on the data that
isn't in the mostly dirty cache (introducing massive seek delays), it
stops cleaning effectively, and it's better for it to not even try.  My
next generation of code was going to start with the LRU flush and then
only move onto the all-scan if there's time leftover.

The second is that I just started to get useful results here in the last
few weeks, and I assumed it was too big a topic to start suggesting major
redesigns to the background writer mechanism at that point (from me at
least!).  I was waiting for 8.3 to freeze before even trying.  If you want
to push through a redesign there, maybe you can get away with it at this
late moment.  But I ask that you please don't remove anything from the
current design until you have significant test results to back up that
change.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
