Re: Load distributed checkpoint V3 - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Re: Load distributed checkpoint V3
Date
Msg-id 46151A83.9060200@enterprisedb.com
In response to Re: Load distributed checkpoint V3  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Load distributed checkpoint V3  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Load distributed checkpoint V3  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-patches
Greg Smith wrote:
> On Thu, 5 Apr 2007, Heikki Linnakangas wrote:
>
>> Bgwriter has two goals:
>> 1. keep enough buffers clean that normal backends never need to do a
>> write
>> 2. smooth checkpoints by writing buffers ahead of time
>> Load distributed checkpoints will do 2. in a much better way than the
>> bgwriter_all_* guc options. I think we should remove that aspect of
>> bgwriter in favor of this patch.
>
> ...
>
> Let me suggest a different way of looking at this problem.  At any
> moment, some percentage of your buffer pool is dirty.  Whether it's 0%
> or 100% dramatically changes what the background writer should be
> doing.  Whether most of the data is usage_count>0 or not also makes a
> difference.  None of the current code has any idea what type of buffer
> pool they're working with, and therefore they don't have enough
> information to make a well-informed prediction about what is going to
> happen in the near future.

The purpose of the bgwriter_all_* settings is to shorten the duration of
the eventual checkpoint. The reason to shorten the checkpoint is to limit
the damage it does to other I/O activity. My thinking is that, assuming
the LDC patch is effective (agreed, it needs more testing) at smoothing
the checkpoint, the duration doesn't matter anymore. Do you want to argue
that there are other reasons to shorten the checkpoint duration?

> I'll tell you what I did to the all-scan.  I ran a few hundred hours
> worth of background writer tests to collect data on what it does wrong,
> then wrote a prototype automatic background writer that resets the
> all-scan parameters based on what I found.  It keeps a running estimate
> of how dirty the pool at large is using a weighted average of the most
> recent scan with the past history.  From there, I have a simple model
> that predicts how much of the buffer we can scan in any interval, and
> intends to enforce a maximum bound on the amount of physical I/O you're
> willing to stream out.  The beta code is sitting at
> http://www.westnet.com/~gsmith/content/postgresql/bufmgr.c if you want
> to see what I've done so far.  The parts that are done work fine--as
> long as you give it a reasonable % to scan by default, it will correct
> all_max_pages and the interval in real-time to meet the scan rate
> you requested, given how much is currently dirty; the I/O rate is
> computed but doesn't limit properly yet.

Nice. Enforcing a max bound on the I/O seems reasonable, if we accept
that shortening the checkpoint is a goal.
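
To make the "weighted average of the most recent scan with the past
history" concrete, here is a rough sketch of how such a running estimate
could be kept. It is not taken from the bufmgr.c linked above; the struct,
the function name and the weight parameter are all assumptions made up
for illustration.

```c
#include <assert.h>

/* Hypothetical running estimate of how dirty the buffer pool is. */
typedef struct DirtyEstimate
{
    double dirty_fraction;  /* smoothed fraction of dirty buffers, 0.0 to 1.0 */
    int    initialized;     /* 0 until the first scan has been folded in */
} DirtyEstimate;

/*
 * Fold the result of one scan into the running estimate.
 * "weight" is the share given to the most recent scan (assumed tunable);
 * the remainder goes to the accumulated history.
 */
static double
update_dirty_estimate(DirtyEstimate *est, int dirty_seen,
                      int buffers_scanned, double weight)
{
    double sample;

    assert(buffers_scanned > 0);
    assert(weight > 0.0 && weight <= 1.0);

    sample = (double) dirty_seen / buffers_scanned;

    if (!est->initialized)
    {
        /* No history yet: start from the first observation. */
        est->dirty_fraction = sample;
        est->initialized = 1;
    }
    else
        est->dirty_fraction = weight * sample
            + (1.0 - weight) * est->dirty_fraction;

    return est->dirty_fraction;
}
```

A larger weight reacts faster to a sudden burst of dirtying, at the cost
of a noisier estimate; which value works is exactly the kind of thing
that needs test data.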

> Why haven't I brought this all up yet?  Two reasons.  The first is
> because it doesn't work on my system; checkpoints and overall throughput
> get worse when you try to shorten them by running the background writer
> at optimal aggressiveness.  Under really heavy load, the writes slow
> down as all the disk caches fill, the background writer fights with
> reads on the data that isn't in the mostly dirty cache (introducing
> massive seek delays), it stops cleaning effectively, and it's better for
> it to not even try.  My next generation of code was going to start with
> the LRU flush and then only move onto the all-scan if there's time
> leftover.
>
> The second is that I just started to get useful results here in the last
> few weeks, and I assumed it's too big of a topic to start suggesting
> major redesigns to the background writer mechanism at that point (from
> me at least!).  I was waiting for 8.3 to freeze before even trying.  If
> you want to push through a redesign there, maybe you can get away with
> it at this late moment.  But I ask that you please don't remove anything
> from the current design until you have significant test results to back
> up that change.

Point taken. I need to start testing the LDC patch.

Since we're discussing this, let me tell you what I've been thinking
about the LRU cleaning behavior of bgwriter. ISTM that it is more
straightforward to tune automatically. Bgwriter basically needs to
ensure that the next X buffers with usage_count=0 in the clock sweep are
clean. X is the predicted number of buffers backends will evict until
the next bgwriter round.

The number of buffers evicted by normal backends in a bgwriter_delay
period is simple to keep track of: just increment a counter in
StrategyGetBuffer and reset it when bgwriter wakes up. We can use that
as an estimate of X with some safety margin.
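
A minimal sketch of that bookkeeping, only to show the shape of the idea.
The struct and function names below are hypothetical, the safety margin
is an assumed multiplier, and locking of the shared counter is left out
entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for shared state; not PostgreSQL's real structures. */
typedef struct EvictionStats
{
    uint32_t evictions_since_wakeup;  /* bumped once per buffer handed out */
} EvictionStats;

/* Would be called from the buffer allocation path (StrategyGetBuffer). */
static void
count_eviction(EvictionStats *stats)
{
    stats->evictions_since_wakeup++;
}

/*
 * Called when bgwriter wakes up: read and reset the counter, and return
 * X, the number of usage_count=0 buffers ahead of the clock hand that
 * should be clean before the next round. safety_margin is an assumed
 * multiplier >= 1.0.
 */
static uint32_t
lru_clean_target(EvictionStats *stats, double safety_margin)
{
    uint32_t evicted;

    assert(safety_margin >= 1.0);

    evicted = stats->evictions_since_wakeup;
    stats->evictions_since_wakeup = 0;

    /* Round to nearest so small counts are not truncated to zero. */
    return (uint32_t) (evicted * safety_margin + 0.5);
}
```

This uses only the last period as the prediction; smoothing it over
several periods would be an obvious refinement if the raw count turns
out to be too jumpy.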

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
