Re: Bgwriter strategies - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Bgwriter strategies
Date:
Msg-id: 5908.1183738544@sss.pgh.pa.us
In response to: Re: Bgwriter strategies  (Greg Smith <gsmith@gregsmith.com>)
Responses: Re: Bgwriter strategies  (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-hackers

Greg Smith <gsmith@gregsmith.com> writes:
> On Thu, 5 Jul 2007, Tom Lane wrote:
>> This would give us a safety margin such that buffers_to_clean is not 
>> less than the largest demand observed in the last 100 iterations...and 
>> it takes quite a while for the memory of a demand spike to be forgotten 
>> completely.

> If you tested this strategy even on a steady load, I'd expect you'll find 
> there are large spikes in allocations during the occasional period where 
> everything is just right to pull a bunch of buffers in, and if you let 
> that max linger around for 100 iterations you'll write a large number of 
> buffers more than you need.

You seem to have the same misunderstanding as Heikki.  What I was
proposing was not a target for how many to *write* on each cycle, but
a target for how far ahead of the clock sweep hand to look.  If, say,
the target is 100, we'll scan forward from the sweep hand until we have
seen 100 clean zero-usage-count buffers; but we only have to write the
ones that weren't already clean.
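
In rough pseudo-C, with all the names invented for illustration rather
than taken from the real bufmgr code, the scan would look about like
this:

#include <stdbool.h>
#include <stdio.h>

#define NBUFFERS 1024

typedef struct
{
    int  usage_count;   /* clock-sweep usage counter */
    bool is_dirty;      /* must be written before it can be recycled */
} ToyBuffer;

static ToyBuffer buffers[NBUFFERS];

/* Stand-in for actually issuing the write of a dirty buffer. */
static void
toy_write_buffer(ToyBuffer *buf)
{
    buf->is_dirty = false;
}

/*
 * Scan forward from the recycling sweep hand until we have seen
 * 'target' clean, usage-count-zero buffers.  Dirty count-zero buffers
 * found along the way get written; buffers with nonzero usage count
 * are skipped (the sweep proper will handle decrementing them).
 * Returns the number of writes issued, which is only the dirty
 * fraction of 'target', not 'target' itself.
 */
static int
clean_ahead_of_sweep(int sweep_hand, int target)
{
    int clean_seen = 0;
    int writes = 0;
    int pos = sweep_hand;
    int examined = 0;

    while (clean_seen < target && examined < NBUFFERS)
    {
        ToyBuffer *buf = &buffers[pos];

        if (buf->usage_count == 0)
        {
            if (buf->is_dirty)
            {
                toy_write_buffer(buf);
                writes++;
            }
            clean_seen++;       /* now clean and recyclable */
        }

        pos = (pos + 1) % NBUFFERS;
        examined++;
    }

    return writes;
}

int
main(void)
{
    int i;

    /* Every third buffer dirty, every fifth recently used. */
    for (i = 0; i < NBUFFERS; i++)
    {
        buffers[i].is_dirty = (i % 3 == 0);
        buffers[i].usage_count = (i % 5 == 0) ? 1 : 0;
    }

    printf("writes issued: %d\n", clean_ahead_of_sweep(0, 100));
    return 0;
}

The point being that the number of writes per cycle falls out of how
dirty the count-zero buffers happen to be, not out of the target
itself.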

This is actually not so different from my previous proposal, in that the
idea is to keep ahead of the sweep by a particular distance.  The
previous idea was that that distance was "all the buffers", whereas this
idea is "a moving average of the actual demand rate".  The excess writes
created by the previous proposal were because of the probability of
re-dirtying buffers between cleaning and recycling.  We reduce that
probability by not trying to keep so many of 'em clean.  But I think
that we can meet the goal of having backends do hardly any of the writes
with a relatively small increase in the target distance, and thus a
relatively small differential in the number of wasted writes.  Heikki's
test showed that Itagaki-san's patch wasn't doing that well at
eliminating writes by backends, so we need a more aggressive target for
how many buffers to keep clean than it uses; but I think not a hugely
larger one, and thus my proposal.

BTW, somewhere upthread you suggested combining the target-distance
idea with the idea that the cleaning work uses a separate sweep hand and
thus doesn't re-examine the same buffers on every bgwriter iteration.
The problem is that it'd be very hard to track how far ahead of the
recycling sweep hand we are, because that number has to be measured
in usage-count-zero pages.  I see no good way to know how many of the
pages we scanned before have been touched (and given nonzero usage
counts) unless we rescan them.

Maybe we could approximate it: try to keep the cleaning hand N total
buffers ahead of the recycling hand, where N is the target number of
clean usage-count-zero buffers divided by the average fraction of
count-zero buffers (a fraction we can track as a moving average while
advancing the recycling hand).  However, I'm not sure the complexity
and uncertainty are worth it.  What I took away from Heikki's
experiment is
that trying to stay a large distance in front of the recycle sweep
isn't actually so useful because you get too many wasted writes due
to re-dirtying.  So restructuring the algorithm to make it cheap
CPU-wise to stay well ahead is not so useful either.
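
For concreteness, that approximation would amount to something like
this (names and the smoothing factor invented for the sketch, and again
I'm not sure it's worth doing):

#include <stdbool.h>

/* Moving-average estimate of the fraction of buffers that are
 * usage-count-zero, updated as the recycling hand advances. */
static double zero_fraction_avg = 0.25;

static void
note_swept_buffer(bool had_zero_usage_count)
{
    const double alpha = 1.0 / 16.0;

    zero_fraction_avg += alpha * ((had_zero_usage_count ? 1.0 : 0.0)
                                  - zero_fraction_avg);
}

/*
 * Total-buffer distance to keep the cleaning hand ahead of the
 * recycling hand so that roughly 'target_zero_clean' of the buffers in
 * between are usage-count-zero.  Clamp the fraction so a near-zero
 * estimate can't blow the distance up to the whole pool.
 */
static int
cleaning_lead_distance(int target_zero_clean)
{
    double frac = (zero_fraction_avg > 0.01) ? zero_fraction_avg : 0.01;

    return (int) (target_zero_clean / frac + 0.5);
}

The uncertainty is all in how stable that fraction estimate is.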

> I ended up settling on max(moving average of the last 16, most recent 
> allocation), and that seemed to work pretty well without being too 
> wasteful from excessive writes.

I've been doing moving averages for years and years, and I find that the
multiplication approach works at least as well as explicitly storing the
last K observations.  It takes a lot less storage and arithmetic too.
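
In code terms the comparison is roughly this (smoothing factor and
names purely illustrative):

/* Multiplication-style moving average: one stored value, no ring
 * buffer of the last K observations.  With alpha = 1/16 it behaves
 * much like averaging the last ~16 samples. */
static double
ema_update(double avg, double observation, double alpha)
{
    return avg + alpha * (observation - avg);
}

/* Greg's max() idea layered on top, here applied to the smoothed value
 * rather than an explicit last-16 average: a one-cycle allocation
 * spike raises the target immediately instead of being averaged away. */
static int
next_clean_target(double *avg, int recent_allocs)
{
    *avg = ema_update(*avg, (double) recent_allocs, 1.0 / 16.0);
    return (recent_allocs > (int) *avg) ? recent_allocs : (int) *avg;
}
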
        regards, tom lane

