Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Clock sweep not caching enough B-Tree leaf pages? |
Date | |
Msg-id | CA+TgmobcJKTGb=XgwH90Wz9Jc4Sa+F4mknCKEcyHjevcNQfHew@mail.gmail.com |
In response to | Clock sweep not caching enough B-Tree leaf pages? (Peter Geoghegan <pg@heroku.com>) |
Responses | Re: Clock sweep not caching enough B-Tree leaf pages? |
List | pgsql-hackers |
On Mon, Apr 14, 2014 at 1:11 PM, Peter Geoghegan <pg@heroku.com> wrote:
> In the past, various hackers have noted problems they've observed with
> this scheme. A common pathology is to see frantic searching for a
> victim buffer only to find all buffer usage_count values at 5. It may
> take multiple revolutions of the clock hand before a victim buffer is
> found, as usage_count is decremented for each and every buffer. Also,
> BufFreelistLock contention is considered a serious bottleneck [1],
> which is related.

I think that the basic problem here is that usage counts increase when buffers are referenced, but they decrease when buffers are evicted, and those two things are not in any necessary way connected to each other. In particular, if no eviction is happening, usage counts will converge to the maximum value. I've read a few papers about algorithms that attempt to segregate the list of buffers into "hot" and "cold" lists, and an important property of such algorithms is that they mustn't be allowed to make everything hot. It's easy to be too simplistic here: an algorithm that requires that no more than half the list be hot will fall over badly on a workload where the working set exceeds the available cache and the really hot portion of the working set is 60% of the available cache. So you need a more sophisticated algorithm than that. But that core property, that not all buffers can be hot, must somehow be preserved, and our algorithm doesn't preserve it.

This isn't a fundamental property of the usage-count idea; it's an artifact of the fact that usage count decreases are tied to eviction pressure rather than access pressure. For example, suppose we made a rule that if the total usage counts of all buffers exceed 3 * NBuffers, then every time you bump the usage count of a buffer from N to N+1, you're required to advance the clock sweep far enough to decrease the usage count of some other buffer by one. When you want to reclaim a buffer, you advance a separate clock sweep until you find a buffer with a zero usage count; if you circle the whole ring without finding one, then you reclaim the buffer you saw with the lowest usage count. There are obvious scalability problems here (everyone fighting over the right to advance the clock sweep), but ignore that for the sake of the thought experiment: now you have an algorithm where not all buffers can be hot. If some buffers are hotter than others, then whenever their usage counts are decreased they will immediately get pushed back up again, but some other buffer then has to absorb the decrease. Only the buffers that are really hot can maintain high usage counts, because *somebody* has to have a low usage count.

Even ignoring scalability concerns, this might not be (and probably isn't) exactly what we want to implement, but I think it illustrates an important control principle all the same: buffer "cooling" needs to be driven by the same underlying phenomenon - probably buffer access - as buffer "heating". If they're driven by unrelated phenomena, then the rates may be wildly incomparable, and you'll end up with everything hot or everything cold. If that happens, you lose, because with everything the same, there's no principled way to decide which things are actually best to evict.

If we come up with some good solution for shared buffers, we should also consider applying it to SLRU eviction. I believe that the current situation around CLOG eviction is none too pretty.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
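A minimal, single-threaded sketch of the thought experiment described in the message above. This is not PostgreSQL code: NBUFFERS, MAX_USAGE, bump_buffer(), and reclaim_buffer() are hypothetical names, and pins, locks, and the freelist are ignored. Once the sum of all usage counts exceeds 3 * NBuffers, each bump is paid for by advancing a separate "cooling" clock hand far enough to decrement one buffer's count; reclaim takes the first zero-usage buffer it finds, or the coldest buffer seen after a full revolution.

```c
#include <stdio.h>

#define NBUFFERS   8
#define MAX_USAGE  5

static int usage[NBUFFERS];     /* per-buffer usage counts */
static int total_usage;         /* sum of all usage counts */
static int cool_hand;           /* clock hand that decrements counts */
static int evict_hand;          /* separate clock hand used for reclaim */

/* Called whenever a buffer is referenced. */
static void
bump_buffer(int buf)
{
    if (usage[buf] >= MAX_USAGE)
        return;
    usage[buf]++;
    total_usage++;

    /*
     * The proposed rule: once total usage exceeds 3 * NBuffers, every bump
     * from N to N+1 must be paid for by advancing the cooling hand far
     * enough to decrement some buffer's count by one.  Cooling is thus
     * driven by access pressure, so not every buffer can stay hot.
     */
    if (total_usage > 3 * NBUFFERS)
    {
        while (usage[cool_hand] == 0)
            cool_hand = (cool_hand + 1) % NBUFFERS;
        usage[cool_hand]--;
        total_usage--;
        cool_hand = (cool_hand + 1) % NBUFFERS;
    }
}

/*
 * Pick a victim: the first zero-usage buffer found by a separate clock
 * sweep, or, after a full revolution with no zero, the coldest buffer seen.
 */
static int
reclaim_buffer(void)
{
    int best = evict_hand;
    int i;

    for (i = 0; i < NBUFFERS; i++)
    {
        int buf = (evict_hand + i) % NBUFFERS;

        if (usage[buf] == 0)
        {
            evict_hand = (buf + 1) % NBUFFERS;
            return buf;
        }
        if (usage[buf] < usage[best])
            best = buf;
    }
    evict_hand = (best + 1) % NBUFFERS;
    return best;
}

int
main(void)
{
    int i;

    /* Buffers 0-2 are "hot": reference them far more often than the rest. */
    for (i = 0; i < 200; i++)
        bump_buffer(i % 10 < 7 ? i % 3 : 3 + i % 5);

    for (i = 0; i < NBUFFERS; i++)
        printf("buffer %d: usage_count %d\n", i, usage[i]);
    printf("victim: buffer %d\n", reclaim_buffer());
    return 0;
}
```

In this toy run, only the frequently referenced buffers can hold a high usage_count: every bump past the 3 * NBuffers ceiling forces a decrement somewhere else, so the colder buffers settle at lower counts and remain the natural eviction candidates.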