
From: Peter Geoghegan
Subject: Re: [HACKERS] Clock with Adaptive Replacement
Msg-id: CAH2-Wz=NvPOCL3_OcFR_6k6KKcK7GqhR-dqi4QfB28MD5NiycA@mail.gmail.com
In response to: Re: [HACKERS] Clock with Adaptive Replacement (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] Clock with Adaptive Replacement (Yura Sokolov <funny.falcon@gmail.com>)
           Re: [HACKERS] Clock with Adaptive Replacement (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, May 1, 2018 at 11:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> That's not to say that the total of all usage counts must remain equal
> to a constant or something dumb like that -- there are decisions to be
> made in terms of how you implement things.  For example, you can
> imagine a system where every time we would bump the usage count of a
> buffer that already has a usage count of 5, we would instead advance
> the clock hand by 1 buffer.  In such a system the total of the usage
> counts can fluctuate, but you can't peg everything at 5 because
> there's back-pressure built into the algorithm.
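
To make that concrete, here is a sketch of that variant in C. None of
this is PostgreSQL code; the names (NBUFFERS, record_buffer_hit, and
so on) are made up for illustration:

/*
 * Back-pressure clock sketch: a hit on a buffer that is already at
 * the usage-count ceiling advances the clock hand instead, decaying
 * some other buffer's count.
 */
#include <stddef.h>

#define NBUFFERS  1024
#define MAX_USAGE 5

static int usage_count[NBUFFERS];
static size_t clock_hand;

static void
advance_clock_hand(void)
{
    /* Decay whatever the hand points at, then move the hand along. */
    if (usage_count[clock_hand] > 0)
        usage_count[clock_hand]--;
    clock_hand = (clock_hand + 1) % NBUFFERS;
}

static void
record_buffer_hit(size_t buf)
{
    if (usage_count[buf] < MAX_USAGE)
        usage_count[buf]++;
    else
        advance_clock_hand();   /* back-pressure instead of a no-op */
}

The total of all usage counts can still fluctuate here, but every
attempt to push a count past the ceiling pays for itself by decaying
some other count.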

That back-pressure scheme seems to be an old idea. See "Data Cache Management Using
Frequency-Based Replacement", a 1990 paper that is referenced by the
ARC paper. It says:

"When a block is first brought into the cache, its reference count is
initialized to one. Previous algorithms using reference counts have
incremented the count for a given block on every reference to that
block. This results in the following kind of problem: certain blocks
are relatively infrequently referenced overall, and yet when they are
referenced, due to locality there are short intervals of repeated
re-references, thus building up high reference counts. After such an
interval is over, the high reference count is misleading: it is due to
locality, and cannot be used to estimate the probability that such a
block will be re-referenced following the end of this interval (this
description is intuitive, since in practice such intervals are not as
well-defined as the preceding discussion may suggest)."

It goes on to propose a solution that, I gather, isn't far from your
thought experiment's design for fixing the usage_count saturation
problem:

"More precisely, we dynamically maintain the sum of all reference
counts.  Whenever the average reference count exceeds a predetermined
maximum value, A-max (another parameter of the algorithm), every
reference count C is reduced to C/2. Thus, in the steady state the sum
of all reference counts stays between N x A-max/2 and N x A-max (where
N is the number of cache blocks), and the reference count of a block
is approximately linearly related to the frequency of references to
the block while it is in the middle and old sections. Note that in
this reduction of reference counts, a count of one remains at one, a
count of two is reduced to one, a count of three is reduced to two,
etc."

This 1990 paper seems to address only the correlated-references
problem that we see when using CLOCK to approximate an LRU, which is
not to be confused with actually weighing frequency in a clever way;
AFAICT that more general idea didn't come along until the LRU-K paper
a couple of years later. In other words, the 1990 paper only cares
about making sure that CLOCK actually manages to approximate an LRU
when reference counts are distorted by correlated references. That's
all. This really is a problem distinct from the one that ARC and CAR
aim to solve (2Q and LRU-K must be in the same category, since IIRC
they have ghost lists). CART tries to solve both problems at the same
time, which suggests that they're independent problems.
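
In case it helps to see what a ghost list is, here is a minimal
sketch (hypothetical names, and a linear scan that no real
implementation would tolerate):

/*
 * Remember the identities of recently evicted pages, with no data
 * attached.  A miss that hits this list is evidence of reuse over a
 * longer horizon than correlated references, so such a page can be
 * reloaded as "frequent" rather than merely "recent".  Page numbers
 * are assumed to be positive; 0 marks an empty slot.
 */
#include <stdbool.h>

#define GHOST_SIZE 1024

static int ghost[GHOST_SIZE];   /* recently evicted page numbers */
static int ghost_next;

static void
remember_eviction(int pageno)
{
    ghost[ghost_next] = pageno;             /* overwrite the oldest */
    ghost_next = (ghost_next + 1) % GHOST_SIZE;
}

static bool
was_recently_evicted(int pageno)
{
    for (int i = 0; i < GHOST_SIZE; i++)
        if (ghost[i] == pageno)
            return true;
    return false;
}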

In summary: I see two clearly distinct problems here, though they are
superficially similar. We have both problems.

> That's a really appealing property.  If there's a "hot" set of buffers
> that accounts for nearly all accesses, the algorithm will be willing
> to regard all buffers in that set as frequently-used regardless
> of whether it is a large or a small percentage of shared_buffers, and
> they will never get evicted.  But the buffers must really be hot
> enough to justify the space they're taking up.  When there are only a
> few hot buffers, it's sufficient for them to be a little hotter than
> the typical buffer.  But when there are a lot of hot buffers, they
> have to be *really* hot.  Intuitively, that makes sense, because
> making the set of "hot" -- and thus unevictable -- buffers large means
> that other buffers are going to get evicted very quickly.  That's only
> justifiable if the hot buffers are very significantly more useful than
> the non-hot buffers.

Right. How hot "really hot" actually is needs to be determined by the
behavior of the system as a whole. If we can't characterize that
behavior any better than we do today, then no improvement is possible;
shared_buffers services the whole system.

I've always had a problem with the 8GB/16GB upper limit on the size of
shared_buffers. Not because it's wrong, but because it has never
really been explained. I strongly suspect that it has something to do
with usage_count saturation, since the effect isn't reproducible with
any synthetic workload that I'm aware of -- quite possibly because
there are few bursty benchmarks.

>> All of that having been said, maybe we have an independent low-level
>> problem: we increase usage_count when we pin a buffer, even if we last
>> pinned the buffer 5ms ago. IIRC a change to this went in when ARC went
>> in (and came out with ARC, too). This is more of a problem with
>> miscounting the number of hits based on accidents around how things
>> are structured in the executor -- we need to be more careful about
>> recognizing whether or not block hits are logically distinct.
>> Particularly with stuff like nested loop joins.
>
> I agree that double-counting correlated accesses is a problem, and I
> agree that we probably want to do something about it.  I am skeptical
> that using wall-clock time is the right way to solve that problem
> because it leads to edge effects

I agree that wall-clock time is a bad approach, actually. If we were
to use wall-clock time, we'd only be doing so because it can
discriminate against the thing we actually care about (correlated
accesses) in an approximate, indirect way. If there is a more direct
way of identifying correlated accesses, which I gather there is, then
we should probably use it.
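
For example, one more direct approach might be to stamp each buffer
with an identifier for the logical operation that last bumped it, so
that re-pins from the same operation (say, the inner side of a nested
loop join) count only once. A sketch only; none of these fields or
names exist in PostgreSQL:

#include <stdint.h>

typedef struct BufferSketch
{
    int      usage_count;
    uint64_t last_op_id;    /* logical operation that last bumped us */
} BufferSketch;

static void
bump_usage_count(BufferSketch *buf, uint64_t op_id)
{
    /* Correlated re-reference from the same operation: don't recount. */
    if (buf->last_op_id == op_id)
        return;

    buf->last_op_id = op_id;
    if (buf->usage_count < 5)
        buf->usage_count++;
}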

-- 
Peter Geoghegan

