
From Robert Haas
Subject Re: Clock sweep not caching enough B-Tree leaf pages?
Date
Msg-id CA+TgmoZs6p3WZan8XisdL9s7pu6DxpX6cyHJBBeCVAKsM1FW7w@mail.gmail.com
In response to Re: Clock sweep not caching enough B-Tree leaf pages?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Apr 16, 2014 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> Anyways, I'm still curious if you can post similar numbers basing the
>> throttling on gross allocation counts instead of time.  Meaning: some
>> number of buffer allocations has to have occurred before you consider
>> eviction.  Besides being faster I think it's a better implementation:
>> an intermittently loaded server will give more consistent behavior.
>
> Yeah --- I think wall-clock-based throttling is fundamentally the wrong
> thing anyway.  Are we going to start needing a CPU speed measurement to
> tune the algorithm with?  Not the place to be going.  But driving it off
> the number of allocations that've been done could be sensible.  (OTOH,
> that means you need a central counter, which itself would be a
> bottleneck.)

So, I was thinking about this a little bit more today, prodded by my
coworker John Gorman.  I'm wondering if we could drive this off of the
clock sweep; that is, every time the clock sweep touches a buffer, its
usage count goes down by one, but we also set two flag bits.  Every
time the buffer gets touched thereafter, we check whether any flag
bits are set; if so, we clear one and increase the usage count, else
we do nothing.  So the usage count can increase at most twice per
clock sweep.  The advantage of that is that, as with Peter's approach,
it is harder for the usage count of a buffer to max out - to get
there, you need sustained access over a longer period of time.  But
it's not time-dependent, so replaying the same workload at twice the
speed or half the speed on faster or slower hardware doesn't change
the choice of which buffer to evict, which seems good.  And it will
cause us to prefer buffers which are accessed repeatedly over a period
of time rather than buffers that are accessed a bunch of times in
quick succession and then not touched again for a while, which seems
like a good bet.
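
Just to pin down what I mean, here's a rough sketch in C.  The names
(SWEEP_CREDIT_1/2, FakeBufferDesc) are invented for illustration and
aren't meant to match the real bufmgr definitions:

/*
 * Toy sketch only -- struct and flag names are made up, not the real
 * buffer-header definitions.
 */
#define SWEEP_CREDIT_1      0x01
#define SWEEP_CREDIT_2      0x02
#define MAX_USAGE_COUNT     5       /* same cap as today */

typedef struct
{
    int         usage_count;
    int         flags;
} FakeBufferDesc;

/* Clock hand passes over the buffer. */
static void
clock_sweep_touch(FakeBufferDesc *buf)
{
    if (buf->usage_count > 0)
        buf->usage_count--;

    /* Arm two "credits": at most two usage-count bumps until the next pass. */
    buf->flags |= (SWEEP_CREDIT_1 | SWEEP_CREDIT_2);
}

/* A backend touches the buffer. */
static void
buffer_accessed(FakeBufferDesc *buf)
{
    if (buf->flags & SWEEP_CREDIT_1)
        buf->flags &= ~SWEEP_CREDIT_1;
    else if (buf->flags & SWEEP_CREDIT_2)
        buf->flags &= ~SWEEP_CREDIT_2;
    else
        return;                 /* no credit left, do nothing */

    if (buf->usage_count < MAX_USAGE_COUNT)
        buf->usage_count++;
}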

I can't convince myself that this fully solves the (currently
existing) problem of usage counts increasing too fast.  In theory,
every time the clock sweep hand advances, it decrements the usage
count of one buffer by one, but also makes it possible for the usage
count of that buffer to increase by up to two (or by only one if the
buffer previously had a usage count of five).  So with the right
access pattern, it seems like you could still get to a place where a
typical buffer eviction has to do a lot of scanning before it finds a
victim buffer.  As with the present algorithm, we'd be most likely to
have that problem when the buffer hit ratio is very high, so that
there are many opportunities for usage counts to go up between each
opportunity to push usage counts back down.  But it seems like it
would at least limit the damage.
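
For what it's worth, a throwaway simulation on top of the toy functions
above (again, scratch code, not a patch) shows the saturation I'm
worried about:

#include <stdio.h>

#define NBUFFERS        8
#define HITS_PER_SWEEP  3       /* hits per buffer per hand revolution */
#define REVOLUTIONS     10

int
main(void)
{
    FakeBufferDesc bufs[NBUFFERS] = {{0, 0}};
    int         r, i, h;

    for (r = 0; r < REVOLUTIONS; r++)
        for (i = 0; i < NBUFFERS; i++)
        {
            clock_sweep_touch(&bufs[i]);
            for (h = 0; h < HITS_PER_SWEEP; h++)
                buffer_accessed(&bufs[i]);
        }

    for (i = 0; i < NBUFFERS; i++)
        printf("buffer %d: usage_count %d\n", i, bufs[i].usage_count);
    return 0;
}

With any HITS_PER_SWEEP >= 2, every buffer ends up stuck at usage_count
5, so an eviction would still have to scan a long way before finding a
victim.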

I haven't tried this or anything, so this is just random brainstorming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


