Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: Clock sweep not caching enough B-Tree leaf pages?
Date
Msg-id 20150415210025.GA6488@svana.org
Whole thread Raw
In response to Re: Clock sweep not caching enough B-Tree leaf pages?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Clock sweep not caching enough B-Tree leaf pages?
List pgsql-hackers
On Wed, Apr 15, 2015 at 12:37:44AM -0400, Robert Haas wrote:
> > I think such a solution will be good for the cases when many evictions
> > needs to be performed to satisfy the workload,  OTOH when there are
> > not too many evictions that needs to be done, in such a case some of
> > the buffers that are accessed much more will have equal probability to
> > get evicted as compare to buffers which are less accessed.
>
> Possibly, but I think it's even worse under the current algorithm.
> Under this proposal, if we go for a long time without any buffer
> evictions, every buffer usage's count will top out at 2 more than
> wherever it was after the last clock sweep.   In the worst case, every
> buffer (or most of them) could end up with the same usage count.  But
> under the status quo, they'll all go to 5, which is an even bigger
> loss of information, and which will make the first eviction much more
> expensive than if they are all pegged at 2 or 3.

I've been following this thread from the side with interest and got
twigged by the point about loss of information.  If you'd like better
information about relative ages, you can acheive this by raising the
cap on the usage count and dividing (or right-shifting) each sweep.

This would allow you to remember much more about about the relative
worth of often used pages.  With a cap of 32 you'd have the same effect
as now where after 5 sweeps the buffer is evicted.  Mathematically the
count would converge to the number of times the block is used per
sweep.

If you wanted to be really clever, you could at the beginning of each
sweep take an estimate of the number of buffers used since the last
sweep (from the stats collector perhaps) and use that to drive your
divisor, so if you have a lots of allocations you become more
aggressive about reducing the counts.  Or if the load is light fall
back to just subtracting one.  Then you don't need a cap at all.

(Apologies if this has been suggested before, Google didn't find
anything for me).

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

pgsql-hackers by date:

Previous
From: Qingqing Zhou
Date:
Subject: Re: reparsing query
Next
From: Alvaro Herrera
Date:
Subject: Re: reparsing query