Thread: W-TinyLfu for cache eviction

W-TinyLfu for cache eviction

From: Vladimir Sitnikov
I've recently noticed the W-TinyLfu cache admission policy (see [1]) being
used by Caffeine, a "high performance caching library for Java 8".
It demonstrates high cache hit ratios (see [2]) and makes it possible to
build high-throughput caches (see Caffeine in [3]).
The authors explicitly allow implementations of the algorithm (see [4]).

Does it make sense to evaluate the algorithm for buffer replacement?

[1]: http://arxiv.org/pdf/1512.00727v1.pdf
[2]: https://github.com/ben-manes/caffeine/wiki/Efficiency
[3]: https://github.com/ben-manes/caffeine/wiki/Benchmarks
[4]: https://github.com/ben-manes/caffeine/issues/23#issuecomment-161536706

Vladimir Sitnikov



Re: W-TinyLfu for cache eviction

From: Robert Haas
On Thu, Dec 3, 2015 at 2:27 AM, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:
> I've recently noticed the W-TinyLfu cache admission policy (see [1]) being
> used by Caffeine, a "high performance caching library for Java 8".
> It demonstrates high cache hit ratios (see [2]) and makes it possible to
> build high-throughput caches (see Caffeine in [3]).
> The authors explicitly allow implementations of the algorithm (see [4]).
>
> Does it make sense to evaluate the algorithm for buffer replacement?

Maybe.  Want to code it up?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: W-TinyLfu for cache eviction

From: Konstantin Knizhnik

On 03.12.2015 10:27, Vladimir Sitnikov wrote:
> I've recently noticed the W-TinyLfu cache admission policy (see [1]) being
> used by Caffeine, a "high performance caching library for Java 8".
> It demonstrates high cache hit ratios (see [2]) and makes it possible to
> build high-throughput caches (see Caffeine in [3]).
> The authors explicitly allow implementations of the algorithm (see [4]).
>
> Does it make sense to evaluate the algorithm for buffer replacement?

I expect synchronization issues with an implementation of this algorithm.
It seems hard to avoid some global critical section, which can cause
significant performance degradation on MPP systems (see the thread
"Move PinBuffer and UnpinBuffer to atomics").


> [1]: http://arxiv.org/pdf/1512.00727v1.pdf
> [2]: https://github.com/ben-manes/caffeine/wiki/Efficiency
> [3]: https://github.com/ben-manes/caffeine/wiki/Benchmarks
> [4]: https://github.com/ben-manes/caffeine/issues/23#issuecomment-161536706
>
> Vladimir Sitnikov
>
>




Re: W-TinyLfu for cache eviction

From: Ants Aasma
On Wed, Dec 9, 2015 at 11:31 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> I expect synchronization issues with an implementation of this algorithm.
> It seems hard to avoid some global critical section, which can cause
> significant performance degradation on MPP systems (see the thread "Move
> PinBuffer and UnpinBuffer to atomics").

The sketch updates don't really need to be globally consistent;
anything faster than the cycling of buffers through the LRU tier
would be enough. In principle atomic increments could be used, but
funneling all buffer hits onto a small number of cache lines and then
doing 4x the number of writes (one per sketch row) would result in
extremely nasty cache line ping-ponging. The Caffeine implementation
appears to solve this by batching the frequency sketch updates using a
striped set of circular buffers. Re-implementing this is not exactly
trivial, and some prior experience with lock-free programming seems
advisable.
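
To make the batching idea concrete, here is a minimal sketch in C, assuming
a per-backend ring that collects buffer hits and folds them into a shared
count-min style frequency sketch in batches. All structure and function
names are hypothetical illustrations, not Caffeine's code or a proposed
patch:

/* Hypothetical illustration: batch buffer-hit events in a per-backend
 * ring and fold them into a shared count-min style frequency sketch
 * only when the ring fills up, so the shared cache lines are touched
 * far less often than once per hit. */
#include <stdint.h>

#define SKETCH_WIDTH  (1 << 16)     /* counters per row, power of two */
#define SKETCH_DEPTH  4             /* rows, i.e. hash functions */
#define RING_SIZE     256           /* hits buffered per backend */

typedef struct FreqSketch
{
    uint8_t     counter[SKETCH_DEPTH][SKETCH_WIDTH];
} FreqSketch;

typedef struct HitRing
{
    uint32_t    buf[RING_SIZE];
    int         n;
} HitRing;

static inline uint32_t
sketch_slot(uint32_t bufid, int row)
{
    /* cheap per-row mixing; any decent hash family would do */
    uint32_t    h = bufid * 2654435761u + (uint32_t) row * 40503u;

    h ^= h >> 15;
    return h & (SKETCH_WIDTH - 1);
}

/*
 * Fold one batch of hits into the shared sketch.  In a real patch this
 * would be the only code path that needs a lock (or striped locks).
 */
static void
sketch_drain(FreqSketch *sketch, HitRing *ring)
{
    for (int i = 0; i < ring->n; i++)
    {
        for (int row = 0; row < SKETCH_DEPTH; row++)
        {
            uint8_t *c = &sketch->counter[row][sketch_slot(ring->buf[i], row)];

            if (*c < UINT8_MAX)     /* saturating 8-bit counters */
                (*c)++;
        }
    }
    ring->n = 0;
}

/* Called on every buffer hit; normally just a write to a local array. */
static void
record_hit(FreqSketch *shared, HitRing *local, uint32_t bufid)
{
    local->buf[local->n++] = bufid;
    if (local->n == RING_SIZE)
        sketch_drain(shared, local);
}

A real implementation would also need the periodic reset step from the
TinyLFU paper, which halves all counters so that stale frequencies age out.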

Based on the paper, it looks like this algorithm can work with a
low-quality primary eviction algorithm. Swapping our GCLOCK out for
something simpler like plain CLOCK could simplify an atomics-based
PinBuffer approach. Another interesting avenue for research would be
to use ideas from TinyLFU to implement a tier of "nailed" buffers that
have backend-local or striped pin-unpin accounting.
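
For comparison, a plain CLOCK sweep is about the simplest replacement
policy one could pair with such an admission filter. A minimal, hypothetical
sketch (not PostgreSQL's actual freelist code) looks like this:

/* Hypothetical plain-CLOCK victim selection: every buffer has a single
 * reference bit, set on access and cleared as the hand sweeps past. */
#include <stdbool.h>
#include <stdint.h>

#define NBUFFERS 1024

typedef struct ClockState
{
    bool        refbit[NBUFFERS];   /* set on every buffer hit */
    bool        pinned[NBUFFERS];   /* pinned buffers cannot be evicted */
    uint32_t    hand;               /* current sweep position */
} ClockState;

/*
 * Advance the hand until an unpinned buffer with a clear reference bit
 * is found; that buffer becomes the eviction victim.  (A sketch only:
 * it spins forever if every buffer is pinned.)
 */
static uint32_t
clock_find_victim(ClockState *cs)
{
    for (;;)
    {
        uint32_t    bufid = cs->hand;

        cs->hand = (cs->hand + 1) % NBUFFERS;

        if (cs->pinned[bufid])
            continue;

        if (cs->refbit[bufid])
            cs->refbit[bufid] = false;  /* give it a second chance */
        else
            return bufid;
    }
}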

But for checking whether the replacement policy implemented by W-TinyLFU
is good, it isn't necessary to have a performant locking solution. I think
a global lock would be good enough for a proof of concept that only
evaluates cache hit ratios.
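
To show what such a proof of concept would actually have to decide: the
core of TinyLFU admission is just a frequency comparison between the
incoming block and the would-be victim. A rough sketch under those
assumptions (hypothetical names, one global lock, count-min estimate as in
the paper):

/* Proof-of-concept TinyLFU-style admission check behind one global
 * lock: admit the candidate block only if its estimated access
 * frequency exceeds that of the eviction victim. */
#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>

#define SKETCH_WIDTH    (1 << 16)
#define SKETCH_DEPTH    4

static uint8_t sketch[SKETCH_DEPTH][SKETCH_WIDTH];
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

static inline uint32_t
sketch_slot(uint32_t blockid, int row)
{
    uint32_t    h = blockid * 2654435761u + (uint32_t) row * 40503u;

    h ^= h >> 15;
    return h & (SKETCH_WIDTH - 1);
}

/* Count-min estimate: the minimum counter over all rows. */
static uint32_t
estimate_frequency(uint32_t blockid)
{
    uint32_t    est = UINT8_MAX;

    for (int row = 0; row < SKETCH_DEPTH; row++)
    {
        uint8_t     c = sketch[row][sketch_slot(blockid, row)];

        if (c < est)
            est = c;
    }
    return est;
}

/*
 * Should 'candidate' replace 'victim'?  A single global lock is fine
 * when all we want to measure is the hit ratio.
 */
static bool
tinylfu_admit(uint32_t candidate, uint32_t victim)
{
    bool        admit;

    pthread_mutex_lock(&sketch_lock);
    admit = estimate_frequency(candidate) > estimate_frequency(victim);
    pthread_mutex_unlock(&sketch_lock);

    return admit;
}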

Regards,
Ants Aasma



Re: W-TinyLfu for cache eviction

From: Vladimir Sitnikov
> a global lock would be good enough for a proof of concept that only
> evaluates cache hit ratios.

I think an emulator can be used to check hit ratios. That way we can see
how different algorithms affect the hit ratio.
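
A minimal skeleton for such an emulator in C, assuming the trace is simply
a text file with one block id per line (the file format and the policy
callback interface are made up for illustration):

/* Skeleton of a trace-driven hit-ratio emulator: replay a text file of
 * block ids against a pluggable replacement policy and report the hit
 * ratio. */
#include <stdio.h>
#include <stdbool.h>
#include <inttypes.h>

/* A policy answers: is the block cached?  And reacts to hits/misses. */
typedef struct CachePolicy
{
    bool        (*contains)(void *state, uint64_t blockid);
    void        (*on_hit)(void *state, uint64_t blockid);
    void        (*on_miss)(void *state, uint64_t blockid); /* insert + evict */
    void       *state;
} CachePolicy;

static void
replay_trace(FILE *trace, const CachePolicy *policy)
{
    uint64_t    blockid;
    uint64_t    hits = 0;
    uint64_t    total = 0;

    while (fscanf(trace, "%" SCNu64, &blockid) == 1)
    {
        total++;
        if (policy->contains(policy->state, blockid))
        {
            hits++;
            policy->on_hit(policy->state, blockid);
        }
        else
            policy->on_miss(policy->state, blockid);
    }

    printf("requests: %" PRIu64 ", hits: %" PRIu64 ", hit ratio: %.4f\n",
           total, hits, total > 0 ? (double) hits / total : 0.0);
}

LRU, GCLOCK and a W-TinyLFU variant could then be plugged in behind the
same callbacks and compared on identical traces.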

Is there a set of traces of "buffer load events"? (I did some Google
searches like "postgresql buffer cache trace" with no luck)
Is there an option that enables tracing of each requested buffer Id?

Frankly speaking, I've no access to PG instances with lots of data
(i.e. >10GiB).

> Maybe.  Want to code it up?

That would be interesting; however, I'm not fluent in C, and I've never
written multithreaded C code either. I do understand what a cache line is,
though.
Anyway, before hacking up a prototype it makes sense to gather some traces.

Vladimir



Re: W-TinyLfu for cache eviction

From: Amit Kapila
On Mon, Dec 14, 2015 at 12:18 PM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
> > a global lock would be good enough for a proof of concept that only
> > evaluates cache hit ratios.
>
> I think an emulator can be used to check hit ratios. That way we can see
> how different algorithms affect the hit ratio.
>
> Is there a set of traces of "buffer load events"? (I did some Google
> searches like "postgresql buffer cache trace" with no luck)
> Is there an option that enables tracing of each requested buffer Id?
>

To get a detailed trace for each buffer, I think you can use dynamic
tracing, as explained in the documentation [1].

Another simple way could be to use:
 EXPLAIN (ANALYZE, BUFFERS) <sql_statement>

This will give the number of blocks hit and read, which could be useful in
your experiments.