Thread: W-TinyLfu for cache eviction

W-TinyLfu for cache eviction

From: Vladimir Sitnikov
I've recently noticed the W-TinyLfu cache admission policy (see [1]) being
used by Caffeine, a "high performance caching library for Java 8".
It demonstrates high cache hit ratios (see [2]) and makes it possible to
build high-throughput caches (see Caffeine in [3]).
The authors explicitly allow implementations of the algorithm (see [4]).

Does it make sense to evaluate the algorithm for buffer replacement?

[1]: http://arxiv.org/pdf/1512.00727v1.pdf
[2]: https://github.com/ben-manes/caffeine/wiki/Efficiency
[3]: https://github.com/ben-manes/caffeine/wiki/Benchmarks
[4]: https://github.com/ben-manes/caffeine/issues/23#issuecomment-161536706

Vladimir Sitnikov



Re: W-TinyLfu for cache eviction

From: Robert Haas
On Thu, Dec 3, 2015 at 2:27 AM, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:
> I've recently noticed the W-TinyLfu cache admission policy (see [1]) being
> used by Caffeine, a "high performance caching library for Java 8".
> It demonstrates high cache hit ratios (see [2]) and makes it possible to
> build high-throughput caches (see Caffeine in [3]).
> The authors explicitly allow implementations of the algorithm (see [4]).
>
> Does it make sense to evaluate the algorithm for buffer replacement?

Maybe.  Want to code it up?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: W-TinyLfu for cache eviction

From: Konstantin Knizhnik

On 03.12.2015 10:27, Vladimir Sitnikov wrote:
> I've recently noticed the W-TinyLfu cache admission policy (see [1]) being
> used by Caffeine, a "high performance caching library for Java 8".
> It demonstrates high cache hit ratios (see [2]) and makes it possible to
> build high-throughput caches (see Caffeine in [3]).
> The authors explicitly allow implementations of the algorithm (see [4]).
>
> Does it make sense to evaluate the algorithm for buffer replacement?

I expect synchronization issues with an implementation of this algorithm.
It seems hard to avoid some global critical section, which can cause
significant performance degradation on MPP systems (see the thread
"Move PinBuffer and UnpinBuffer to atomics").


> [1]: http://arxiv.org/pdf/1512.00727v1.pdf
> [2]: https://github.com/ben-manes/caffeine/wiki/Efficiency
> [3]: https://github.com/ben-manes/caffeine/wiki/Benchmarks
> [4]: https://github.com/ben-manes/caffeine/issues/23#issuecomment-161536706
>
> Vladimir Sitnikov
>
>




Re: W-TinyLfu for cache eviction

From: Ants Aasma
On Wed, Dec 9, 2015 at 11:31 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> I expect synchronization issues with an implementation of this algorithm.
> It seems hard to avoid some global critical section, which can cause
> significant performance degradation on MPP systems (see the thread "Move
> PinBuffer and UnpinBuffer to atomics").

The sketch updates don't really need to be globally consistent;
anything faster than the cycling of buffers through the LRU tier
would be enough. In principle atomic increments could be used, but
funneling all buffer hits onto a small number of cache lines and then
doing 4x the number of writes (one per sketch row) would result in
extremely nasty cache line ping-ponging. The Caffeine implementation
appears to solve this by batching the frequency sketch updates using a
striped set of circular buffers. Re-implementing this is not exactly
trivial, and some prior experience with lock-free programming seems
advisable.
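
To make the batching idea concrete, here is a minimal sketch in C, assuming
a per-backend ring that collects buffer hits and folds them into a shared
count-min style frequency sketch in batches. All structure and function
names are hypothetical illustrations, not Caffeine's code or a proposed
patch:

/* Hypothetical illustration: batch buffer-hit events in a per-backend
 * ring and fold them into a shared count-min style frequency sketch
 * only when the ring fills up, so the shared cache lines are touched
 * far less often than once per hit. */
#include <stdint.h>

#define SKETCH_WIDTH  (1 << 16)     /* counters per row, power of two */
#define SKETCH_DEPTH  4             /* rows, i.e. hash functions */
#define RING_SIZE     256           /* hits buffered per backend */

typedef struct FreqSketch
{
    uint8_t     counter[SKETCH_DEPTH][SKETCH_WIDTH];
} FreqSketch;

typedef struct HitRing
{
    uint32_t    buf[RING_SIZE];
    int         n;
} HitRing;

static inline uint32_t
sketch_slot(uint32_t bufid, int row)
{
    /* cheap per-row mixing; any decent hash family would do */
    uint32_t    h = bufid * 2654435761u + (uint32_t) row * 40503u;

    h ^= h >> 15;
    return h & (SKETCH_WIDTH - 1);
}

/*
 * Fold one batch of hits into the shared sketch.  In a real patch this
 * would be the only code path that needs a lock (or striped locks).
 */
static void
sketch_drain(FreqSketch *sketch, HitRing *ring)
{
    for (int i = 0; i < ring->n; i++)
    {
        for (int row = 0; row < SKETCH_DEPTH; row++)
        {
            uint8_t *c = &sketch->counter[row][sketch_slot(ring->buf[i], row)];

            if (*c < UINT8_MAX)     /* saturating 8-bit counters */
                (*c)++;
        }
    }
    ring->n = 0;
}

/* Called on every buffer hit; normally just a write to a local array. */
static void
record_hit(FreqSketch *shared, HitRing *local, uint32_t bufid)
{
    local->buf[local->n++] = bufid;
    if (local->n == RING_SIZE)
        sketch_drain(shared, local);
}

A real implementation would also need the periodic reset step from the
TinyLFU paper, which halves all counters so that stale frequencies age out.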

Based on the paper, it looks like this algorithm can work with a
low-quality primary eviction algorithm. Swapping our GCLOCK out for
something simpler like plain CLOCK could simplify an atomics-based
PinBuffer approach. Another interesting avenue for research would be
to use ideas from TinyLFU to implement a tier of "nailed" buffers that
have backend-local or striped pin-unpin accounting.
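
For comparison, a plain CLOCK sweep is about the simplest replacement
policy one could pair with such an admission filter. A minimal, hypothetical
sketch (not PostgreSQL's actual freelist code) looks like this:

/* Hypothetical plain-CLOCK victim selection: every buffer has a single
 * reference bit, set on access and cleared as the hand sweeps past. */
#include <stdbool.h>
#include <stdint.h>

#define NBUFFERS 1024

typedef struct ClockState
{
    bool        refbit[NBUFFERS];   /* set on every buffer hit */
    bool        pinned[NBUFFERS];   /* pinned buffers cannot be evicted */
    uint32_t    hand;               /* current sweep position */
} ClockState;

/*
 * Advance the hand until an unpinned buffer with a clear reference bit
 * is found; that buffer becomes the eviction victim.  (A sketch only:
 * it spins forever if every buffer is pinned.)
 */
static uint32_t
clock_find_victim(ClockState *cs)
{
    for (;;)
    {
        uint32_t    bufid = cs->hand;

        cs->hand = (cs->hand + 1) % NBUFFERS;

        if (cs->pinned[bufid])
            continue;

        if (cs->refbit[bufid])
            cs->refbit[bufid] = false;  /* give it a second chance */
        else
            return bufid;
    }
}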

But for checking whether the replacement policy implemented by W-TinyLFU
is good, it isn't necessary to have a performant locking solution. I think
a global lock would be good enough for a proof of concept that only
evaluates cache hit ratios.
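
To show what such a proof of concept would actually have to decide: the
core of TinyLFU admission is just a frequency comparison between the
incoming block and the would-be victim. A rough sketch under those
assumptions (hypothetical names, one global lock, count-min estimate as in
the paper):

/* Proof-of-concept TinyLFU-style admission check behind one global
 * lock: admit the candidate block only if its estimated access
 * frequency exceeds that of the eviction victim. */
#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>

#define SKETCH_WIDTH    (1 << 16)
#define SKETCH_DEPTH    4

static uint8_t sketch[SKETCH_DEPTH][SKETCH_WIDTH];
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

static inline uint32_t
sketch_slot(uint32_t blockid, int row)
{
    uint32_t    h = blockid * 2654435761u + (uint32_t) row * 40503u;

    h ^= h >> 15;
    return h & (SKETCH_WIDTH - 1);
}

/* Count-min estimate: the minimum counter over all rows. */
static uint32_t
estimate_frequency(uint32_t blockid)
{
    uint32_t    est = UINT8_MAX;

    for (int row = 0; row < SKETCH_DEPTH; row++)
    {
        uint8_t     c = sketch[row][sketch_slot(blockid, row)];

        if (c < est)
            est = c;
    }
    return est;
}

/*
 * Should 'candidate' replace 'victim'?  A single global lock is fine
 * when all we want to measure is the hit ratio.
 */
static bool
tinylfu_admit(uint32_t candidate, uint32_t victim)
{
    bool        admit;

    pthread_mutex_lock(&sketch_lock);
    admit = estimate_frequency(candidate) > estimate_frequency(victim);
    pthread_mutex_unlock(&sketch_lock);

    return admit;
}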

Regards,
Ants Aasma



Re: W-TinyLfu for cache eviction

From: Vladimir Sitnikov
> a global lock would be good enough for a proof of concept that only
> evaluates cache hit ratios.

I think an emulator can be used to check hit ratios. That way we can see
how different algorithms affect the hit ratio.
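
A minimal skeleton for such an emulator in C, assuming the trace is simply
a text file with one block id per line (the file format and the policy
callback interface are made up for illustration):

/* Skeleton of a trace-driven hit-ratio emulator: replay a text file of
 * block ids against a pluggable replacement policy and report the hit
 * ratio. */
#include <stdio.h>
#include <stdbool.h>
#include <inttypes.h>

/* A policy answers: is the block cached?  And reacts to hits/misses. */
typedef struct CachePolicy
{
    bool        (*contains)(void *state, uint64_t blockid);
    void        (*on_hit)(void *state, uint64_t blockid);
    void        (*on_miss)(void *state, uint64_t blockid); /* insert + evict */
    void       *state;
} CachePolicy;

static void
replay_trace(FILE *trace, const CachePolicy *policy)
{
    uint64_t    blockid;
    uint64_t    hits = 0;
    uint64_t    total = 0;

    while (fscanf(trace, "%" SCNu64, &blockid) == 1)
    {
        total++;
        if (policy->contains(policy->state, blockid))
        {
            hits++;
            policy->on_hit(policy->state, blockid);
        }
        else
            policy->on_miss(policy->state, blockid);
    }

    printf("requests: %" PRIu64 ", hits: %" PRIu64 ", hit ratio: %.4f\n",
           total, hits, total > 0 ? (double) hits / total : 0.0);
}

LRU, GCLOCK and a W-TinyLFU variant could then be plugged in behind the
same callbacks and compared on identical traces.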

Is there a set of traces of "buffer load events"? (I did some Google
searches like "postgresql buffer cache trace" with no luck)
Is there an option that enables tracing of each requested buffer Id?

Frankly speaking, I've no access to PG instances with lots of data
(i.e. >10GiB).

> Maybe.  Want to code it up?

That would be interesting; however, I'm not fluent in C, and I've never
written multithreaded C code either. I do understand what a cache line is,
though.
Anyway, before hacking up a prototype it makes sense to gather some traces.

Vladimir



Re: W-TinyLfu for cache eviction

From: Amit Kapila
On Mon, Dec 14, 2015 at 12:18 PM, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
> > a global lock would be good enough for a proof of concept that only
> > evaluates cache hit ratios.
>
> I think an emulator can be used to check hit ratios. That way we can see
> how different algorithms affect the hit ratio.
>
> Is there a set of traces of "buffer load events"? (I did some Google
> searches like "postgresql buffer cache trace" with no luck)
> Is there an option that enables tracing of each requested buffer Id?
>

To get a detailed trace for each buffer, I think you can use dynamic
tracing, as explained in the documentation [1].

Another simple way could be to use:
 EXPLAIN (ANALYZE, BUFFERS) <sql_statement>

This will give the number of blocks hit and read, which could be useful in
your experiments.