From: Tomas Vondra
> I don't think we need to remove the expired entries right away, if
> there are only very few of them. The cleanup requires walking the
> hash table, which means significant fixed cost. So if there are only
> few expired entries (say, less than 25% of the cache), we can just
> leave them around and clean them if we happen to stumble on them
> (although that may not be possible with dynahash, which has no
> concept of expiration) or before enlarging the hash table.
I agree that we don't need to evict cache entries as long as memory
permits (within limits the DBA controls).
But how does the concept of expiration fit the catcache? How would
the user determine the expiration time, i.e. the setting of
syscache_prune_min_age? If you set a small value to evict unnecessary
entries faster, necessary entries will also be evicted. An access
counter could keep frequently accessed entries longer, but an idle
period (e.g. a lunch break) can still flush entries that you want to
access again after the break.
The idea of expiration applies to cases where we want possibly stale
entries to vanish so that newer data is loaded on the next access:
for example, the TTL (time-to-live) of Memcached, Redis, DNS, and
ARP. Is the catcache based on the same idea as those? No.
What we want to do is evict cache entries that are never or
infrequently used. That is naturally the task of LRU, isn't it? Even
the high-performance Memcached and Redis use LRU when the cache is
full. As Bruce said, we don't have to worry about lock contention or
the like, because we're talking about a backend-local cache. Are we
worried about the overhead of manipulating the LRU chain? The current
catcache already does the equivalent on every access: it calls
dlist_move_head() to move the accessed entry to the front of its hash
bucket.
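To make the cost concrete, here is a rough sketch of what the LRU
bookkeeping could look like, using the same dlist primitives. The
names (MyCatCTup, cc_lru, lru_touch, lru_victim) are made up for
illustration; the real CatCTup layout differs:

#include "lib/ilist.h"

typedef struct MyCatCTup
{
    dlist_node  cache_elem;   /* per-bucket chain, as in the catcache */
    dlist_node  lru_node;     /* hypothetical backend-wide LRU chain */
    /* ... cached tuple data ... */
} MyCatCTup;

static dlist_head cc_lru;     /* head = most recently used;
                               * dlist_init() at cache creation */

/*
 * On every cache hit: the same O(1) pointer manipulation the catcache
 * already pays for when it calls dlist_move_head() on the hash bucket.
 */
static void
lru_touch(MyCatCTup *ct)
{
    dlist_move_head(&cc_lru, &ct->lru_node);
}

/*
 * When the cache is over its limit, the victim is simply the entry at
 * the cold end of the chain.
 */
static MyCatCTup *
lru_victim(void)
{
    if (dlist_is_empty(&cc_lru))
        return NULL;
    return dlist_tail_element(MyCatCTup, lru_node, &cc_lru);
}

So keeping a global LRU chain adds one more dlist_move_head() per
access, nothing worse than what we already do per bucket.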
> So if we want to address this case too (and we probably want), we
> may need to discard the old cache memory context somehow (e.g.
> rebuild the cache in a new one, and copy the non-expired entries).
> Which is a nice opportunity to do the "full" cleanup, of course.
The straightforward, natural, and familiar way is to limit the cache
size, which I mentioned in a previous mail. We should give the DBA
the ability to control memory usage, rather than deciding what to do
after the memory area has been allowed to grow unnecessarily large.
That's what a typical "cache" is, isn't it?
https://en.wikipedia.org/wiki/Cache_(computing)
"To be cost-effective and to enable efficient use of data, caches must
be relatively small."
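Continuing the sketch above, enforcing such a limit could be as
simple as the following. catcache_max_kb, cc_total_bytes, and
evict_entry() are made-up names for illustration, not existing
PostgreSQL symbols; a GUC like this is only hypothetical here:

static int   catcache_max_kb = 0;   /* hypothetical GUC; 0 = no limit */
static Size  cc_total_bytes = 0;    /* maintained on insert/removal */

static void
enforce_catcache_limit(void)
{
    if (catcache_max_kb <= 0)
        return;                     /* limit disabled: current behavior */

    /* Evict from the cold end of the LRU chain until under the limit. */
    while (cc_total_bytes > (Size) catcache_max_kb * 1024)
    {
        MyCatCTup  *victim = lru_victim();

        if (victim == NULL)
            break;
        evict_entry(victim);        /* hypothetical: unlink from bucket
                                     * and LRU chain, subtract its size
                                     * from cc_total_bytes, pfree it */
    }
}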
Another related, though less important, idea would be to give each
catcache a separate memory context that is a child of
CacheMemoryContext. This would allow a slight optimization: using a
slab context (slab.c) for catcaches whose tuples are fixed-size. But
that might be a bit too complex for PG 12, I'm afraid.
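For what it's worth, a sketch of that arrangement; the function is
illustrative, not existing code, and the real CatCache struct has no
per-cache context member today:

#include "utils/memutils.h"

/*
 * Sketch: create a per-catcache context under CacheMemoryContext.
 * A slab context suits a cache whose entries are all the same size,
 * and deleting the context would free the whole cache at once.
 */
static MemoryContext
create_catcache_context(Size entry_size)
{
    return SlabContextCreate(CacheMemoryContext,
                             "catcache entries",
                             SLAB_DEFAULT_BLOCK_SIZE,
                             entry_size);
}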
Regards
MauMau