Re: Protect syscache from bloating with negative cache entries - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Protect syscache from bloating with negative cache entries
Msg-id 0d3806bf-5087-a75f-3592-ee1d508a79db@2ndquadrant.com
In response to Re: Protect syscache from bloating with negative cache entries  (Bruce Momjian <bruce@momjian.us>)
On 1/21/19 9:56 PM, Bruce Momjian wrote:
> On Fri, Jan 18, 2019 at 05:09:41PM -0800, Andres Freund wrote:
>> Hi,
>>
>> On 2019-01-18 19:57:03 -0500, Robert Haas wrote:
>>> On Fri, Jan 18, 2019 at 4:23 PM andres@anarazel.de <andres@anarazel.de> wrote:
>>>> My proposal for this was to attach a 'generation' to cache entries. Upon
>>>> access cache entries are marked to be of the current
>>>> generation. Whenever existing memory isn't sufficient for further cache
>>>> entries and, on a less frequent schedule, triggered by a timer, the
>>>> cache generation is increased and the new generation's "creation time" is
>>>> measured.  Then generations that are older than a certain threshold are
>>>> purged, and if there are any, the entries of the purged generation are
>>>> removed from the caches using a sequential scan through the cache.
>>>>
>>>> This outline achieves:
>>>> - no additional time measurements in hot code paths
>>>> - no need for a sequential scan of the entire cache when no generations
>>>>   are too old
>>>> - both size and time limits can be implemented reasonably cheaply
>>>> - overhead when feature disabled should be close to zero
>>>
>>> Seems generally reasonable.  The "whenever existing memory isn't
>>> sufficient for further cache entries" part I'm not sure about.
>>> Couldn't that trigger very frequently and prevent necessary cache size
>>> growth?
>>
>> I'm thinking it'd just trigger a new generation, with its associated
>> "creation" time (which is cheap to acquire in comparison to creating a
>> number of cache entries). Depending on settings or just code policy we
>> can decide up to which generation to prune the cache, using that
>> creation time.  I'd imagine that we'd have some default cache-pruning
>> time in the minutes, and for workloads where relevant one can make
>> sizing configurations more aggressive - or something like that.
> 
> OK, so it seems everyone likes the idea of a timer.  The open questions
> are whether we want multiple epochs, and whether we want some kind of
> size trigger.
> 

FWIW I share the view that time-based eviction (be it some sort of
timestamp or epoch) seems promising - it seems cheaper than pretty much
any other LRU metric (usage counts, clock sweep, ...).
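
To make that a bit more concrete, here's a rough sketch of the scheme
Andres outlined (the names and the fixed-size generation array are made
up, this is not actual catcache code):

    #include "postgres.h"
    #include "utils/timestamp.h"    /* TimestampTz, GetCurrentTimestamp */

    #define MAX_GENERATIONS 64      /* made-up bound for the sketch */

    typedef struct CacheEntrySketch
    {
        uint32      last_access_gen;    /* generation of the last access */
        /* ... hash key, cached tuple, etc. ... */
    } CacheEntrySketch;

    static uint32       current_gen = 0;
    static TimestampTz  gen_created_at[MAX_GENERATIONS];

    /*
     * Hot path: a cache hit merely stamps the entry with the current
     * generation - a plain integer store, no clock reads.
     */
    static inline void
    cache_entry_touch(CacheEntrySketch *entry)
    {
        entry->last_access_gen = current_gen;
    }

    /*
     * Called from a timer (or when the cache wants to grow): open a new
     * generation and remember when it started.
     */
    static void
    advance_generation(void)
    {
        current_gen++;
        gen_created_at[current_gen % MAX_GENERATIONS] = GetCurrentTimestamp();
    }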

> With only one time epoch, if the timer is 10 minutes, you could expire an
> entry after 10-19 minutes, while with a new epoch every minute and
> 10-minute expire, you can do 10-11 minute precision.  I am not sure the
> complexity is worth it.
> 

I don't think having just a single epoch would be significantly less
complex than having more of them. In fact, having more of them might
actually be cheaper.
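
To illustrate - with a per-generation creation time, the pruning cutoff
is computed the same way no matter how many generations we keep (a
single epoch is just the degenerate one-slot case). Continuing the
sketch from above:

    /*
     * Find the oldest generation we want to keep, i.e. the first one
     * created after (now - prune_age_secs).  Entries stamped with an
     * older generation are considered expired.
     */
    static uint32
    oldest_generation_to_keep(int prune_age_secs)
    {
        TimestampTz cutoff = GetCurrentTimestamp() -
                             (TimestampTz) prune_age_secs * USECS_PER_SEC;
        uint32      gen = current_gen;

        /* walk back while the previous generation is still fresh enough */
        while (gen > 0 && (current_gen - gen) < MAX_GENERATIONS - 1 &&
               gen_created_at[(gen - 1) % MAX_GENERATIONS] >= cutoff)
            gen--;

        return gen;
    }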


> For a size trigger, should removal be affected by how many expired cache
> entries there are?  If there were 10k expired entries or 50, wouldn't
> you want them removed if they have not been accessed in X minutes?
> 
> In the worst case, if 10k entries were accessed in a query and never
> accessed again, what would the ideal cleanup behavior be?  Would it
> matter if it was expired in 10 or 19 minutes?  Would it matter if there
> were only 50 entries?
> 

I don't think we need to remove the expired entries right away if there
are only very few of them. The cleanup requires walking the hash table,
which has a significant fixed cost. So if there are only a few expired
entries (say, less than 25% of the cache), we can just leave them around
and clean them up when we happen to stumble on them (although that may
not be possible with dynahash, which has no concept of expiration) or
before enlarging the hash table.
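
In pseudo-code, the policy I have in mind looks about like this (the
struct, the counters and the 25% threshold are all made up - it assumes
we can cheaply maintain an approximate count of entries belonging to
expired generations):

    typedef struct CacheSketch
    {
        int             num_entries;
        int             num_expired;    /* approximate, bumped on gen advance */
        MemoryContext   cxt;            /* context the entries live in */
        /* ... buckets, etc. ... */
    } CacheSketch;

    static void
    maybe_prune_cache(CacheSketch *cache, int prune_age_secs)
    {
        uint32  keep_gen = oldest_generation_to_keep(prune_age_secs);

        /*
         * Only a few expired entries - skip the sweep, the fixed cost
         * of walking the whole hash table isn't worth it.
         */
        if (cache->num_expired * 4 < cache->num_entries)
            return;

        sweep_expired_entries(cache, keep_gen);     /* made-up sweep */
    }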

FWIW when it comes to memory consumption, it's important to realize the
cache memory context won't release the memory to the system even if we
remove the expired entries. It'll simply stash the freed chunks into a
freelist. That's OK when the entries are going to be reused, but the
memory usage won't decrease after a sudden spike, for example (and there
may be other chunks allocated on the same page, so paging it out will
hurt).

So if we want to address this case too (and we probably do), we may need
to discard the old cache memory context somehow (e.g. rebuild the cache
in a new one, copying over the non-expired entries). Which is a nice
opportunity to do the "full" cleanup, of course.
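
A rough sketch of that rebuild, using the regular memory context API
(copy_live_entries() is a made-up helper that re-inserts the surviving
entries into the new context):

    #include "utils/memutils.h"  /* AllocSetContextCreate, TopMemoryContext */

    static void
    rebuild_cache_context(CacheSketch *cache, uint32 keep_gen)
    {
        MemoryContext newcxt;
        MemoryContext oldcxt = cache->cxt;

        newcxt = AllocSetContextCreate(TopMemoryContext,
                                       "cache rebuild sketch",
                                       ALLOCSET_DEFAULT_SIZES);

        /* copy the non-expired entries into the new context */
        copy_live_entries(cache, keep_gen, newcxt);

        cache->cxt = newcxt;

        /* unlike pfree(), this actually gives the blocks back */
        MemoryContextDelete(oldcxt);
    }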


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

