Re: Protect syscache from bloating with negative cache entries - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Protect syscache from bloating with negative cache entries
Date
Msg-id d3b291ff-d993-78d1-8d28-61bcf72793d6@2ndquadrant.com
Whole thread Raw
In response to Re: Protect syscache from bloating with negative cache entries  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses RE: Protect syscache from bloating with negative cache entries
Re: Protect syscache from bloating with negative cache entries
List pgsql-hackers
On 2/12/19 12:35 PM, Kyotaro HORIGUCHI wrote:
> Thank you for testing and the commits, Tomas.
> 
> At Sat, 9 Feb 2019 19:09:59 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<74386116-0bc5-84f2-e614-0cff19aca2de@2ndquadrant.com>
>> On 2/7/19 1:18 PM, Kyotaro HORIGUCHI wrote:
>>> At Thu, 07 Feb 2019 15:24:18 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote
in<20190207.152418.139132570.horiguchi.kyotaro@lab.ntt.co.jp>
 
>> I've done a bunch of benchmarks on v13, and I don't see any serious
>> regression either. Each test creates a number of tables (100, 1k, 10k,
>> 100k and 1M) and then runs SELECT queries on them. The tables are
>> accessed randomly - with either uniform or exponential distribution. For
>> each combination there are 5 runs, 60 seconds each (see the attached
>> shell scripts, it should be pretty obvious).
>>
>> I've done the tests on two different machines - small one (i5 with 8GB
>> of RAM) and large one (e5-2620v4 with 64GB RAM), but the behavior is
>> almost exactly the same (with the exception of 1M tables, which does not
>> fit into RAM on the smaller one).
>>
>> On the xeon, the results (throughput compared to master) look like this:
>>
>>
>>     uniform           100     1000    10000   100000   1000000
>>    ------------------------------------------------------------
>>     v13           105.04%  100.28%  102.96%  102.11%   101.54%
>>     v13 (nodata)   97.05%   98.30%   97.42%   96.60%   107.55%
>>
>>
>>     exponential       100     1000    10000   100000   1000000
>>    ------------------------------------------------------------
>>     v13           100.04%  103.48%  101.70%   98.56%   103.20%
>>     v13 (nodata)   97.12%   98.43%   98.86%   98.48%   104.94%
>>
>> The "nodata" case means the tables were empty (so no files created),
>> while in the other case each table contained 1 row.
>>
>> Per the results it's mostly break even, and in some cases there is
>> actually a measurable improvement.
> 
> Great! I guess it comes from reduced size of hash?
> 

Not sure about that. I haven't actually verified that it reduces the
cache size at all - I was measuring the overhead of the extra work. And
I don't think the syscache actually shrunk significantly, because the
throughput was quite high (~15-30k tps, IIRC) so pretty much everything
was touched within the default 600 seconds.

>> That being said, the question is whether the patch actually reduces
>> memory usage in a useful way - that's not something this benchmark
>> validates. I plan to modify the tests to make pgbench script
>> time-dependent (i.e. to pick a subset of tables depending on time).
> 
> Thank you.
> 
>> A couple of things I've happened to notice during a quick review:
>>
>> 1) The sgml docs in 0002 talk about "syscache_memory_target" and
>> "syscache_prune_min_age", but those options were renamed to just
>> "cache_memory_target" and "cache_prune_min_age".
> 
> I'm at a loss how call syscache for users. I think it is "catalog
> cache". The most basic component is called catcache, which is
> covered by the syscache layer, both of then are not revealed to
> users, and it is shown to user as "catalog cache".
> 
> "catalog_cache_prune_min_age", "catalog_cache_memory_target", (if
> exists) "catalog_cache_entry_limit" and
> "catalog_cache_prune_ratio" make sense?
> 

I think "catalog_cache" sounds about right, although my point was simply
that there's a discrepancy between sgml docs and code.

>> 2) "cache_entry_limit" is not mentioned in sgml docs at all, and it's
>> defined three times in guc.c for some reason.
> 
> It is just PoC, added to show how it looks. (The multiple
> instances must bex a result of a convulsion of my fingers..) I
> think this is not useful unless it can be specfied per-relation
> or per-cache basis. I'll remove the GUC and add reloptions for
> the purpose. (But it won't work for pg_class and pg_attribute
> for now).
> 

OK, although I'd just keep it as simple as possible. TBH I can't really
imagine users tuning limits for individual caches in any meaningful way.

>> 3) I don't see why to define PRUNE_BY_AGE and PRUNE_BY_NUMBER, instead
>> of just using two bool variables prune_by_age and prune_by_number doing
>> the same thing.
> 
> Agreed. It's a kind of memory-stingy, which is useless there.
> 
>> 4) I'm not entirely sure about using stmtStartTimestamp. Doesn't that
>> pretty much mean long-running statements will set the lastaccess to very
>> old timestamp? Also, it means that long-running statements (like a PL
>> function accessing a bunch of tables) won't do any eviction at all, no?
>> AFAICS we'll set the timestamp only once, at the very beginning.
>>
>> I wonder whether using some other timestamp source (like a timestamp
>> updated regularly from a timer, or something like that).
> 
> I didin't consider planning that happen within a function. If
> 5min is the default for catalog_cache_prune_min_age, 10% of it
> (30s) seems enough and gettieofday() with such intervals wouldn't
> affect forground jobs. I'd choose catalog_c_p_m_age/10 rather
> than fixed value 30s and 1s as the minimal.
> 

Actually, I see CatCacheCleanupOldEntries contains this comment:

/*
 * Calculate the duration from the time of the last access to the
 * "current" time. Since catcacheclock is not advanced within a
 * transaction, the entries that are accessed within the current
 * transaction won't be pruned.
 */

which I think is pretty much what I've been saying ... But the question
is whether we need to do something about it.

> I obeserved significant degradation by setting up timer at every
> statement start. The patch is doing the followings to get rid of
> the degradation.
> 
> (1) Every statement updates the catcache timestamp as currently
>     does.  (SetCatCacheClock)
> 
> (2) The timestamp is also updated periodically using timer
>    separately from (1). The timer starts if not yet at the time
>    of (1).  (SetCatCacheClock, UpdateCatCacheClock)
> 
> (3) Statement end and transaction end don't stop the timer, to
>    avoid overhead of setting up a timer. (
> 
> (4) But it stops by error. I choosed not to change the thing in
>     PostgresMain that it kills all timers on error.
> 
> (5) Also changing the GUC catalog_cache_prune_min_age kills the
>    timer, in order to reflect the change quickly especially when
>    it is shortened.
> 

Interesting. What was the frequency of the timer / how often was it
executed? Can you share the code somehow?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: Should we still have old release notes in docs?
Next
From: Andrey Lepikhov
Date:
Subject: Re: [PATCH] xlogreader: do not read a file block twice