Re: Protect syscache from bloating with negative cache entries - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Protect syscache from bloating with negative cache entries |
Date | |
Msg-id | 74386116-0bc5-84f2-e614-0cff19aca2de@2ndquadrant.com |
In response to | Re: Protect syscache from bloating with negative cache entries (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses | Re: Protect syscache from bloating with negative cache entries |
List | pgsql-hackers |
On 2/7/19 1:18 PM, Kyotaro HORIGUCHI wrote:
> At Thu, 07 Feb 2019 15:24:18 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20190207.152418.139132570.horiguchi.kyotaro@lab.ntt.co.jp>
>> I'm going to retake numbers with search-only queries.
>
> Yeah, I was stupid.
>
> I made a rerun of benchmark using "-S -T 30" on the server build
> with no assertion and -O2. The numbers are the best of three
> successive attempts. The patched version is running with
> cache_target_memory = 0, cache_prune_min_age = 600 and
> cache_entry_limit = 0 but pruning doesn't happen by the workload.
>
> master: 13393 tps
> v12   : 12625 tps (-6%)
>
> Significant degradation is found.
>
> Reduced frequency of dlist_move_tail by taking 1ms interval
> between two successive updates on the same entry let the
> degradation disappear.
>
> patched  : 13720 tps (+2%)
>
> I think there's still no need of such frequency. It is 100ms in
> the attached patch.
>
> # I'm not sure the name LRU_IGNORANCE_INTERVAL makes sense..
>

Hi,

I've done a bunch of benchmarks on v13, and I don't see any serious
regression either. Each test creates a number of tables (100, 1k, 10k,
100k and 1M) and then runs SELECT queries on them. The tables are
accessed randomly, with either a uniform or an exponential distribution.
For each combination there are 5 runs, 60 seconds each (see the attached
shell scripts, they should be pretty self-explanatory).

I've done the tests on two different machines, a small one (i5 with 8GB
of RAM) and a large one (e5-2620v4 with 64GB of RAM), but the behavior is
almost exactly the same (with the exception of 1M tables, which do not
fit into RAM on the smaller one).
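To make the two access patterns concrete, here is a minimal Python sketch of how a client might pick a table under a uniform or an exponential distribution. It is only an illustration, not the attached scripts: the function names, the `t_<n>` table naming and the skew parameter of 10 are all assumptions. (pgbench itself has built-in random() and random_exponential() functions; whether the scripts use them, and with what parameter, is not shown here.)

```python
import math
import random


def pick_table(n_tables, dist, rng, param=10.0):
    """Return a 1-based table number in [1, n_tables]."""
    if dist == "uniform":
        return rng.randint(1, n_tables)
    if dist == "exponential":
        # Inverse CDF of an exponential truncated to [0, 1): low table
        # numbers are picked far more often than high ones.
        u = rng.random()
        x = -math.log(1.0 - u * (1.0 - math.exp(-param))) / param
        return min(n_tables, 1 + int(x * n_tables))
    raise ValueError(f"unknown distribution: {dist}")


def make_query(table_no):
    # "t_<n>" is a made-up naming scheme, not the one from the scripts.
    return f"SELECT * FROM t_{table_no}"


if __name__ == "__main__":
    rng = random.Random(42)
    for dist in ("uniform", "exponential"):
        picks = [pick_table(1000, dist, rng) for _ in range(5)]
        print(dist, [make_query(p) for p in picks])
```

With a skew like this, most lookups hit a small set of "hot" tables, so the exponential runs stress a much smaller part of the catalog cache than the uniform ones.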
On the xeon, the results (throughput compared to master) look like this:

    uniform             100     1000    10000   100000  1000000
    -----------------------------------------------------------
    v13             105.04%  100.28%  102.96%  102.11%  101.54%
    v13 (nodata)     97.05%   98.30%   97.42%   96.60%  107.55%

    exponential         100     1000    10000   100000  1000000
    -----------------------------------------------------------
    v13             100.04%  103.48%  101.70%   98.56%  103.20%
    v13 (nodata)     97.12%   98.43%   98.86%   98.48%  104.94%

The "nodata" case means the tables were empty (so no files created),
while in the other case each table contained 1 row.

Per the results it's mostly break even, and in some cases there is
actually a measurable improvement.

That being said, the question is whether the patch actually reduces
memory usage in a useful way - that's not something this benchmark
validates. I plan to modify the tests to make the pgbench script
time-dependent (i.e. to pick a subset of tables depending on time).

A couple of things I've happened to notice during a quick review:

1) The sgml docs in 0002 talk about "syscache_memory_target" and
"syscache_prune_min_age", but those options were renamed to just
"cache_memory_target" and "cache_prune_min_age".

2) "cache_entry_limit" is not mentioned in sgml docs at all, and it's
defined three times in guc.c for some reason.

3) I don't see why we need to define PRUNE_BY_AGE and PRUNE_BY_NUMBER,
instead of just using two bool variables prune_by_age and
prune_by_number doing the same thing.

4) I'm not entirely sure about using stmtStartTimestamp. Doesn't that
pretty much mean long-running statements will set the lastaccess to a
very old timestamp? Also, it means that long-running statements (like a
PL function accessing a bunch of tables) won't do any eviction at all,
no? AFAICS we'll set the timestamp only once, at the very beginning.

I wonder whether using some other timestamp source (like a timestamp
updated regularly from a timer, or something like that) would be better.
5) There are two fread() calls in 0003 triggering a compiler warning
about unused return value.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
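On point 4 above, the concern can be shown with a toy model. This is a Python sketch under stated assumptions, not the patch's code: the `Entry` class, `is_prunable()` and the timings are hypothetical, and only the 600-second minimum age mirrors the cache_prune_min_age setting from the quoted run. One statement runs from t=0 to t=1000 and touches entry A at t=900, while entry B is never touched again.

```python
from dataclasses import dataclass

MIN_AGE = 600.0  # seconds, mirrors cache_prune_min_age = 600 from the quoted run


@dataclass
class Entry:
    lastaccess: float


def is_prunable(entry, clock, min_age=MIN_AGE):
    """An entry is a pruning candidate once it is older than min_age."""
    return clock - entry.lastaccess > min_age


def run_long_statement(use_stmt_start):
    """Simulate one statement from t=0 to t=1000 touching entry A at t=900.

    Returns (b_pruned_during_stmt, a_prunable_afterwards).
    """
    stmt_start, access_time, stmt_end = 0.0, 900.0, 1000.0
    a = Entry(lastaccess=-50.0)
    b = Entry(lastaccess=-100.0)  # never touched again

    # Which clock stamps the access and drives the pruning check?
    clock = stmt_start if use_stmt_start else access_time
    a.lastaccess = clock
    b_pruned = is_prunable(b, clock)

    # The next statement starts right after the long one ends.
    a_prunable_after = is_prunable(a, stmt_end)
    return b_pruned, a_prunable_after


if __name__ == "__main__":
    print("statement-start clock:", run_long_statement(True))
    print("regularly updated clock:", run_long_statement(False))
```

In this model the statement-start clock leaves B in the cache although it has been idle for 1000 seconds, and makes A look 1000 seconds old right after the statement ends although it was used 100 seconds earlier. A clock that advances during the statement gives the opposite, intended result in both cases.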