Re: Global shared meta cache - Mailing list pgsql-hackers

From: Konstantin Knizhnik
Subject: Re: Global shared meta cache
Msg-id: 7049f52e-22ef-53c4-0be6-96416502469b@postgrespro.ru
In response to: RE: Global shared meta cache ("ideriha.takeshi@fujitsu.com" <ideriha.takeshi@fujitsu.com>)
List: pgsql-hackers

On 09.10.2019 9:06, ideriha.takeshi@fujitsu.com wrote:
> Hi, Konstantin
>
>>> From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
>>> I do not completely understand from your description when we are going
>>> to evict entries from the local cache.
>>> Just once the transaction is committed? I think it would be more efficient
>>> to also specify a memory threshold for the local cache size and use LRU or
>>> some other eviction policy to remove data from the local cache.
>>> So if the working set (accessed relations) fits within the local cache limit,
>>> there will be no performance penalty compared with the current implementation.
>>> There should be no difference at all on pgbench or other benchmarks
>>> with a relatively small number of relations.
>>>
>>> If an entry is not found in the local cache, then we should look for it in
>>> the global cache and, in case of a double cache miss, read it from disk.
>>> I do not completely understand why we need to store references to
>>> global cache entries in the local cache and use reference counters for
>>> global cache entries.
>>> Why can we not maintain just two independent caches?
>>>
>>> While there really are databases with hundreds and even thousands of
>>> tables, an application still usually works with only a small subset of them.
>>> So I think that the "working set" can still fit in memory. This is why I
>>> think that, in case of a local cache miss and a global cache hit, we should
>>> copy the data from the global cache to the local cache to make it possible
>>> to access it in the future without any synchronization.
>>> Since we need to keep all uncommitted data in the local cache, there is
>>> still a chance of local memory overflow (if some transaction creates or
>>> alters too many tables).
>>> But I think that is a very exotic and rare use case. The problem with
>>> memory overflow usually arises when we have a large number of
>>> backends, each maintaining its own catalog cache.
>>> So I think that we should have a "soft" limit for the local cache and a
>>> "hard" limit for the global cache.
>> Oh, I hadn't come up with this idea at all. So the local cache is a sort of
>> first-level cache and the global cache is a second-level cache. That sounds great.
>> It would be good for performance, and having two GUC parameters limiting the
>> local cache and the global cache also gives the DBA complete memory control.
>> Yeah, uncommitted data should stay in the local cache, but that's the only exception.
>> Not having to keep track of references to the global cache from the local cache
>> header seems less complex to implement. I'll look into the design.
> (After sleeping on it)
> What happens if there is a cache miss in local memory and the entry is found
> in the global cache?
> One possible way is to copy the found global cache entry into local memory.
> If so, I'm just anxious about the cost of the memcpy. Another way is, for
> example, to leave the entry in the global cache and not copy it into local
> memory. In this case, searching the global cache every time seems expensive
> because we need to acquire a lock on at least the partition of the hash table.
>
> The architecture in which the local cache holds a reference to the global
> cache (strictly speaking, a pointer to a pointer to the global cache entry)
> is complex, but once a process has searched the global cache, it can later
> get the global cache entry just by checking that the reference is still
> valid and traversing a few pointers.
>
> Regards,
> Takeshi Ideriha

If the assumption that a backend's working set (the set of tables accessed by
this session) is small enough to fit in the backend's memory holds true,
then a global meta cache is not needed at all: it is enough to limit the size
of the local cache and implement some eviction algorithm.
If data is not found in the local cache, it is loaded from the catalog in the
standard way.
It is the simplest solution, and it may be a good starting point for work in
this direction.
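
To make the idea concrete, below is a minimal, self-contained sketch of such a
size-limited local cache with LRU eviction. It is not PostgreSQL's actual
catcache code: the names (Entry, local_cache_lookup, load_from_catalog) and the
entry-count limit are invented for illustration, and a real implementation
would use a hash table keyed by relation OID with a byte-based limit driven by
a GUC.

/*
 * Hypothetical sketch of a size-limited, backend-local catalog cache with
 * LRU eviction; not PostgreSQL's actual catcache code.  Lookup scans a
 * doubly linked list for brevity; a real implementation would use a hash
 * table keyed by relation OID and a byte-based limit.
 */
#include <stdio.h>
#include <stdlib.h>

#define LOCAL_CACHE_LIMIT 3          /* "soft" limit (a GUC in practice) */

typedef struct Entry
{
    unsigned int  relid;             /* cache key, e.g. a relation OID */
    void         *data;              /* cached catalog content (opaque here) */
    struct Entry *prev, *next;       /* LRU list: head = most recently used */
} Entry;

static Entry *head, *tail;
static int    nentries;

static void
unlink_entry(Entry *e)
{
    if (e->prev) e->prev->next = e->next; else head = e->next;
    if (e->next) e->next->prev = e->prev; else tail = e->prev;
}

static void
push_head(Entry *e)
{
    e->prev = NULL;
    e->next = head;
    if (head) head->prev = e; else tail = e;
    head = e;
}

/* Stand-in for reading pg_class & friends on a cache miss. */
static void *
load_from_catalog(unsigned int relid)
{
    return calloc(1, 64);
}

static Entry *
local_cache_lookup(unsigned int relid)
{
    Entry *e;

    for (e = head; e != NULL; e = e->next)
    {
        if (e->relid == relid)
        {
            unlink_entry(e);         /* hit: move to MRU position */
            push_head(e);
            return e;
        }
    }

    /* miss: load the entry, then evict LRU victims beyond the soft limit */
    e = malloc(sizeof(Entry));
    e->relid = relid;
    e->data = load_from_catalog(relid);
    push_head(e);
    nentries++;

    while (nentries > LOCAL_CACHE_LIMIT && tail != e)
    {
        Entry *victim = tail;

        unlink_entry(victim);
        free(victim->data);
        free(victim);
        nentries--;
    }
    return e;
}

int
main(void)
{
    /* A working set of {1,2,3} fits the limit, so repeated access stays cached. */
    local_cache_lookup(1);
    local_cache_lookup(2);
    local_cache_lookup(3);
    local_cache_lookup(1);
    local_cache_lookup(4);           /* forces eviction of the LRU entry (2) */
    printf("entries cached: %d\n", nentries);
    return 0;
}

With a soft limit like this, a working set that fits under the limit behaves
exactly as today; only sessions touching many relations pay the eviction cost.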

If there are cases where an application needs to work with hundreds of tables
(partitioning?), then we can either store references to the global cache in
the local cache, or perform two lookups: in the local and global caches.
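
For the two-lookup variant, here is a hypothetical sketch: the backend-local
cache is consulted first without any locking, then the shared global cache
under a per-partition lock; only on a double miss do we read the catalogs,
publish the entry to the global cache, and copy it into the local cache.
Pthread mutexes stand in for LWLocks, and all names are invented for
illustration.

/*
 * Hypothetical sketch of the two-lookup scheme: local cache first (no
 * locking), then the shared global cache under a per-partition lock,
 * then the system catalogs on a double miss.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_PARTITIONS 16

typedef struct CatEntry
{
    unsigned int     relid;          /* cache key, e.g. a relation OID */
    char             payload[64];    /* cached catalog data (opaque here) */
    struct CatEntry *next;           /* chain within one global partition */
} CatEntry;

typedef struct LocalEntry
{
    CatEntry           entry;        /* private copy, readable without locks */
    struct LocalEntry *next;
} LocalEntry;

/* "Shared" cache, partitioned so that concurrent backends rarely collide. */
static CatEntry        *global_buckets[NUM_PARTITIONS];
static pthread_mutex_t  global_locks[NUM_PARTITIONS];

/* Backend-local cache (one per process in the real design). */
static LocalEntry      *local_head;

static void
cache_init(void)
{
    for (int i = 0; i < NUM_PARTITIONS; i++)
        pthread_mutex_init(&global_locks[i], NULL);
}

/* Stand-in for reading pg_class & friends on a double cache miss. */
static CatEntry
load_from_catalog(unsigned int relid)
{
    CatEntry e = { .relid = relid };
    snprintf(e.payload, sizeof(e.payload), "catalog data for %u", relid);
    return e;
}

static const CatEntry *
catcache_lookup(unsigned int relid)
{
    /* 1. Local cache: no synchronization needed. */
    for (LocalEntry *l = local_head; l != NULL; l = l->next)
        if (l->entry.relid == relid)
            return &l->entry;

    /* 2. Global cache: lock only the partition this key hashes to. */
    int      part = relid % NUM_PARTITIONS;
    CatEntry found = { 0 };
    int      hit = 0;

    pthread_mutex_lock(&global_locks[part]);
    for (CatEntry *g = global_buckets[part]; g != NULL; g = g->next)
        if (g->relid == relid)
        {
            found = *g;              /* memcpy cost, paid once per backend */
            hit = 1;
            break;
        }
    pthread_mutex_unlock(&global_locks[part]);

    if (!hit)
    {
        /* 3. Double miss: read the catalogs and publish to the global cache. */
        CatEntry *g = malloc(sizeof(CatEntry));
        found = load_from_catalog(relid);
        *g = found;
        pthread_mutex_lock(&global_locks[part]);
        g->next = global_buckets[part];
        global_buckets[part] = g;
        pthread_mutex_unlock(&global_locks[part]);
    }

    /* Copy into the local cache; later lookups stop at step 1. */
    LocalEntry *l = malloc(sizeof(LocalEntry));
    l->entry = found;
    l->entry.next = NULL;            /* don't keep a dangling chain pointer */
    l->next = local_head;
    local_head = l;
    return &l->entry;
}

int
main(void)
{
    cache_init();
    puts(catcache_lookup(1234)->payload);    /* double miss, then published */
    puts(catcache_lookup(1234)->payload);    /* served from the local copy */
    return 0;
}

In this scheme the partition lock is taken at most once per backend and
relation, because the local copy satisfies all later lookups without any
synchronization; the memcpy cost mentioned above is paid only on that first
access.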




