RE: Global shared meta cache - Mailing list pgsql-hackers

From ideriha.takeshi@fujitsu.com
Subject RE: Global shared meta cache
Msg-id OSAPR01MB1985C6A4FA02294DDF7A1056EA790@OSAPR01MB1985.jpnprd01.prod.outlook.com
In response to Re: Global shared meta cache  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: Global shared meta cache  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
>From: Konstantin Knizhnik [mailto:k.knizhnik@postgrespro.ru]
>If the assumption that working set of backend (set of tables accessed by this session)
>is small enough to fit in backend's memory is true, then global meta cache is not
>needed at all: it is enough to limit size of local cache and implement some eviction
>algorithm.
>If data is not found in local cache, then it is loaded from catalog in standard way.
>It is the simplest solution and may be it is good starting point for work in this direction.

Thank you for the reply. 

I introduced a GUC so that users can choose whether or not to use this feature.
But as you stated, if the data size is not that big, my suggestion does too much and a simple threshold is enough.
The idea of a threshold has been discussed in another thread, so I'd like to discuss it there.
Though that thread is not active these days, the ideas discussed so far are a memory limit, an access-time limit, and a hybrid of the two.
It seems to me that the discussion converged on eviction by access timestamp.
https://www.postgresql.org/message-id/flat/20161219.201505.11562604.horiguchi.kyotaro@lab.ntt.co.jp  
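
To make that concrete, here is a minimal sketch of eviction by access timestamp
over a backend-local cache. All names here (LocalCatCacheEntry, prune_local_cache,
and the two knobs) are hypothetical placeholders, not code from that thread; the
limit and ttl would presumably map to GUCs.

#include <stdlib.h>
#include <stddef.h>
#include <time.h>

typedef struct LocalCatCacheEntry
{
    struct LocalCatCacheEntry *next;    /* hash-chain link */
    time_t      last_access;            /* updated on every cache hit */
    size_t      size;                   /* memory accounted to this entry */
    /* ... cached catalog tuple ... */
} LocalCatCacheEntry;

static size_t cache_total_size = 0;     /* maintained on insert/remove */

/*
 * Walk the hash chains and evict entries not touched within "ttl"
 * seconds, but only once the memory limit is exceeded.  An evicted
 * entry is simply reloaded from the catalog on the next access.
 */
static void
prune_local_cache(LocalCatCacheEntry **chains, int nchains,
                  size_t limit, int ttl)
{
    time_t  now = time(NULL);
    int     i;

    if (cache_total_size <= limit)
        return;

    for (i = 0; i < nchains; i++)
    {
        LocalCatCacheEntry **prev = &chains[i];

        while (*prev != NULL)
        {
            LocalCatCacheEntry *ent = *prev;

            if (now - ent->last_access > ttl)
            {
                *prev = ent->next;              /* unlink from chain */
                cache_total_size -= ent->size;
                free(ent);
            }
            else
                prev = &ent->next;
        }
    }
}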

>If there are cases when application need to work with hundreds of tables
>(partitioning?) then we can either store in local cache references to global cache or
>perform two lookups: in local and global caches.

I think this is my target. Especially in the case of many partitioned tables,
many (more than 100) columns, and many backends, sharing the
cache would benefit memory usage and performance more
than having only a simple threshold.
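
For reference, the two-lookup scheme you describe would look roughly like the
sketch below. Every function in it (LocalCacheLookup, GlobalCacheLookup, and so
on) is a hypothetical placeholder rather than my patch's actual API; the point
is only the local-first ordering and the locally stored reference that lets
repeated access skip the shared lookup.

#include <stdint.h>
#include <stddef.h>

typedef struct CachedTuple CachedTuple;

/* Hypothetical placeholders for the three storage levels. */
extern CachedTuple *LocalCacheLookup(uint32_t hashvalue);
extern CachedTuple *GlobalCacheLookup(uint32_t hashvalue);   /* shared memory */
extern CachedTuple *LoadFromCatalog(uint32_t hashvalue);     /* catalog scan */
extern void GlobalCacheInsert(uint32_t hashvalue, CachedTuple *tup);
extern void LocalCacheStoreReference(uint32_t hashvalue, CachedTuple *tup);

static CachedTuple *
CacheLookup(uint32_t hashvalue)
{
    /* Fast path: backend-local reference, no locking needed. */
    CachedTuple *tup = LocalCacheLookup(hashvalue);

    if (tup != NULL)
        return tup;

    /* Second lookup: the global cache in shared memory. */
    tup = GlobalCacheLookup(hashvalue);
    if (tup == NULL)
    {
        /* Miss everywhere: load from the catalog in the standard
         * way and publish the entry for other backends. */
        tup = LoadFromCatalog(hashvalue);
        GlobalCacheInsert(hashvalue, tup);
    }

    /* Keep a local reference so the next access skips the shared lookup. */
    LocalCacheStoreReference(hashvalue, tup);
    return tup;
}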

I did an experiment before. One parent table has about 145 integer columns and
is partitioned into about 350 child tables on average.
There are 11 parent tables, so about 3,850 tables in total.
When I ran "select * from parent_table" against the 11 parent tables, CacheMemoryContext
alone consumed about 0.37GB, and in the case of 100 backends it would consume about 37GB.
This is because the number of system catalog cache entries for pg_statistic is very large
(about 577,000 entries). This is almost the number of columns (145) times
the number of tables (3,850), which is about 558,000. (Sorry that the model and
figures are not simple to understand.)

By the way, my current patch contains some redundant code.
For example, LWLocks are used in some places where spinlocks or atomics would actually suffice.
Another is that the reference count is incremented/decremented even for local references,
although the local reference cache doesn't need to be protected.
I'll fix these things and submit statistics about memory usage and performance.
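
As an illustration of the locking fix: pinning a shared entry is such a short,
fixed-duration operation that a bare atomic (or a spinlock) is enough, with no
LWLock involved, while a purely local reference is touched by one backend only
and needs no protection at all. The primitives below (pg_atomic_uint32,
pg_atomic_fetch_add_u32) are real PostgreSQL ones from port/atomics.h, but the
entry struct is a hypothetical simplification, not my patch's actual layout.

#include "postgres.h"
#include "port/atomics.h"

/* Hypothetical, simplified shared-cache entry. */
typedef struct GlobalCacheEntry
{
    pg_atomic_uint32 refcount;  /* pinned by each backend using the entry */
    /* ... cached catalog tuple ... */
} GlobalCacheEntry;

static inline void
global_entry_pin(GlobalCacheEntry *entry)
{
    /* A single atomic add; no LWLock needed for this short operation. */
    pg_atomic_fetch_add_u32(&entry->refcount, 1);
}

static inline void
global_entry_unpin(GlobalCacheEntry *entry)
{
    /* Simplified: real code would also handle eviction when it hits zero. */
    pg_atomic_fetch_sub_u32(&entry->refcount, 1);
}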

Regards,
Takeshi Ideriha
