
From Ideriha, Takeshi
Subject RE: Global shared meta cache
Msg-id 4E72940DA2BF16479384A86D54D0988A7DB56638@G01JPEXMBKW04
In response to RE: Global shared meta cache  ("Ideriha, Takeshi" <ideriha.takeshi@jp.fujitsu.com>)
List pgsql-hackers
>From: Ideriha, Takeshi [mailto:ideriha.takeshi@jp.fujitsu.com]
>[TL;DR]
>The basic idea is the following 4 points:
>A. Users can choose which databases put their caches (relation and catalog) on
>shared memory and how much memory is used
>B. Caches of committed data are on the shared memory. Caches of uncommitted
>data are on the local memory.
>C. Caches on the shared memory have xid information (xmin, xmax)
>D. Evict not-recently-used caches from shared memory

I have updated my thoughts about B and C for the CatCache.
I would be very happy if you could give some comments.

>[B & C]
>Regarding B & C, the motivation is that we don't want other backends to see
>uncommitted tables.
>The search order is local memory -> shared memory -> disk.
>A local process searches the cache in shared memory based on its own snapshot
>and the xid of the cache.
>When the cache is not found in shared memory, a cache with xmin is made in
>shared memory (but not in the local one).
>
>When the cache definition is changed by DDL, a new cache is created in the
>local one, and thus subsequent commands refer to the local cache if needed.
>When it's committed, the local cache is cleared and the shared cache is
>updated. This update is done by adding xmax to the old cache and also making a
>new one with xmin. The idea behind adding a new one is that the newly created
>cache (new table or altered table) is likely to be used in subsequent
>transactions. At this point maybe we can make use of the current invalidation
>mechanism, even though the invalidation message to other backends is not sent.

My current thoughts (a rough sketch of the structures follows the list):
- Each catcache entry has a (maybe partial) HeapTupleHeader
- Put every catcache on shared memory, with no local catcache
- But catcaches for aborted tuples are not put on shared memory
- A hash table exists per kind of CatCache
- These hash tables exist for each database, and for the shared catalogs
  - e.g., there is a hash table for pg_class of a DB
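
To make this concrete, the entry and hash table could look roughly like the
sketch below. None of these names exist in PostgreSQL today; they are only
illustrative, and the real thing would probably sit in a dshash table or
something similar:

/* Illustrative only -- not actual PostgreSQL structures. */
typedef struct SharedCatCTup
{
    TransactionId   xmin;        /* xid that created this entry */
    TransactionId   xmax;        /* xid that deleted/updated it, or invalid */
    uint32          hash_value;  /* hash of the lookup keys */
    Datum           keys[4];     /* lookup keys, like the current CatCTup */
    HeapTupleData   tuple;       /* (maybe partial) copy incl. HeapTupleHeader */
} SharedCatCTup;

/*
 * One such hash table per (database, catalog), e.g. the pg_class cache of a
 * given DB; shared catalogs would use InvalidOid for the database.
 */
typedef struct SharedCatCacheTable
{
    Oid     dboid;       /* owning database, InvalidOid for shared catalogs */
    Oid     catrelid;    /* catalog relation, e.g. pg_class */
    /* a dshash_table (or similar) holding SharedCatCTup entries goes here */
} SharedCatCacheTable;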

Why I'm leaning toward not using a local cache:
- At commit time you need to copy the local cache to the global cache, which
  would delay the response time.
- Even if an uncommitted catcache is on shared memory, other transactions cannot
  see the cache. In my idea the entries have xid information, and visibility is
  checked by comparing the xmin/xmax of the catcache against the snapshot.

OK, so if we put the catcache on shared memory, we need to check its visibility.
But if we use exactly the same visibility check mechanism as for heap tuples,
it takes many more steps compared to the current local catcache search.
The current visibility check is based on a snapshot check plus a commit/abort check.
So I'm thinking of putting only in-progress or committed caches on shared memory.
This would save the time spent checking the catcache status (commit/abort) while
searching the cache. But basically I'm going to use the current visibility check
mechanism except for the commit/abort check (in other words, the clog check).
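
To illustrate, the check I have in mind would look roughly like the sketch
below. This is only a sketch: SharedCatCTup is the hypothetical entry struct
from above, and I'm assuming the existing XidInMVCCSnapshot() and
TransactionIdIsValid() can be used from this code:

static bool
SharedCatCacheEntryIsVisible(SharedCatCTup *ct, Snapshot snapshot,
                             TransactionId my_xid)
{
    /*
     * Aborted entries are removed eagerly, so any xmin/xmax found here is
     * either in progress or committed; no clog lookup is needed.
     */

    /* inserted by another transaction that is still in progress for us? */
    if (ct->xmin != my_xid && XidInMVCCSnapshot(ct->xmin, snapshot))
        return false;

    /* no deleter recorded: visible */
    if (!TransactionIdIsValid(ct->xmax))
        return true;

    /* deleter is someone else and still in progress for us: still visible */
    if (ct->xmax != my_xid && XidInMVCCSnapshot(ct->xmax, snapshot))
        return true;

    /* deleted by ourselves or by a transaction whose commit we can see */
    return false;
}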

Here is how it works (a sketch of these steps follows the list):
- When creating a catcache, copy the heap tuple together with its HeapTupleHeader
- When an update/delete command for a catalog tuple is finished,
  update the xmax of the corresponding cache
- If there is a cache whose xmin is an aborted xid, delete the cache
- If there is a cache whose xmax is an aborted xid, reset the xmax information
- At commit time, there is no action on the shared cache
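
In pseudo-C, the maintenance steps could look like the sketch below (again only
a sketch; SharedCatCacheRemoveEntry() is a made-up helper that would unlink and
free the entry):

/* After an update/delete of a catalog tuple: record the deleting xid. */
static void
SharedCatCacheSetXmax(SharedCatCTup *ct, TransactionId xid)
{
    ct->xmax = xid;
}

/* At abort, applied to entries whose xmin or xmax matches the aborted xid. */
static void
SharedCatCacheHandleAbort(SharedCatCTup *ct, TransactionId aborted_xid)
{
    if (ct->xmin == aborted_xid)
        SharedCatCacheRemoveEntry(ct);          /* creator rolled back: drop it */
    else if (ct->xmax == aborted_xid)
        ct->xmax = InvalidTransactionId;        /* deleter rolled back: clear xmax */
}

/* At commit time, nothing needs to be done to the shared cache. */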

Pending items are:
- thoughts about shared relcache
- "vacuum" process for shared cache

Regards,
Ideriha Takeshi



