Sketch for nonunique searches in syscaches - Mailing list pgsql-hackers

From Tom Lane
Subject Sketch for nonunique searches in syscaches
Date
Msg-id 19534.1017972803@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
I've been thinking about how to avoid performance degradation in
function and operator lookup due to the addition of namespaces.
Probing the syscaches individually for each namespace on the
search path seems like a loser, mainly because a separate indexscan
is required to load each cache entry; even though repeated searches
might be fairly fast, that first one is going to be a killer.

It occurs to me though that with some improvement in the syscache
facility, we could keep the speed similar or even make it faster.
The idea is to make the syscaches able to cache the result of searches
for partial keys; for example, a syscache whose complete key is
<proname, pronargs, proargtypes, pronamespace> could be asked for a
list of all tuples matching a particular <proname, pronargs> pair.
This could actually *eliminate* indexscans that occur now --- for
example, the search needed to resolve an ambiguous function call
could work on the result list of such a cache lookup, meaning that
repeated searches would use syscache entries instead of having to
do an indexscan every time.

Here are my thoughts about how to do this.  Any comments or better
ideas?

API: the call SearchSysCacheList is similar to SearchSysCache but has
an additional parameter saying how many key columns are actually being
specified.  It returns a struct that includes a List of HeapTuple
pointers.  Each such HeapTuple is a regular syscache entry.  The call
increments the reference count of each entry in the list, so that they
won't disappear while being examined.  The caller must call
ReleaseSysCacheList to decrement all these reference counts again when
done examining the list.

Internals: each syscache will have a list of structs describing
currently cached list-searches; this will include the search keys and
a list of member tuples, all of which are in the syscache.  The member
tuples will also have back-pointers to the list.  Since we allow up to
four key columns for a syscache, at most three such back-pointers could
be needed in each cache entry: a tuple having key <a,b,c,d> could only
be a member of lists for partial search keys <a>, <a,b>, <a,b,c>.
In practice I think it'll be easier to have only one back-link, allowing
a given cache row to be a member of at most one list; if the same tuple
is ever searched for using different numbers of key columns (which seems
unlikely anyway) then we'd end up making multiple cache entries for it.

If no existing list entry matches a SearchSysCacheList request, the
required data can be fetched in a single indexscan operation, so this
should be only marginally more expensive than loading a single cache
entry using plain SearchSysCache.

All lists in a syscache will be invalidated whenever we receive
notification of *any* update to the underlying table; this seems a lot
easier than trying to detect whether a given update has caused any
addition or deletion in a cached list.  (Note that this wouldn't work at
all if we hadn't recently modified the cache inval logic to notify about
both insertions and updates/deletions in system catalogs.  But we did.)
The list structs will have reference counts just like individual cache
elements, so that we don't delete a list that's currently in use by some
caller.  However, we will not provide LRU aging logic for lists; rather,
a list will drop out of cache whenever any one of its member tuples
does.

This should work pretty well for searches in system catalogs that don't
change much; pg_proc and pg_operator probably don't change much in most
databases.  I am less sure whether it will be helpful for searches in
pg_class, which changes rather often if you do any temp table operations.
Might be better to stick with retail probes into the cache for pg_class
entries.

Any comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Hiroshi Inoue
Date:
Subject: Re: What's the CURRENT schema ?
Next
From: Tom Lane
Date:
Subject: Re: What's the CURRENT schema ?