Sketch for nonunique searches in syscaches - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Sketch for nonunique searches in syscaches |
Date | |
Msg-id | 19534.1017972803@sss.pgh.pa.us Whole thread Raw |
List | pgsql-hackers |
I've been thinking about how to avoid performance degradation in function and operator lookup due to the addition of namespaces. Probing the syscaches individually for each namespace on the search path seems like a loser, mainly because a separate indexscan is required to load each cache entry; even though repeated searches might be fairly fast, that first one is going to be a killer. It occurs to me though that with some improvement in the syscache facility, we could keep the speed similar or even make it faster. The idea is to make the syscaches able to cache the result of searches for partial keys; for example, a syscache whose complete key is <proname, pronargs, proargtypes, pronamespace> could be asked for a list of all tuples matching a particular <proname, pronargs> pair. This could actually *eliminate* indexscans that occur now --- for example, the search needed to resolve an ambiguous function call could work on the result list of such a cache lookup, meaning that repeated searches would use syscache entries instead of having to do an indexscan every time. Here are my thoughts about how to do this. Any comments or better ideas? API: the call SearchSysCacheList is similar to SearchSysCache but has an additional parameter saying how many key columns are actually being specified. It returns a struct that includes a List of HeapTuple pointers. Each such HeapTuple is a regular syscache entry. The call increments the reference count of each entry in the list, so that they won't disappear while being examined. The caller must call ReleaseSysCacheList to decrement all these reference counts again when done examining the list. Internals: each syscache will have a list of structs describing currently cached list-searches; this will include the search keys and a list of member tuples, all of which are in the syscache. The member tuples will also have back-pointers to the list. Since we allow up to four key columns for a syscache, at most three such back-pointers could be needed in each cache entry: a tuple having key <a,b,c,d> could only be a member of lists for partial search keys <a>, <a,b>, <a,b,c>. In practice I think it'll be easier to have only one back-link, allowing a given cache row to be a member of at most one list; if the same tuple is ever searched for using different numbers of key columns (which seems unlikely anyway) then we'd end up making multiple cache entries for it. If no existing list entry matches a SearchSysCacheList request, the required data can be fetched in a single indexscan operation, so this should be only marginally more expensive than loading a single cache entry using plain SearchSysCache. All lists in a syscache will be invalidated whenever we receive notification of *any* update to the underlying table; this seems a lot easier than trying to detect whether a given update has caused any addition or deletion in a cached list. (Note that this wouldn't work at all if we hadn't recently modified the cache inval logic to notify about both insertions and updates/deletions in system catalogs. But we did.) The list structs will have reference counts just like individual cache elements, so that we don't delete a list that's currently in use by some caller. However, we will not provide LRU aging logic for lists; rather, a list will drop out of cache whenever any one of its member tuples does. This should work pretty well for searches in system catalogs that don't change much; pg_proc and pg_operator probably don't change much in most databases. I am less sure whether it will be helpful for searches in pg_class, which changes rather often if you do any temp table operations. Might be better to stick with retail probes into the cache for pg_class entries. Any comments? regards, tom lane
pgsql-hackers by date: