Re: SearchCatCacheList()/SearchSysCacheList() is O(n) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: SearchCatCacheList()/SearchSysCacheList() is O(n) |
Date | |
Msg-id | 20210512213743.fbvwt23vl3qqokmw@alap3.anarazel.de Whole thread Raw |
In response to | Re: SearchCatCacheList()/SearchSysCacheList() is O(n) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: SearchCatCacheList()/SearchSysCacheList() is O(n)
|
List | pgsql-hackers |
Hi, On 2021-05-12 17:26:28 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > The problem is that SearchCatCacheList() is not actually a hash table - > > there are no buckets, in contrast to SearchCatCacheList(). > > Uh, what did you mean to compare to there? Oops, copy-and-paste failure. I was trying to reference SearchCatCacheInternal(). > > Tom, any chance you remember if this was an oversight, or whether you > > just considered this to be OK, given the likely numbers of objects? > > I'm pretty sure I didn't think the lists would get large enough to be > a problem. I'm not quite sure which list is a problem here, actually, > seeing that the functions all have distinct names. It's not an individual "result" list that's the issue. In my example they're all exactly one element long. The problem is that CatCache->list has one element for each cached SearchCatCacheList() result, and that for every SearchCatCacheList() we linearly search through CatCache->list to find a match. This is the profile: - 88.21% postgres postgres [.] SearchCatCacheList - 88.21% SearchCatCacheList - 88.21% SearchSysCacheList FuncnameGetCandidates func_get_detail ParseFuncOrColumn transformFuncCall + 0.65% postgres postgres [.] AllocSetAlloc IOW, a single SearchCatCacheInternal() is O(N). With a fairly small constant, but obviously that's still not great for a cache. > I have a vague recollection that some callers depend on the lists > being ordered by key. I'm also fairly sure that most callers need > to look at all entries of whichever list they've requested. So > I have doubts that "use a hash table" is really going to be a > productive idea. I was only thinking of a hashtable to identify the relvant CatCList, not within a CatCList or anything. The cache search code between SearchCatCacheInternal() and SearchCatCacheList() is pretty similar, but crucially the former uses buckets, the latter doesn't: static inline HeapTuple SearchCatCacheInternal(CatCache *cache, int nkeys, Datum v1, Datum v2, Datum v3, Datum v4) ... /* * find the hash bucket in which to look for the tuple */ hashValue = CatalogCacheComputeHashValue(cache, nkeys, v1, v2, v3, v4); hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets); /* * scan the hash bucket until we find a match or exhaust our tuples * * Note: it's okay to use dlist_foreach here, even though we modify the * dlist within the loop, because we don't continue the loop afterwards. */ bucket = &cache->cc_bucket[hashIndex]; dlist_foreach(iter, bucket) { ... SearchCatCacheList(CatCache *cache, int nkeys, Datum v1, Datum v2, Datum v3) ... /* * compute a hash value of the given keys for faster search. We don't * presently divide the CatCList items into buckets, but this still lets * us skip non-matching items quickly most of the time. */ lHashValue = CatalogCacheComputeHashValue(cache, nkeys, v1, v2, v3, v4); /* * scan the items until we find a match or exhaust our list * * Note: it's okay to use dlist_foreach here, even though we modify the * dlist within the loop, because we don't continue the loop afterwards. */ dlist_foreach(iter, &cache->cc_lists) { Greetings, Andres Freund
pgsql-hackers by date: