Syscaches should store negative entries, too - Mailing list pgsql-hackers

From Tom Lane
Subject Syscaches should store negative entries, too
Date
Msg-id 19585.1012350724@sss.pgh.pa.us
Whole thread Raw
Responses Re: Syscaches should store negative entries, too
List pgsql-hackers
While fooling around some more with profiling of simple queries,
I noticed that in some scenarios there are lots of failing lookups in
the syscaches.  This happens, for example, when trying to resolve
ambiguous operators or functions: the parser will do a lot of
can_coerce_type calls, each of which probes using the syscache to see
if there is a matching type-coercion function.  Often there won't be.

Now, while the syscaches are very good at dealing with repetitive
successful lookups, they suck on repetitive unsuccessful ones: each
probe will do a whole new indexscan search on the underlying catalog.
It's not difficult to get into situations where the bulk of the catalog
searches are due to repetitive failing syscache probes.  For example,
I added some simple counters to current sources, and got these results
for a 100-transaction run of pgbench:

Catcache totals: 43 tup, 14519 srch, 13976 hits, 43 loads, 500 not found

That says the caches had 43 tuples by the end of the run, and were
searched 14519 times, of which 13976 searches were satisfied by an
existing entry and another 43 by a newly-loaded entry.  The remaining
500 searches failed.  The catcaches were therefore responsible for
543 catalog indexscans.  The 500 failed searches almost certainly
consisted of 100 sets of probes for the same five nonexistent tuples,
given the repetitive structure of the queries.  As we throw more
similar queries at the backend, no new cache entries will need to be
loaded ... but every failing search will have to be done again.

I suspect repetitive query structure is a common feature of most
applications, so these numbers suggest strongly that the catcaches
ought to cache negative results as well as positive ones.

AFAICS there's no logical difficulty in doing this: we simply make
a catcache entry that contains the probed-for key values but is
marked "no one home at this address".  If a probe hits one of these
things, it can return NULL without a trip to the catalog.  If someone
later comes along and creates a tuple that matches the key value,
the negative-result cache entry will be invalidated in the usual way
(this should work because insertion and update are treated identically
in the caches).

Negative and positive cache entries should compete on an equal basis for
space in the cache, since they are equally expensive to recreate.

Can anyone spot a flaw in this reasoning?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: A simpler way to configure the source code?
Next
From: Peter Eisentraut
Date:
Subject: Re: Per-database and per-user GUC settings