Over on [1], Heikki mentioned about the usefulness of caching results from parameterized subplans so that they could be used again for subsequent scans which have the same parameters as a previous scan. On [2], I mentioned that parameterized nested loop joins could see similar gains with such a cache. I suggested there that instead of adding code that only allows this to work for subplans, that instead, we add a new node type that can handle the caching for us. We can then just inject that node type in places where it seems beneficial.
Very cool
I've attached a patch which implements this. The new node type is called "Result Cache". I'm not particularly wedded to keeping that name, but if I change it, I only want to do it once. I've got a few other names I mind, but I don't feel strongly or confident enough in them to go and do the renaming.
How the caching works:
First off, it's only good for plugging in on top of parameterized nodes that are rescanned with different parameters. The cache itself uses a hash table using the simplehash.h implementation. The memory consumption is limited to work_mem. The code maintains an LRU list and when we need to add new entries but don't have enough space to do so, we free off older items starting at the top of the LRU list. When we get a cache hit, we move that entry to the end of the LRU list so that it'll be the last to be evicted.
When should we cache:
For nested loop joins, the decision is made purely based on cost.
I thought the main reason to do this was the case when the nested loop subplan was significantly underestimated and we realize during execution that we should have built a hash table. So including this based on cost alone seems to miss a trick.
The patch does rely heavily on good ndistinct estimates.
Exactly. We know we seldom get those with many-way joins.
So +1 for adding this technique. My question is whether it should be added as an optional facility of a parameterised sub plan, rather than an always-needed full-strength node. That way the choice of whether to use it can happen at execution time once we notice that we've been called too many times.