On Sun, 26 Apr 2020 at 19:08, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> If we want to handle this case as well, one of the changes would be
> that the node needs to cache multiple records for one input parameter,
> or return one row at a time but return multiple times for the same
> input parameter. Tuplestore may be a good option for this case since
> it already provides the functionality we'd need, such as
> tuplestore_puttuple and tuplestore_gettuple. But if we implement it
> with a tuplestore, the next question is how to control the memory
> usage for this node. We can use a dedicated memory context to know how
> much memory this node has used in total, but we can't stop the
> tuplestore from using more memory. Or we could force the current
> tuplestore's status to TSS_WRITEFILE and set allowedMem to 0 for
> subsequent tuplestores once we find that too much memory is being
> used. However, that looks like a bit of a hack.

I didn't imagine a tuplestore would be that useful for this. A node
like this will do its best work when the ratio of n_values /
distinct_values of the parameters is high. The planner is often not
that great at knowing the number of distinct values, especially so
when there is more than one expression to estimate the number of
distinct values for. (We added extended statistics to try to help
with that.) I think this node will do its best when a cache miss is
barely any more expensive than scanning the subnode to get the
results. If we can manage that, then we'll see fewer regressions when
we inject one of these nodes where it'll do no good, e.g. when we
never get a repeated value. If we start spilling these tuples out to
disk, that adds overhead which might never pay off.

I'd suggest a hash table to act as an MRU cache. We'd just evict the
oldest values when we run out of space, i.e. when we've consumed all
of work_mem.
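
To make that concrete, below is a minimal standalone sketch of the
shape of cache I mean. It's plain C rather than executor code; every
name in it is made up, and a uint32 key/value pair stands in for the
real parameter values and cached tuples.

    /*
     * Minimal sketch of a hash table acting as an MRU cache: least-
     * recently-used entries are evicted once a work_mem-style budget
     * is exceeded.  Illustrative only, not PostgreSQL code.
     */
    #include <stdint.h>
    #include <stdlib.h>

    #define NBUCKETS 256            /* power of two, for cheap masking */

    typedef struct entry
    {
        uint32_t    key;            /* stands in for the parameter value */
        uint32_t    value;          /* stands in for the cached tuples */
        size_t      size;           /* memory charged against the budget */
        struct entry *hnext;        /* hash bucket chain */
        struct entry *prev, *next;  /* LRU list links */
    } entry;

    typedef struct cache
    {
        entry      *buckets[NBUCKETS];
        entry      *head;           /* most recently used */
        entry      *tail;           /* least recently used; evicted first */
        size_t      used;           /* memory consumed so far */
        size_t      limit;          /* the work_mem-style budget */
    } cache;

    static uint32_t
    bucket(uint32_t key)
    {
        return (key * 2654435761u) & (NBUCKETS - 1);
    }

    /* Detach an entry from the LRU list (but not from the hash table). */
    static void
    unlink_lru(cache *c, entry *e)
    {
        *(e->prev ? &e->prev->next : &c->head) = e->next;
        *(e->next ? &e->next->prev : &c->tail) = e->prev;
    }

    /* Put an entry at the most-recently-used end of the list. */
    static void
    push_head(cache *c, entry *e)
    {
        e->prev = NULL;
        e->next = c->head;
        if (c->head)
            c->head->prev = e;
        c->head = e;
        if (!c->tail)
            c->tail = e;
    }

    /* Evict the least-recently-used entry to reclaim memory. */
    static void
    evict_tail(cache *c)
    {
        entry      *v = c->tail;
        entry     **pp = &c->buckets[bucket(v->key)];

        while (*pp != v)            /* unlink from its hash chain */
            pp = &(*pp)->hnext;
        *pp = v->hnext;
        unlink_lru(c, v);
        c->used -= v->size;
        free(v);
    }

    /* On a hit, promote the entry to the MRU position. */
    static entry *
    lookup(cache *c, uint32_t key)
    {
        for (entry *e = c->buckets[bucket(key)]; e; e = e->hnext)
            if (e->key == key)
            {
                unlink_lru(c, e);
                push_head(c, e);
                return e;
            }
        return NULL;
    }

    /* Insert a new key (assumed absent), evicting old entries until it
     * fits; a single oversized entry may still exceed the budget here. */
    static void
    insert(cache *c, uint32_t key, uint32_t value)
    {
        entry      *e = calloc(1, sizeof(entry));

        e->key = key;
        e->value = value;
        e->size = sizeof(entry);
        while (c->tail && c->used + e->size > c->limit)
            evict_tail(c);
        e->hnext = c->buckets[bucket(key)];
        c->buckets[bucket(key)] = e;
        push_head(c, e);
        c->used += e->size;
    }

The doubly-linked list is what keeps eviction cheap: a hit moves the
entry to the head and eviction always takes the tail, so the values
most likely to repeat stay resident in memory.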

I've got a bunch of work-in-progress code locally which does this.
I'll finish it off and post it here.

David