Re: Subplan result caching

From: Andy Fan
Subject: Re: Subplan result caching
Msg-id: CAKU4AWq3rARimEGKnVfyp6BgcBT4S1gTgwtie_JHihSSZdrnUQ@mail.gmail.com
In response to: Re: Subplan result caching (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Subplan result caching (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers


On Sun, Apr 26, 2020 at 5:49 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Sun, 26 Apr 2020 at 19:08, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> If we want to handle this case as well, one of the changes would
> be that it needs to cache multiple records for one input parameter,
> or return one row at a time but return multiple times for one input
> parameter.  A tuplestore may be a good option here, since it already
> provides the needed functionality, like tuplestore_puttuple and
> tuplestore_gettupleslot.  But if we implement it with a tuplestore,
> the next question is how to control the memory usage for this node.
> We can use a dedicated memory context to know how much memory this
> node has used in total, but we can't stop the tuplestore from using
> more memory.  Alternatively, once we find too much memory is used,
> we can force the current tuplestore's state to TSS_WRITEFILE and set
> allowedMem to 0 for any following tuplestores.  However, this looks
> like a bit of a hack.
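
For concreteness, the allowedMem-to-zero half of that idea might look
roughly like this (an untested sketch: SubplanCacheNode and its
cache_cxt field are invented for illustration, while
tuplestore_begin_heap(), MemoryContextMemAllocated() and work_mem are
the existing APIs; a tuplestore created with maxKBytes = 0 dumps to
disk as soon as tuples arrive):

#include "postgres.h"
#include "miscadmin.h"
#include "utils/memutils.h"
#include "utils/tuplestore.h"

/*
 * Create the tuplestore for the next parameter value.  Once the node's
 * dedicated memory context reports that the budget is spent, build any
 * further tuplestores with maxKBytes = 0 so they spill to disk on the
 * first tuple instead of growing in memory.
 */
static Tuplestorestate *
make_cache_store(SubplanCacheNode *node)	/* hypothetical node type */
{
	Size		used = MemoryContextMemAllocated(node->cache_cxt, true);
	int			maxKBytes = work_mem;

	if (used >= (Size) work_mem * 1024)
		maxKBytes = 0;			/* force new stores straight to disk */

	return tuplestore_begin_heap(false,		/* no random access needed */
								 false,		/* not inter-transaction */
								 maxKBytes);
}

The TSS_WRITEFILE half, forcing an already-built store out to disk, has
no public API and would need to poke inside tuplestore.c, which is the
hacky part.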

I didn't imagine a tuplestore would be that useful for this. A node
like this will do its best work when the ratio of n_values /
distinct_values of the parameters is high. The planner often isn't
that great at estimating the number of distinct values, especially
when there is more than one expression to estimate the number of
distinct values for (we added extended statistics to try to help with
that).  I think this node will do its best when the time spent on a
cache miss is barely any more expensive than scanning the subnode to
get the results.  If we can do that, then we'll see fewer regressions
when we inject one of these nodes where it'll do no good, e.g. when we
never get a repeated value.  If we start spilling these tuples out to
disk then it adds overhead which might never pay off.

I'd suggest a hash table to act as an MRU cache.  We'd just evict old
values when we run out of space, i.e. when we've consumed all of work_mem.
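
Something like this toy model, perhaps (self-contained C rather than
executor code: a chained hash table keyed by the parameter value, plus
a recency list so the oldest entries get thrown out first; an
entry-count cap stands in for real work_mem accounting):

#include <stdlib.h>

#define NBUCKETS 256			/* fixed bucket count, enough for a toy */

typedef struct CacheEntry
{
	long		param;			/* parameter value: the cache key */
	long		result;			/* the cached "subplan" output */
	struct CacheEntry *hash_next;	/* bucket chain */
	struct CacheEntry *lru_prev;	/* recency list; head is newest */
	struct CacheEntry *lru_next;
} CacheEntry;

typedef struct Cache
{
	CacheEntry *buckets[NBUCKETS];
	CacheEntry *lru_head;		/* most recently used */
	CacheEntry *lru_tail;		/* least recently used; evicted first */
	int			nentries;
	int			max_entries;	/* stand-in for a work_mem byte budget */
} Cache;

static void
lru_unlink(Cache *c, CacheEntry *e)
{
	if (e->lru_prev)
		e->lru_prev->lru_next = e->lru_next;
	else
		c->lru_head = e->lru_next;
	if (e->lru_next)
		e->lru_next->lru_prev = e->lru_prev;
	else
		c->lru_tail = e->lru_prev;
}

static void
lru_push_head(Cache *c, CacheEntry *e)
{
	e->lru_prev = NULL;
	e->lru_next = c->lru_head;
	if (c->lru_head)
		c->lru_head->lru_prev = e;
	c->lru_head = e;
	if (c->lru_tail == NULL)
		c->lru_tail = e;
}

/* Look up a parameter value; NULL means a cache miss. */
CacheEntry *
cache_lookup(Cache *c, long param)
{
	CacheEntry *e;

	for (e = c->buckets[(unsigned long) param % NBUCKETS];
		 e != NULL;
		 e = e->hash_next)
	{
		if (e->param == param)
		{
			lru_unlink(c, e);	/* it's hot again: bump it to the front */
			lru_push_head(c, e);
			return e;
		}
	}
	return NULL;
}

/* Insert after a miss, evicting the oldest entry when out of space. */
void
cache_insert(Cache *c, long param, long result)
{
	CacheEntry *e;

	if (c->nentries >= c->max_entries && c->lru_tail != NULL)
	{
		CacheEntry *victim = c->lru_tail;
		CacheEntry **pp = &c->buckets[(unsigned long) victim->param % NBUCKETS];

		while (*pp != victim)
			pp = &(*pp)->hash_next;
		*pp = victim->hash_next;	/* unlink from its bucket chain */
		lru_unlink(c, victim);
		free(victim);
		c->nentries--;
	}

	e = malloc(sizeof(CacheEntry));
	e->param = param;
	e->result = result;
	e->hash_next = c->buckets[(unsigned long) param % NBUCKETS];
	c->buckets[(unsigned long) param % NBUCKETS] = e;
	lru_push_head(c, e);
	c->nentries++;
}

Bumping an entry to the front of the recency list on every hit is what
lets frequently repeated parameter values survive eviction; a real node
would count the actual bytes used by the cached tuples rather than a
fixed number of entries.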

I've got a bunch of code locally which is still a work in progress to
do this. I'll finish it off and post it here.

I was thinking that we might have to maintain some extra state if we
use a hash table rather than a tuplestore, but that may not be a major
concern.  I'll wait and see your patch.

Best Regards
Andy Fan
