Re: Subplan result caching

From: Andy Fan
Subject: Re: Subplan result caching
Msg-id: CAKU4AWq3rARimEGKnVfyp6BgcBT4S1gTgwtie_JHihSSZdrnUQ@mail.gmail.com
In response to: Re: Subplan result caching (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Subplan result caching (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers


On Sun, Apr 26, 2020 at 5:49 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Sun, 26 Apr 2020 at 19:08, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> If we want to handle this case as well, one of the changes would
> be that it needs to cache multiple records for one input parameter,
> or return one row at a time but return multiple times for one input
> parameter.  A tuplestore may be a good option here, since it already
> provides the needed functionality, like tuplestore_puttuple and
> tuplestore_gettupleslot.  But if we implement it with a tuplestore,
> the next question is how to control the memory usage for this node.
> We can use a dedicated memory context to know how much memory this
> node has used in total, but we can't stop the tuplestore from using
> more memory.  Alternatively, once we find too much memory is used,
> we can force the current tuplestore's state to TSS_WRITEFILE and set
> allowedMem to 0 for any following tuplestores.  However, this looks
> like a bit of a hack.
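
For concreteness, the allowedMem-to-zero half of that idea might look
roughly like this (an untested sketch: SubplanCacheNode and its
cache_cxt field are invented for illustration, while
tuplestore_begin_heap(), MemoryContextMemAllocated() and work_mem are
the existing APIs; a tuplestore created with maxKBytes = 0 dumps to
disk as soon as tuples arrive):

#include "postgres.h"
#include "miscadmin.h"
#include "utils/memutils.h"
#include "utils/tuplestore.h"

/*
 * Create the tuplestore for the next parameter value.  Once the node's
 * dedicated memory context reports that the budget is spent, build any
 * further tuplestores with maxKBytes = 0 so they spill to disk on the
 * first tuple instead of growing in memory.
 */
static Tuplestorestate *
make_cache_store(SubplanCacheNode *node)	/* hypothetical node type */
{
	Size		used = MemoryContextMemAllocated(node->cache_cxt, true);
	int			maxKBytes = work_mem;

	if (used >= (Size) work_mem * 1024)
		maxKBytes = 0;			/* force new stores straight to disk */

	return tuplestore_begin_heap(false,		/* no random access needed */
								 false,		/* not inter-transaction */
								 maxKBytes);
}

The TSS_WRITEFILE half, forcing an already-built store out to disk, has
no public API and would need to poke inside tuplestore.c, which is the
hacky part.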

I didn't imagine a tuplestore would be that useful for this. A node
like this will do its best work when the ratio of n_values /
distinct_values of the parameters is high. The planner often isn't
that great at estimating the number of distinct values, especially
when there is more than one expression to estimate the number of
distinct values for (we added extended statistics to try to help with
that).  I think this node will do its best when the time spent on a
cache miss is barely any more expensive than scanning the subnode to
get the results.  If we can do that, then we'll see fewer regressions
when we inject one of these nodes where it'll do no good, e.g. when we
never get a repeated value.  If we start spilling these tuples out to
disk then it adds overhead which might never pay off.

I'd suggest a hash table to act as an MRU cache.  We'd just evict old
values when we run out of space, i.e. when we've consumed all of work_mem.
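
Something like this toy model, perhaps (self-contained C rather than
executor code: a chained hash table keyed by the parameter value, plus
a recency list so the oldest entries get thrown out first; an
entry-count cap stands in for real work_mem accounting):

#include <stdlib.h>

#define NBUCKETS 256			/* fixed bucket count, enough for a toy */

typedef struct CacheEntry
{
	long		param;			/* parameter value: the cache key */
	long		result;			/* the cached "subplan" output */
	struct CacheEntry *hash_next;	/* bucket chain */
	struct CacheEntry *lru_prev;	/* recency list; head is newest */
	struct CacheEntry *lru_next;
} CacheEntry;

typedef struct Cache
{
	CacheEntry *buckets[NBUCKETS];
	CacheEntry *lru_head;		/* most recently used */
	CacheEntry *lru_tail;		/* least recently used; evicted first */
	int			nentries;
	int			max_entries;	/* stand-in for a work_mem byte budget */
} Cache;

static void
lru_unlink(Cache *c, CacheEntry *e)
{
	if (e->lru_prev)
		e->lru_prev->lru_next = e->lru_next;
	else
		c->lru_head = e->lru_next;
	if (e->lru_next)
		e->lru_next->lru_prev = e->lru_prev;
	else
		c->lru_tail = e->lru_prev;
}

static void
lru_push_head(Cache *c, CacheEntry *e)
{
	e->lru_prev = NULL;
	e->lru_next = c->lru_head;
	if (c->lru_head)
		c->lru_head->lru_prev = e;
	c->lru_head = e;
	if (c->lru_tail == NULL)
		c->lru_tail = e;
}

/* Look up a parameter value; NULL means a cache miss. */
CacheEntry *
cache_lookup(Cache *c, long param)
{
	CacheEntry *e;

	for (e = c->buckets[(unsigned long) param % NBUCKETS];
		 e != NULL;
		 e = e->hash_next)
	{
		if (e->param == param)
		{
			lru_unlink(c, e);	/* it's hot again: bump it to the front */
			lru_push_head(c, e);
			return e;
		}
	}
	return NULL;
}

/* Insert after a miss, evicting the oldest entry when out of space. */
void
cache_insert(Cache *c, long param, long result)
{
	CacheEntry *e;

	if (c->nentries >= c->max_entries && c->lru_tail != NULL)
	{
		CacheEntry *victim = c->lru_tail;
		CacheEntry **pp = &c->buckets[(unsigned long) victim->param % NBUCKETS];

		while (*pp != victim)
			pp = &(*pp)->hash_next;
		*pp = victim->hash_next;	/* unlink from its bucket chain */
		lru_unlink(c, victim);
		free(victim);
		c->nentries--;
	}

	e = malloc(sizeof(CacheEntry));
	e->param = param;
	e->result = result;
	e->hash_next = c->buckets[(unsigned long) param % NBUCKETS];
	c->buckets[(unsigned long) param % NBUCKETS] = e;
	lru_push_head(c, e);
	c->nentries++;
}

Bumping an entry to the front of the recency list on every hit is what
lets frequently repeated parameter values survive eviction; a real node
would count the actual bytes used by the cached tuples rather than a
fixed number of entries.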

I've got a bunch of code locally which is still a work in progress to
do this. I'll finish it off and post it here.

I was thinking that we might have to maintain some extra state if we
use a hash table rather than a tuplestore, but that may not be a major
concern.  I'll wait and see your patch.

Best Regards
Andy Fan
