Re: Subplan result caching - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Subplan result caching
Date
Msg-id CAKU4AWquaTPwTyik=rj7Nm-Fea6D0yyCwQjBQ-WLR_h9cKUf2A@mail.gmail.com
Whole thread Raw
In response to Re: Subplan result caching  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Subplan result caching  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers


On Sun, Apr 26, 2020 at 2:11 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've been working on a patch to add a little cache to SubPlans, to speed up
> queries with correlated subqueries, where the same subquery is currently
> executed multiple times with the same parameters. The idea is to cache the
> result of the subplan, with the correlation vars as the cache key.

Hi,

This seems like an interesting area to make improvements, so I've
signed up to review the patch.

From looking at the code I see that the caching is being done inside
nodeSubplan.c. I don't think this is the right approach to the
problem. The problem exists for any parameterized path, so I think a
more general approach would be much better.

I'm +1 on this suggestion.  Actually I'm working on the query like this:

select  j1o.i, j2_v.sum_5
from j1 j1o
inner join lateral
    (select im100, sum(im5) as sum_5
     from j2
     where j1o.im100 = im100
     and j1o.i = 1
     group by im100) j2_v
on true
where j1o.i = 1;

I hope the we have a result cache for j2_v.   But this nothing with SubPlan.

If we want to handle this case as well, one of the changes would
be it needs to cache multi records for one input parameter, or return
one row each time but return mutli times for one input parameter,
Tuplestore may be a good option for this case since its full functionalities
like tuple_puttuple, tuple_gettuple. But if we implement it with tuplestore,
the next question is how to control the memory usage for this Node. 
We can use the dedicated memory context to know how many memory
this node used in total, but we can't stop the tuplestore from using more
memory.  Or we can force set both current tuplestore->state  to TTS_WRITEFILE
and set the allowedMem to 0 for the following tuplestore, after we find too
memory is used. However this looks a bit of hack. 

As a summary of my thinking about this topic before,  I'd write another
incomplete options to handle this case.  1) we just use the the Material
Path to cache the result for last parameter only.  if the following value is 
same as the last one, we use it. Or else, we rescan the Material path.  
2).  We can consider order the rows by input params key. That will be 
much cache friendly.

BTW,  I see may  node->chgParams was changed without checking if the
datum changed from the previous one, this may lost some optimization 
opportunities (for example during the agg node rescan case, if the params
is not changed, the we use the hash table we build before).  So if we we 
add new parameter to node->chgParams only if the datum really changed, 
will it still be semantic correctly. 

It's great to see someone working on this.

I'd like to have a try. 
 
Best Regards
Andy Fan

pgsql-hackers by date:

Previous
From: Jesse Zhang
Date:
Subject: Fix compilation failure against LLVM 11
Next
From: 曾文旌
Date:
Subject: Re: [Proposal] Global temporary tables