Home > mailing lists

Re: Subplan result caching - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: Subplan result caching
Date	April 26, 2020 10:07:00
Msg-id	CAKU4AWquaTPwTyik=rj7Nm-Fea6D0yyCwQjBQ-WLR_h9cKUf2A@mail.gmail.com Whole thread Raw
In response to	Re: Subplan result caching (David Rowley <david.rowley@2ndquadrant.com>)
Responses	Re: Subplan result caching
List	pgsql-hackers

Tree view

On Sun, Apr 26, 2020 at 2:11 PM David Rowley <david.rowley@2ndquadrant.com> wrote:

On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've been working on a patch to add a little cache to SubPlans, to speed up
> queries with correlated subqueries, where the same subquery is currently
> executed multiple times with the same parameters. The idea is to cache the
> result of the subplan, with the correlation vars as the cache key.

Hi,

This seems like an interesting area to make improvements, so I've
signed up to review the patch.

From looking at the code I see that the caching is being done inside
nodeSubplan.c. I don't think this is the right approach to the
problem. The problem exists for any parameterized path, so I think a
more general approach would be much better.

I'm +1 on this suggestion. Actually I'm working on the query like this:

select j1o.i, j2_v.sum_5
from j1 j1o
inner join lateral
(select im100, sum(im5) as sum_5
from j2
where j1o.im100 = im100
and j1o.i = 1
group by im100) j2_v
on true
where j1o.i = 1;

I hope the we have a result cache for j2_v. But this nothing with SubPlan.

If we want to handle this case as well, one of the changes would

be it needs to cache multi records for one input parameter, or return

one row each time but return mutli times for one input parameter,

Tuplestore may be a good option for this case since its full functionalities

like tuple_puttuple, tuple_gettuple. But if we implement it with tuplestore,

the next question is how to control the memory usage for this Node.

We can use the dedicated memory context to know how many memory

this node used in total, but we can't stop the tuplestore from using more

memory. Or we can force set both current tuplestore->state to TTS_WRITEFILE

and set the allowedMem to 0 for the following tuplestore, after we find too

memory is used. However this looks a bit of hack.

As a summary of my thinking about this topic before, I'd write another

incomplete options to handle this case. 1) we just use the the Material

Path to cache the result for last parameter only. if the following value is

same as the last one, we use it. Or else, we rescan the Material path.

2). We can consider order the rows by input params key. That will be

much cache friendly.

BTW, I see may node->chgParams was changed without checking if the

datum changed from the previous one, this may lost some optimization

opportunities (for example during the agg node rescan case, if the params

is not changed, the we use the hash table we build before). So if we we

add new parameter to node->chgParams only if the datum really changed,

will it still be semantic correctly.

It's great to see someone working on this.

I'd like to have a try.

Best Regards

Andy Fan

pgsql-hackers by date:

From: Jesse Zhang
Date: 26 April 2020, 07:41:20
Subject: Fix compilation failure against LLVM 11

From: 曾文旌
Date: 26 April 2020, 11:09:02
Subject: Re: [Proposal] Global temporary tables

Re: Subplan result caching - Mailing list pgsql-hackers

Previous

Next