Thread: Subplan result caching
Hi,

I've been working on a patch to add a little cache to SubPlans, to speed up
queries with correlated subqueries, where the same subquery is currently
executed multiple times with the same parameters. The idea is to cache the
result of the subplan, with the correlation vars as the cache key. That
helps a lot, if you happen to have that kind of a query. I bumped into this
while looking at TPC-DS query 6:

select a.ca_state state, count(*) cnt
from customer_address a,
     customer c,
     store_sales s,
     date_dim d,
     item i
where a.ca_address_sk = c.c_current_addr_sk
  and c.c_customer_sk = s.ss_customer_sk
  and s.ss_sold_date_sk = d.d_date_sk
  and s.ss_item_sk = i.i_item_sk
  and d.d_month_seq = (select distinct (d_month_seq)
                       from date_dim
                       where d_year = 2000
                         and d_moy = 5)
  and i.i_current_price > 1.2 * (select avg(j.i_current_price)
                                 from item j
                                 where j.i_category = i.i_category)
group by a.ca_state
having count(*) >= 10
order by cnt

The first subquery is uncorrelated, and is already handled efficiently as an
InitPlan. This patch helps with the second subquery. There are only 11
different categories, but we currently re-execute it for every row of the
outer query, over 26000 times. (I think I have about 1 GB of data in my
little database I've been testing with; I'm not sure how this would scale
with the amount of data.) With this patch, it's only executed 11 times; the
cache avoids the rest of the executions. That brings the runtime, on my
laptop, from about 30 s to 120 ms.

For this particular query, I actually wish we could pull up that subquery
instead. I did some investigation into that last summer,
https://www.postgresql.org/message-id/67e353e8-c20e-7980-a282-538779edf25b%40iki.fi,
but that's a much bigger project. In any case, even if the planner was able
to pull up subqueries in more cases, a cache like this would still be
helpful for those cases where pulling up was still not possible.

Thoughts?

- Heikki
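[A minimal way to observe the re-execution Heikki describes, assuming the
same TPC-DS schema: the loops count that EXPLAIN ANALYZE reports for the
correlated SubPlan node shows how many times it ran.]

explain (analyze, timing off)
select i.i_item_sk
from item i
where i.i_current_price > 1.2 * (select avg(j.i_current_price)
                                 from item j
                                 where j.i_category = i.i_category);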
Heikki Linnakangas wrote:
> I've been working on a patch to add a little cache to SubPlans, to speed
> up queries with correlated subqueries, where the same subquery is
> currently executed multiple times with the same parameters. The idea is
> to cache the result of the subplan, with the correlation vars as the
> cache key.

I like the idea. I think the cache should be limited in size, perhaps by
work_mem. Also, there probably should be a GUC for this, defaulting to
"off".

Yours,
Laurenz Albe
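[A sketch of what Laurenz's suggestion could look like from the user's side;
the GUC name here is purely hypothetical, since the patch defines none.]

-- hypothetical GUC name, for illustration only
set enable_subplan_cache = on;  -- this proposal would default it to off
-- the cache size would be bounded by the existing work_mem setting
show work_mem;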
>I like the idea.
>Also, there probably should be a GUC for this, defaulting to "off".
+1
>Also, there probably should be a GUC for this, defaulting to "off".
I think the feature could be on by default provided it can properly identify "volatile" functions/tables hidden behind views.
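[A hypothetical example of the hazard Vladimir mentions; the table and view
names are invented.]

-- random() is volatile, but it is hidden behind the view, so a check
-- that inspects only the subquery's own expression tree would wrongly
-- conclude the subplan is safe to cache.
create table t (x float8, grp int);
create view noisy as select grp, random() as r from t;

select *
from t
where t.x < (select max(n.r) from noisy n where n.grp = t.grp);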
Vladimir
On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've been working on a patch to add a little cache to SubPlans, to speed up
> queries with correlated subqueries, where the same subquery is currently
> executed multiple times with the same parameters. The idea is to cache the
> result of the subplan, with the correlation vars as the cache key.
...
> Thoughts?

G'day Sir,

I'm in favour of making improvements here. I had a think about this and
just glanced at the patch to check if you'd done it the way I'd thought.

I'd thought this might be done with some sort of "LazyMaterialize" node
that could take params and store multiple rows per param set. That node
type could contain all the LRU logic to get rid of the lesser used items
when work_mem begins to fill. If you did it this way then nodeSubplan.c
would need to know nothing about this. The planner would simply just
inject one of these when it thinks some caching would be wise, similar to
how it does with Materialize. LazyMaterialize would simply check the cache
and return those rows, if they exist, otherwise consult its only subplan
to get the rows and then cache them.

If you did it this way, as a followup we could go plug it into
parameterised nested loops to speed up repeated lookups of the inner side
plan. The gains there are probably similar to what you've mentioned.

What do you think?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> I've been working on a patch to add a little cache to SubPlans, to speed
> up queries with correlated subqueries, where the same subquery is
> currently executed multiple times with the same parameters. The idea is
> to cache the result of the subplan, with the correlation vars as the
> cache key.

I find this pretty bogus as it stands, because it assumes without proof
that the subquery will deliver identical results for any two parameter
values that are considered equal by the datatype's default equality
operator. An easy counterexample is a subquery whose result depends on
the text conversion of a float8 parameter: zero and minus zero have
different text forms, but are equal according to float8eq.

To make this patch safe, I think you'd need to grovel through the subquery
and make sure that the parameters are only used as inputs to operators
that belong to the type's default btree or hash opfamily. (Many other
cases would work in practice, but we have no semantic knowledge that would
let us be sure of that.)

That's doable no doubt, but I wonder whether that leaves you in a place
that's any better than the plan-time-decorrelation approach you proposed
in the earlier thread. I liked that better TBH; this one seems like
a very ad-hoc reinvention of a hash join. I don't especially like the
unpredictable number of executions of the subquery that it results in,
either.

regards, tom lane
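[Tom's counterexample, spelled out in SQL.]

-- Equal under float8's default equality operator...
select '0'::float8 = '-0'::float8;             -- true
-- ...but the text conversions differ, so a cache keyed on default
-- equality could return the wrong result for a subquery that depends
-- on the text form of its parameter.
select '0'::float8::text, '-0'::float8::text;  -- '0' and '-0'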
On 23/05/18 19:25, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> I've been working on a patch to add a little cache to SubPlans, to speed
>> up queries with correlated subqueries, where the same subquery is
>> currently executed multiple times with the same parameters. The idea is
>> to cache the result of the subplan, with the correlation vars as the
>> cache key.
>
> I find this pretty bogus as it stands, because it assumes without proof
> that the subquery will deliver identical results for any two parameter
> values that are considered equal by the datatype's default equality
> operator. An easy counterexample is a subquery whose result depends on
> the text conversion of a float8 parameter: zero and minus zero have
> different text forms, but are equal according to float8eq.

Good point.

> To make this
> patch safe, I think you'd need to grovel through the subquery and make
> sure that the parameters are only used as inputs to operators that belong
> to the type's default btree or hash opfamily. (Many other cases would
> work in practice, but we have no semantic knowledge that would let us be
> sure of that.)

Hmm. First thing that comes to mind is to use the raw bytes as cache key,
only treating Datums as equal if their binary representation is identical.

> That's doable no doubt, but I wonder whether that leaves you in a place
> that's any better than the plan-time-decorrelation approach you proposed
> in the earlier thread. I liked that better TBH; this one seems like
> a very ad-hoc reinvention of a hash join. I don't especially like the
> unpredictable number of executions of the subquery that it results in,
> either.

I'd certainly prefer the plan-time decorrelation, too. But I suspect that
there's always going to be some cases that cannot be decorrelated, where a
cache like this would help.

- Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 23/05/18 19:25, Tom Lane wrote:
>> To make this
>> patch safe, I think you'd need to grovel through the subquery and make
>> sure that the parameters are only used as inputs to operators that belong
>> to the type's default btree or hash opfamily. (Many other cases would
>> work in practice, but we have no semantic knowledge that would let us be
>> sure of that.)

> Hmm. First thing that comes to mind is to use the raw bytes as cache
> key, only treating Datums as equal if their binary representation is
> identical.

Ah. That would work, though it'd make the number of subquery executions
even less predictable (since some logically-equal values would compare
as physically unequal).

regards, tom lane
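[An example of such logically-equal-but-physically-unequal values.]

-- These compare equal under numeric's default equality operator, but
-- they store different display scales, so their binary representations
-- differ; a raw-bytes cache key would keep a separate entry (and run
-- the subquery separately) for each spelling.
select '1.0'::numeric = '1.000'::numeric;  -- true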
On 2018-05-23 12:51:38 -0400, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 23/05/18 19:25, Tom Lane wrote:
>>> To make this
>>> patch safe, I think you'd need to grovel through the subquery and make
>>> sure that the parameters are only used as inputs to operators that belong
>>> to the type's default btree or hash opfamily. (Many other cases would
>>> work in practice, but we have no semantic knowledge that would let us be
>>> sure of that.)
>
>> Hmm. First thing that comes to mind is to use the raw bytes as cache
>> key, only treating Datums as equal if their binary representation is
>> identical.
>
> Ah. That would work, though it'd make the number of subquery executions
> even less predictable (since some logically-equal values would compare
> as physically unequal).

As long as there's no volatile functions, would anybody care?

Greetings,

Andres Freund
On 24 May 2018 at 04:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That's doable no doubt, but I wonder whether that leaves you in a place
> that's any better than the plan-time-decorrelation approach you proposed
> in the earlier thread. I liked that better TBH; this one seems like
> a very ad-hoc reinvention of a hash join. I don't especially like the
> unpredictable number of executions of the subquery that it results in,
> either.

Decorrelation is not always going to be the answer. There's going to be
plenty of cases where that makes the plan worse. Consider:

SELECT *
FROM sometable s
WHERE rarelytrue
  AND y = (SELECT MAX(x) FROM bigtable b WHERE b.z = s.z);

If the planner went and re-wrote that to execute as the following would:

SELECT *
FROM sometable s
LEFT JOIN (SELECT z, MAX(x) max FROM bigtable GROUP BY z) b ON b.z = s.z
WHERE rarelytrue AND y = b.max;

then we've probably gone and built most of the groups for nothing. The
planner would have to do this based on estimated costs. Having the ability
to apply either of these optimisations would be useful, providing the
planner applied them correctly. However, I don't think Heikki should be
touching the decorrelation as part of this effort.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 23, 2018 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ah. That would work, though it'd make the number of subquery executions
> even less predictable (since some logically-equal values would compare
> as physically unequal).

In most cases that seems fine. It might not be fine when the subquery
contains volatile functions, though. I think I'd be sad if I wrote a query
expecting random() to be executed 26000 times and it got executed 11 times
instead. But if the optimizer finds a way to execute int4pl fewer times,
that seems like a good thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
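[A hypothetical query of the kind Robert describes, reusing the invented
table names from upthread.]

-- With per-parameter caching keyed on s.z, random() would run once per
-- distinct z rather than once per outer row, which is an observable
-- change in behaviour, unlike executing int4pl fewer times.
select *
from sometable s
where s.y < (select random() + max(b.x)
             from bigtable b
             where b.z = s.z);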
On Wed, May 23, 2018 at 6:03 AM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I think the cache should be limited in size, perhaps by work_mem.
> Also, there probably should be a GUC for this, defaulting to "off".

Eh, why? Generally we want performance optimizations turned on by default;
otherwise only highly-knowledgeable users end up getting any benefit. An
exception is when there's some significant downside to the optimization,
but I don't think I understand what the downside of this could be. Maybe
it's bad if we populate the cache and never use it? But until the patch is
written we don't know whether there's really a case that regresses like
that, and if there is, it would be better to fix it so it doesn't rather
than make the feature off-by-default.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 23.05.18 11:31, Heikki Linnakangas wrote:
> I've been working on a patch to add a little cache to SubPlans, to speed
> up queries with correlated subqueries, where the same subquery is
> currently executed multiple times with the same parameters. The idea is
> to cache the result of the subplan, with the correlation vars as the
> cache key.

It looks like this patch might be "returned with feedback" for the moment?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've been working on a patch to add a little cache to SubPlans, to speed up
> queries with correlated subqueries, where the same subquery is currently
> executed multiple times with the same parameters. The idea is to cache the
> result of the subplan, with the correlation vars as the cache key.

Hi,

This seems like an interesting area to make improvements, so I've signed
up to review the patch.

From looking at the code I see that the caching is being done inside
nodeSubplan.c. I don't think this is the right approach to the problem.
The problem exists for any parameterized path, so I think a more general
approach would be much better.

We already have Materialize nodes to cache the results of an entire
subplan, and this seems to have quite a bit in common with that, only we'd
want to cache multiple results with a key to determine which result set
should be returned. Due to the similarities with Materialize, I think that
the cache should be a node itself and not bury the cache logic in some
other node type that's meant for some other purpose.

"LazyMaterialize" seems like a good option for a name. It seems better
than "LazyHash" since you may not want to restrict it to a hash table
based cache in the future. A binary search tree may be a good option for
types that cannot be hashed.

Materialize nodes are injected above the inner side node of MergeJoins
based on cost, so I think this node type could just do the same. Maybe
something like estimate_num_groups(<exprs being compared to params>) /
path->rows is below some defined constant, perhaps something like 0.5.
Although experimentation would be required. It might be good to take into
account some other cost factors too.

I imagine we'd want to only allow this optimisation for hashjoinable
types. This seems pretty natural since your cache implementation is a hash
table, so, of course, we're going to need a hash function.

Wondering your thoughts on this idea. I'll mark as waiting on author in
the meantime.

It's great to see someone working on this.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
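[A worked sketch of the injection heuristic David outlines, plugging in the
numbers from Heikki's TPC-DS example; the 0.5 cutoff is David's illustrative
constant, not an implemented threshold.]

-- The per-column distinct estimate the planner builds on is visible in
-- pg_stats. With ~11 distinct categories and ~26000 outer rows, the
-- ratio 11 / 26000 = 0.0004 is far below 0.5, so the cache node would
-- be injected for the correlated subquery on i_category.
select n_distinct
from pg_stats
where tablename = 'item' and attname = 'i_category';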
On Mon, Jul 9, 2018 at 5:08 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> From looking at the code I see that the caching is being done inside
> nodeSubplan.c. I don't think this is the right approach to the
> problem. The problem exists for any parameterized path, so I think a
> more general approach would be much better.

Yeah, perhaps, though sometimes a more specific problem is easier to solve.

> "LazyMaterialize" seems like a good option for a name. It seems better
> than "LazyHash" since you may not want to restrict it to a hash table
> based cache in the future. A binary search tree may be a good option
> for types that cannot be hashed.

I think that's not too clear, actually. The difference between a
Materialize and a LazyMaterialize is not that this is lazy and that's not.
It's that this can cache multiple result sets for various parameter values
and that can only cache one result set.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jul 09, 2018 at 09:08:07PM +1200, David Rowley wrote:
> I'll mark as waiting on author in the meantime.
>
> It's great to see someone working on this.

Not that actively, unfortunately. The patch status has remained unchanged
since, so I am marking it as returned with feedback.

--
Michael
On 19 July 2018 at 06:27, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jul 9, 2018 at 5:08 AM, David Rowley
>> "LazyMaterialize" seems like a good option for a name. It seems better
>> than "LazyHash" since you may not want to restrict it to a hash table
>> based cache in the future. A binary search tree may be a good option
>> for types that cannot be hashed.
>
> I think that's not too clear, actually. The difference between a
> Materialize and a LazyMaterialize is not that this is lazy and that's
> not. It's that this can cache multiple result sets for various
> parameter values and that can only cache one result set.

Okay. I'm not going to argue with the name of a node type that does not
exist yet. I was more suggesting that it should be a modular component
that we can plug into whatever plan types suit it. My suggestion for
naming was admittedly more of a sales tactic to gain support for the idea,
which perhaps failed.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Apr 26, 2020 at 2:11 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I've been working on a patch to add a little cache to SubPlans, to speed up
>> queries with correlated subqueries, where the same subquery is currently
>> executed multiple times with the same parameters. The idea is to cache the
>> result of the subplan, with the correlation vars as the cache key.
>
> Hi,
>
> This seems like an interesting area to make improvements, so I've
> signed up to review the patch.
>
> From looking at the code I see that the caching is being done inside
> nodeSubplan.c. I don't think this is the right approach to the
> problem. The problem exists for any parameterized path, so I think a
> more general approach would be much better.
I'm +1 on this suggestion. Actually, I'm working on a query like this:
select j1o.i, j2_v.sum_5
from j1 j1o
inner join lateral
(select im100, sum(im5) as sum_5
from j2
where j1o.im100 = im100
and j1o.i = 1
group by im100) j2_v
on true
where j1o.i = 1;
I hope we can have a result cache for j2_v. But this has nothing to do with
SubPlan.

If we want to handle this case as well, one of the changes would be that it
needs to cache multiple records for one input parameter, or return one row
at a time but return multiple times for one input parameter. Tuplestore may
be a good option for this case, since it has full functionality like
tuplestore_puttuple and tuplestore_gettupleslot. But if we implement it with
a tuplestore, the next question is how to control the memory usage for this
node. We can use a dedicated memory context to know how much memory this
node has used in total, but we can't stop the tuplestore from using more
memory. Or we can force the current tuplestore->status to TSS_WRITEFILE and
set allowedMem to 0 for the following tuplestores, after we find too much
memory is used. However, this looks a bit of a hack.
As a summary of my earlier thinking about this topic, here are some other
(incomplete) options to handle this case: 1) just use the Material path to
cache the result for the last parameter only; if the following value is the
same as the last one, we use it, or else we rescan the Material path.
2) We could consider ordering the rows by the input-parameter key. That
would be much more cache friendly.
BTW, I see that node->chgParams may be changed without checking whether the
datum actually changed from the previous one; this may lose some
optimization opportunities (for example, in the Agg node rescan case, if
the params have not changed, we could reuse the hash table we built
before). So if we add a new parameter to node->chgParams only if the datum
really changed, would it still be semantically correct?
> It's great to see someone working on this.
I'd like to have a try.
Best Regards
Andy Fan
On Sun, 26 Apr 2020 at 19:08, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> If we want to handle this case as well, one of the changes would be that it
> needs to cache multiple records for one input parameter, or return one row
> at a time but return multiple times for one input parameter. Tuplestore may
> be a good option for this case, since it has full functionality like
> tuplestore_puttuple and tuplestore_gettupleslot. But if we implement it with
> a tuplestore, the next question is how to control the memory usage for this
> node. We can use a dedicated memory context to know how much memory this
> node has used in total, but we can't stop the tuplestore from using more
> memory. Or we can force the current tuplestore->status to TSS_WRITEFILE and
> set allowedMem to 0 for the following tuplestores, after we find too much
> memory is used. However, this looks a bit of a hack.

I didn't imagine a tuplestore would be that useful for this. A node like
this will do its best work when the ratio of n_values / distinct_values of
the parameters is high. The planner can often not be that great at knowing
the number of distinct values, especially so when there is more than one
expression to estimate the number of distinct values for (we added extended
statistics to try to help with that). I think this node will do its best
when the time spent for a cache miss is barely any more expensive than
scanning the subnode to get the results. If we can do that then we'll see
fewer regressions for when we inject one of these nodes where it'll do no
good, e.g. when we'll never get a repeated value. If we start spilling
these tuples out to disk then it adds overhead which might never pay off.

I'd suggest a hash table to act as an MRU cache. We'd just evict old values
when we run out of space, i.e. consume all of work_mem.

I've got a bunch of code locally which is still a work in progress to do
this. I'll finish it off and post it here.

David
On Sun, Apr 26, 2020 at 5:49 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Sun, 26 Apr 2020 at 19:08, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> If we want to handle this case as well, one of the changes would be that it
>> needs to cache multiple records for one input parameter, or return one row
>> at a time but return multiple times for one input parameter. Tuplestore may
>> be a good option for this case, since it has full functionality like
>> tuplestore_puttuple and tuplestore_gettupleslot. But if we implement it with
>> a tuplestore, the next question is how to control the memory usage for this
>> node. We can use a dedicated memory context to know how much memory this
>> node has used in total, but we can't stop the tuplestore from using more
>> memory. Or we can force the current tuplestore->status to TSS_WRITEFILE and
>> set allowedMem to 0 for the following tuplestores, after we find too much
>> memory is used. However, this looks a bit of a hack.
>
> I didn't imagine a tuplestore would be that useful for this. A node like
> this will do its best work when the ratio of n_values / distinct_values of
> the parameters is high. The planner can often not be that great at knowing
> the number of distinct values, especially so when there is more than one
> expression to estimate the number of distinct values for (we added extended
> statistics to try to help with that). I think this node will do its best
> when the time spent for a cache miss is barely any more expensive than
> scanning the subnode to get the results. If we can do that then we'll see
> fewer regressions for when we inject one of these nodes where it'll do no
> good, e.g. when we'll never get a repeated value. If we start spilling
> these tuples out to disk then it adds overhead which might never pay off.
>
> I'd suggest a hash table to act as an MRU cache. We'd just evict old values
> when we run out of space, i.e. consume all of work_mem.
>
> I've got a bunch of code locally which is still a work in progress to do
> this. I'll finish it off and post it here.
I was thinking that we might have to maintain some extra state if we use a
hash table rather than a tuplestore, but that might not be a major concern.
I'll wait and see your patch.
Best Regards
Andy Fan
On Mon, 27 Apr 2020 at 00:37, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> I was thinking that we might have to maintain some extra state if we use a
> hash table rather than a tuplestore, but that might not be a major concern.
> I'll wait and see your patch.

I've posted the patch and lots of details about it in
https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com

David
On Wed, May 20, 2020 at 7:47 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Mon, 27 Apr 2020 at 00:37, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> I was thinking that we might have to maintain some extra state if we use a
>> hash table rather than a tuplestore, but that might not be a major concern.
>> I'll wait and see your patch.
>
> I've posted the patch and lots of details about it in
> https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com
Amazing! I will start to study it today.
Best Regards
Andy Fan