Thread: Check lateral references within PHVs for memoize cache keys
If we intend to generate a memoize node atop a path, we need some kind
of cache key. Currently we search the path's parameterized clauses and
its parent's lateral_vars for that. ISTM this is not sufficient because
their might be lateral references derived from PlaceHolderVars, which
can also act as cache key but we neglect to take into consideration. As
an example, consider
create table t(a int);
insert into t values (1), (1), (1), (1);
analyze t;
explain (costs off) select * from t t1 left join lateral (select t1.a as t1a, t2.a as t2a from t t2) s on true where s.t1a = s.t2a;
QUERY PLAN
----------------------------
Nested Loop
-> Seq Scan on t t1
-> Seq Scan on t t2
Filter: (t1.a = a)
(4 rows)
We cannot find available cache keys for memoize node because the inner
side has neither parameterized path clauses nor lateral_vars. However
if we are able to look in the PHV for lateral references, we will find
the cache key 't1.a'.
Actually we do have checked PHVs for lateral references, earlier in
create_lateral_join_info. But that time we only marked lateral_relids
and direct_lateral_relids, without remembering the lateral expressions.
So I'm wondering whether we can fix that by fetching Vars (or PHVs) of
lateral references within PlaceHolderVars and remembering them in the
baserel's lateral_vars.
Attach a draft patch to show my thoughts.
Thanks
Richard
of cache key. Currently we search the path's parameterized clauses and
its parent's lateral_vars for that. ISTM this is not sufficient because
their might be lateral references derived from PlaceHolderVars, which
can also act as cache key but we neglect to take into consideration. As
an example, consider
create table t(a int);
insert into t values (1), (1), (1), (1);
analyze t;
explain (costs off) select * from t t1 left join lateral (select t1.a as t1a, t2.a as t2a from t t2) s on true where s.t1a = s.t2a;
QUERY PLAN
----------------------------
Nested Loop
-> Seq Scan on t t1
-> Seq Scan on t t2
Filter: (t1.a = a)
(4 rows)
We cannot find available cache keys for memoize node because the inner
side has neither parameterized path clauses nor lateral_vars. However
if we are able to look in the PHV for lateral references, we will find
the cache key 't1.a'.
Actually we do have checked PHVs for lateral references, earlier in
create_lateral_join_info. But that time we only marked lateral_relids
and direct_lateral_relids, without remembering the lateral expressions.
So I'm wondering whether we can fix that by fetching Vars (or PHVs) of
lateral references within PlaceHolderVars and remembering them in the
baserel's lateral_vars.
Attach a draft patch to show my thoughts.
Thanks
Richard
Attachment
On Fri, Dec 9, 2022 at 5:16 PM Richard Guo <guofenglinux@gmail.com> wrote:
Actually we do have checked PHVs for lateral references, earlier in
create_lateral_join_info. But that time we only marked lateral_relids
and direct_lateral_relids, without remembering the lateral expressions.
So I'm wondering whether we can fix that by fetching Vars (or PHVs) of
lateral references within PlaceHolderVars and remembering them in the
baserel's lateral_vars.
Attach a draft patch to show my thoughts.
Update the patch to fix test failures.
Thanks
Richard
Thanks
Richard
Attachment
On Fri, 30 Dec 2022 at 16:00, Richard Guo <guofenglinux@gmail.com> wrote: > > On Fri, Dec 9, 2022 at 5:16 PM Richard Guo <guofenglinux@gmail.com> wrote: >> >> Actually we do have checked PHVs for lateral references, earlier in >> create_lateral_join_info. But that time we only marked lateral_relids >> and direct_lateral_relids, without remembering the lateral expressions. >> So I'm wondering whether we can fix that by fetching Vars (or PHVs) of >> lateral references within PlaceHolderVars and remembering them in the >> baserel's lateral_vars. >> >> Attach a draft patch to show my thoughts. I'm surprised to see that it's only Memoize that ever makes use of lateral_vars. I'd need a bit more time to process your patch, but one additional thought I had was that I wonder if the following code is still needed in nodeMemoize.c if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids)) cache_purge_all(node); Ideally, that would be an Assert failure, but possibly we should probably still call cache_purge_all(node) after Assert(false) so that at least we'd not start returning wrong results if we've happened to miss other cache keys. I thought maybe something like: if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids)) { /* * Really the planner should have added all the possible parameters to * the cache keys, so let's Assert fail here so we get the memo to fix * that can fix that. On production builds, we'd better purge the * cache to account for the changed parameter value. */ Assert(false); cache_purge_all(node); } I've not run the tests to ensure we don't get an Assert failure with that, however. All that cache_purge_all code added in 411137a42 likely was an incorrect fix for what you've raised here, but it's maybe a good failsafe to keep around even if we think we've now found all possible parameters that can invalidate the memorized results. David
On Tue, Jan 24, 2023 at 10:07 AM David Rowley <dgrowleyml@gmail.com> wrote:
I'm surprised to see that it's only Memoize that ever makes use of
lateral_vars. I'd need a bit more time to process your patch, but one
additional thought I had was that I wonder if the following code is
still needed in nodeMemoize.c
if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids))
cache_purge_all(node);
Ideally, that would be an Assert failure, but possibly we should
probably still call cache_purge_all(node) after Assert(false) so that
at least we'd not start returning wrong results if we've happened to
miss other cache keys. I thought maybe something like:
Hmm, I think this code is still needed because the parameter contained
in the subplan below a Memoize node may come from parent plan, as in the
test query added in 411137a42.
EXPLAIN (COSTS OFF)
SELECT unique1 FROM tenk1 t0
WHERE unique1 < 3
AND EXISTS (
SELECT 1 FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.hundred
WHERE t0.ten = t1.twenty AND t0.two <> t2.four OFFSET 0);
QUERY PLAN
----------------------------------------------------------------
Index Scan using tenk1_unique1 on tenk1 t0
Index Cond: (unique1 < 3)
Filter: (SubPlan 1)
SubPlan 1
-> Nested Loop
-> Index Scan using tenk1_hundred on tenk1 t2
Filter: (t0.two <> four)
-> Memoize
Cache Key: t2.hundred
Cache Mode: logical
-> Index Scan using tenk1_unique1 on tenk1 t1
Index Cond: (unique1 = t2.hundred)
Filter: (t0.ten = twenty)
(13 rows)
Currently we don't have a way to add Params of uplevel vars to Memoize
cache keys. So I think we still need to call cache_purge_all() each
time uplevel Params change.
Thanks
Richard
in the subplan below a Memoize node may come from parent plan, as in the
test query added in 411137a42.
EXPLAIN (COSTS OFF)
SELECT unique1 FROM tenk1 t0
WHERE unique1 < 3
AND EXISTS (
SELECT 1 FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.hundred
WHERE t0.ten = t1.twenty AND t0.two <> t2.four OFFSET 0);
QUERY PLAN
----------------------------------------------------------------
Index Scan using tenk1_unique1 on tenk1 t0
Index Cond: (unique1 < 3)
Filter: (SubPlan 1)
SubPlan 1
-> Nested Loop
-> Index Scan using tenk1_hundred on tenk1 t2
Filter: (t0.two <> four)
-> Memoize
Cache Key: t2.hundred
Cache Mode: logical
-> Index Scan using tenk1_unique1 on tenk1 t1
Index Cond: (unique1 = t2.hundred)
Filter: (t0.ten = twenty)
(13 rows)
Currently we don't have a way to add Params of uplevel vars to Memoize
cache keys. So I think we still need to call cache_purge_all() each
time uplevel Params change.
Thanks
Richard
On Fri, Dec 30, 2022 at 11:00 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Dec 9, 2022 at 5:16 PM Richard Guo <guofenglinux@gmail.com> wrote:Actually we do have checked PHVs for lateral references, earlier in
create_lateral_join_info. But that time we only marked lateral_relids
and direct_lateral_relids, without remembering the lateral expressions.
So I'm wondering whether we can fix that by fetching Vars (or PHVs) of
lateral references within PlaceHolderVars and remembering them in the
baserel's lateral_vars.
Attach a draft patch to show my thoughts.Update the patch to fix test failures.
Rebase the patch on HEAD as cfbot reminds.
Thanks
Richard
Thanks
Richard
Attachment
On Tue, Jul 4, 2023 at 12:33 AM Richard Guo <guofenglinux@gmail.com> wrote: > > Rebase the patch on HEAD as cfbot reminds. All of this seems good to me. I can reproduce the problem, tests pass, and the change is sensible as far as I can tell. One adjacent thing I noticed is that when we renamed "Result Cache" to "Memoize" this bit of the docs in config.sgml got skipped (probably because of the line break): Hash tables are used in hash joins, hash-based aggregation, result cache nodes and hash-based processing of <literal>IN</literal> subqueries. I believe that should say "memoize nodes" instead. Is it worth correcting that as part of this patch? Or perhaps another one? Regards, -- Paul ~{:-) pj@illuminatedcomputing.com
Richard Guo <guofenglinux@gmail.com> writes: > Rebase the patch on HEAD as cfbot reminds. This fix seems like a mess. The function that is in charge of filling RelOptInfo.lateral_vars is extract_lateral_references; or at least that was how it was done up to now. Can't we compute these additional references there? If not, maybe we ought to just merge extract_lateral_references into create_lateral_join_info, rather than having the responsibility split. I also wonder whether this change isn't creating hidden dependencies on RTE order (which would likely be bugs), since create_lateral_join_info itself examines the lateral_vars lists as it walks the rtable. More generally, it's not clear to me why we should need to look inside lateral PHVs in the first place. Wouldn't the lateral PHV itself serve fine as a cache key? regards, tom lane
On Sun, 9 Jul 2023 at 05:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > More generally, it's not clear to me why we should need to look inside > lateral PHVs in the first place. Wouldn't the lateral PHV itself > serve fine as a cache key? For Memoize specifically, I purposefully made it so the expression was used as a cache key rather than extracting the Vars from it and using those. The reason for that was that the expression may result in fewer distinct values to cache tuples for. For example: create table t1 (a int primary key); create table t2 (a int primary key); create statistics on (a % 10) from t2; insert into t2 select x from generate_Series(1,1000000)x; insert into t1 select x from generate_Series(1,1000000)x; analyze t1,t2; explain (analyze, costs off) select * from t1 inner join t2 on t1.a=t2.a%10; QUERY PLAN -------------------------------------------------------------------------------------------- Nested Loop (actual time=0.015..212.798 rows=900000 loops=1) -> Seq Scan on t2 (actual time=0.006..33.479 rows=1000000 loops=1) -> Memoize (actual time=0.000..0.000 rows=1 loops=1000000) Cache Key: (t2.a % 10) Cache Mode: logical Hits: 999990 Misses: 10 Evictions: 0 Overflows: 0 Memory Usage: 1kB -> Index Only Scan using t1_pkey on t1 (actual time=0.001..0.001 rows=1 loops=10) Index Cond: (a = (t2.a % 10)) Heap Fetches: 0 Planning Time: 0.928 ms Execution Time: 229.621 ms (11 rows)
On Sat, 8 Jul 2023 at 17:24, Paul A Jungwirth <pj@illuminatedcomputing.com> wrote: > One adjacent thing I noticed is that when we renamed "Result Cache" to > "Memoize" this bit of the docs in config.sgml got skipped (probably > because of the line break): > > Hash tables are used in hash joins, hash-based aggregation, result > cache nodes and hash-based processing of <literal>IN</literal> > subqueries. > > I believe that should say "memoize nodes" instead. Is it worth > correcting that as part of this patch? Or perhaps another one? I just pushed a fix for this. Thanks for reporting it. David
On Sun, Jul 9, 2023 at 1:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> Rebase the patch on HEAD as cfbot reminds.
This fix seems like a mess. The function that is in charge of filling
RelOptInfo.lateral_vars is extract_lateral_references; or at least
that was how it was done up to now. Can't we compute these additional
references there? If not, maybe we ought to just merge
extract_lateral_references into create_lateral_join_info, rather than
having the responsibility split. I also wonder whether this change
isn't creating hidden dependencies on RTE order (which would likely be
bugs), since create_lateral_join_info itself examines the lateral_vars
lists as it walks the rtable.
Yeah, you're right. Currently all RelOptInfo.lateral_vars are filled in
extract_lateral_references. And then in create_lateral_join_info these
lateral_vars, together with all PlaceHolderInfos, are examined to
compute the per-base-relation lateral refs relids. However, in the
process of extract_lateral_references, it's possible that we create new
PlaceHolderInfos. So I think it may not be a good idea to extract the
lateral references within PHVs there. But I agree with you that it's
also not a good idea to compute these additional lateral Vars within
PHVs in create_lateral_join_info as the patch does. Actually with the
patch I find that with PHVs that are due to be evaluated at a join we
may get a problematic plan. For instance
explain (costs off)
select * from t t1 left join lateral
(select t1.a as t1a, t2.a as t2a from t t2 join t t3 on true) s on true
where s.t1a = s.t2a;
QUERY PLAN
------------------------------------
Nested Loop
-> Seq Scan on t t1
-> Nested Loop
Join Filter: (t1.a = t2.a)
-> Seq Scan on t t2
-> Memoize
Cache Key: t1.a
Cache Mode: binary
-> Seq Scan on t t3
(9 rows)
There are no lateral refs in the subtree of the Memoize node, so it
should be a Materialize node rather than a Memoize node. This is caused
by that for a PHV that is due to be evaluated at a join, we fill its
lateral refs in each baserel in the join, which is wrong.
So I'm wondering if it'd be better that we move all this logic of
computing additional lateral references within PHVs to get_memoize_path,
where we can examine only PHVs that are evaluated at innerrel. And
considering that these lateral refs are only used by Memoize, it seems
more sensible to compute them there. But I'm a little worried that
doing this would make get_memoize_path too expensive.
Please see v4 patch for this change.
Thanks
Richard
extract_lateral_references. And then in create_lateral_join_info these
lateral_vars, together with all PlaceHolderInfos, are examined to
compute the per-base-relation lateral refs relids. However, in the
process of extract_lateral_references, it's possible that we create new
PlaceHolderInfos. So I think it may not be a good idea to extract the
lateral references within PHVs there. But I agree with you that it's
also not a good idea to compute these additional lateral Vars within
PHVs in create_lateral_join_info as the patch does. Actually with the
patch I find that with PHVs that are due to be evaluated at a join we
may get a problematic plan. For instance
explain (costs off)
select * from t t1 left join lateral
(select t1.a as t1a, t2.a as t2a from t t2 join t t3 on true) s on true
where s.t1a = s.t2a;
QUERY PLAN
------------------------------------
Nested Loop
-> Seq Scan on t t1
-> Nested Loop
Join Filter: (t1.a = t2.a)
-> Seq Scan on t t2
-> Memoize
Cache Key: t1.a
Cache Mode: binary
-> Seq Scan on t t3
(9 rows)
There are no lateral refs in the subtree of the Memoize node, so it
should be a Materialize node rather than a Memoize node. This is caused
by that for a PHV that is due to be evaluated at a join, we fill its
lateral refs in each baserel in the join, which is wrong.
So I'm wondering if it'd be better that we move all this logic of
computing additional lateral references within PHVs to get_memoize_path,
where we can examine only PHVs that are evaluated at innerrel. And
considering that these lateral refs are only used by Memoize, it seems
more sensible to compute them there. But I'm a little worried that
doing this would make get_memoize_path too expensive.
Please see v4 patch for this change.
Thanks
Richard
Attachment
On Sun, Jul 9, 2023 at 1:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
More generally, it's not clear to me why we should need to look inside
lateral PHVs in the first place. Wouldn't the lateral PHV itself
serve fine as a cache key?
Do you mean we use the lateral PHV directly as a cache key? Hmm, it
seems to me that we'd have problem if the PHV references rels that are
inside the PHV's syntactic scope. For instance
select * from t t1 left join
lateral (select t1.a+t2.a as t1a, t2.a as t2a from t t2) s on true
where s.t1a = s.t2a;
The PHV references t1.a so it's lateral. But it also references t2.a,
so if we use the PHV itself as cache key, the plan would look like
QUERY PLAN
----------------------------------------
Nested Loop
-> Seq Scan on t t1
-> Memoize
Cache Key: (t1.a + t2.a)
Cache Mode: binary
-> Seq Scan on t t2
Filter: ((t1.a + a) = a)
(7 rows)
which is an invalid plan as the cache key contains t2.a.
Thanks
Richard
seems to me that we'd have problem if the PHV references rels that are
inside the PHV's syntactic scope. For instance
select * from t t1 left join
lateral (select t1.a+t2.a as t1a, t2.a as t2a from t t2) s on true
where s.t1a = s.t2a;
The PHV references t1.a so it's lateral. But it also references t2.a,
so if we use the PHV itself as cache key, the plan would look like
QUERY PLAN
----------------------------------------
Nested Loop
-> Seq Scan on t t1
-> Memoize
Cache Key: (t1.a + t2.a)
Cache Mode: binary
-> Seq Scan on t t2
Filter: ((t1.a + a) = a)
(7 rows)
which is an invalid plan as the cache key contains t2.a.
Thanks
Richard
On Sun, Jul 9, 2023 at 12:17 PM David Rowley <dgrowleyml@gmail.com> wrote:
I just pushed a fix for this. Thanks for reporting it.
BTW, I noticed a typo in the comment inside paraminfo_get_equal_hashops.
foreach(lc, innerrel->lateral_vars)
{
Node *expr = (Node *) lfirst(lc);
TypeCacheEntry *typentry;
/* Reject if there are any volatile functions in PHVs */
if (contain_volatile_functions(expr))
{
list_free(*operators);
list_free(*param_exprs);
return false;
}
The expressions in RelOptInfo.lateral_vars are not necessarily from
PHVs. For instance
explain (costs off)
select * from t t1 join
lateral (select * from t t2 where t1.a = t2.a offset 0) on true;
QUERY PLAN
----------------------------------
Nested Loop
-> Seq Scan on t t1
-> Memoize
Cache Key: t1.a
Cache Mode: binary
-> Seq Scan on t t2
Filter: (t1.a = a)
(7 rows)
The lateral Var 't1.a' comes from the lateral subquery, not PHV.
This seems a typo from 63e4f13d. How about we change it to the below?
- /* Reject if there are any volatile functions in PHVs */
+ /* Reject if there are any volatile functions in lateral vars */
Thanks
Richard
foreach(lc, innerrel->lateral_vars)
{
Node *expr = (Node *) lfirst(lc);
TypeCacheEntry *typentry;
/* Reject if there are any volatile functions in PHVs */
if (contain_volatile_functions(expr))
{
list_free(*operators);
list_free(*param_exprs);
return false;
}
The expressions in RelOptInfo.lateral_vars are not necessarily from
PHVs. For instance
explain (costs off)
select * from t t1 join
lateral (select * from t t2 where t1.a = t2.a offset 0) on true;
QUERY PLAN
----------------------------------
Nested Loop
-> Seq Scan on t t1
-> Memoize
Cache Key: t1.a
Cache Mode: binary
-> Seq Scan on t t2
Filter: (t1.a = a)
(7 rows)
The lateral Var 't1.a' comes from the lateral subquery, not PHV.
This seems a typo from 63e4f13d. How about we change it to the below?
- /* Reject if there are any volatile functions in PHVs */
+ /* Reject if there are any volatile functions in lateral vars */
Thanks
Richard
On Thu, Jul 13, 2023 at 3:12 PM Richard Guo <guofenglinux@gmail.com> wrote:
So I'm wondering if it'd be better that we move all this logic of
computing additional lateral references within PHVs to get_memoize_path,
where we can examine only PHVs that are evaluated at innerrel. And
considering that these lateral refs are only used by Memoize, it seems
more sensible to compute them there. But I'm a little worried that
doing this would make get_memoize_path too expensive.
Please see v4 patch for this change.
I'd like to add that not checking PHVs for lateral references can lead
to performance regressions with Memoize node. For instance,
-- by default, enable_memoize is on
regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
-------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.028..105245.547 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.011..3.760 rows=10000 loops=1)
-> Memoize (actual time=0.010..8.051 rows=5000 loops=10000)
Cache Key: t1.two
Cache Mode: logical
Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0 Memory Usage: 1368kB
-> Seq Scan on tenk1 t2 (actual time=0.004..3.594 rows=5000 loops=10000)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 1.943 ms
Execution Time: 106806.043 ms
(11 rows)
-- turn enable_memoize off
regression=# set enable_memoize to off;
SET
regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.048..44831.707 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.026..2.340 rows=10000 loops=1)
-> Seq Scan on tenk1 t2 (actual time=0.002..3.282 rows=5000 loops=10000)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 0.641 ms
Execution Time: 46472.609 ms
(7 rows)
As we can see, when Memoize enabled (which is the default setting), the
execution time increases by around 129.83%, indicating a significant
performance regression.
This is caused by that we fail to realize that 't1.four', which is from
the PHV, should be included in the cache keys. And that makes us have
to purge the entire cache every time we get a new outer tuple. This is
also implied by the abnormal Memoize runtime stats:
Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0
This regression can be fixed by the patch here. After applying the v4
patch, 't1.four' is added into the cache keys, and the same query runs
much faster.
regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.060..20446.004 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.027..5.845 rows=10000 loops=1)
-> Memoize (actual time=0.001..0.209 rows=5000 loops=10000)
Cache Key: t1.two, t1.four
Cache Mode: binary
Hits: 9996 Misses: 4 Evictions: 0 Overflows: 0 Memory Usage: 5470kB
-> Seq Scan on tenk1 t2 (actual time=0.005..3.659 rows=5000 loops=4)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 0.579 ms
Execution Time: 21756.598 ms
(11 rows)
Comparing the first plan and the third plan, this query runs ~5 times
faster.
Thanks
Richard
to performance regressions with Memoize node. For instance,
-- by default, enable_memoize is on
regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
-------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.028..105245.547 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.011..3.760 rows=10000 loops=1)
-> Memoize (actual time=0.010..8.051 rows=5000 loops=10000)
Cache Key: t1.two
Cache Mode: logical
Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0 Memory Usage: 1368kB
-> Seq Scan on tenk1 t2 (actual time=0.004..3.594 rows=5000 loops=10000)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 1.943 ms
Execution Time: 106806.043 ms
(11 rows)
-- turn enable_memoize off
regression=# set enable_memoize to off;
SET
regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.048..44831.707 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.026..2.340 rows=10000 loops=1)
-> Seq Scan on tenk1 t2 (actual time=0.002..3.282 rows=5000 loops=10000)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 0.641 ms
Execution Time: 46472.609 ms
(7 rows)
As we can see, when Memoize enabled (which is the default setting), the
execution time increases by around 129.83%, indicating a significant
performance regression.
This is caused by that we fail to realize that 't1.four', which is from
the PHV, should be included in the cache keys. And that makes us have
to purge the entire cache every time we get a new outer tuple. This is
also implied by the abnormal Memoize runtime stats:
Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0
This regression can be fixed by the patch here. After applying the v4
patch, 't1.four' is added into the cache keys, and the same query runs
much faster.
regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.060..20446.004 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.027..5.845 rows=10000 loops=1)
-> Memoize (actual time=0.001..0.209 rows=5000 loops=10000)
Cache Key: t1.two, t1.four
Cache Mode: binary
Hits: 9996 Misses: 4 Evictions: 0 Overflows: 0 Memory Usage: 5470kB
-> Seq Scan on tenk1 t2 (actual time=0.005..3.659 rows=5000 loops=4)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 0.579 ms
Execution Time: 21756.598 ms
(11 rows)
Comparing the first plan and the third plan, this query runs ~5 times
faster.
Thanks
Richard
On Mon, Dec 25, 2023 at 3:01 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Jul 13, 2023 at 3:12 PM Richard Guo <guofenglinux@gmail.com> wrote:So I'm wondering if it'd be better that we move all this logic of
computing additional lateral references within PHVs to get_memoize_path,
where we can examine only PHVs that are evaluated at innerrel. And
considering that these lateral refs are only used by Memoize, it seems
more sensible to compute them there. But I'm a little worried that
doing this would make get_memoize_path too expensive.
Please see v4 patch for this change.I'd like to add that not checking PHVs for lateral references can lead
to performance regressions with Memoize node.
The v4 patch does not apply any more. I've rebased it on master.
Nothing else has changed.
Thanks
Richard
Nothing else has changed.
Thanks
Richard
Attachment
On Fri, Feb 2, 2024 at 5:18 PM Richard Guo <guofenglinux@gmail.com> wrote:
The v4 patch does not apply any more. I've rebased it on master.
Nothing else has changed.
Here is another rebase over master so it applies again. I also added a
commit message to help review. Nothing else has changed.
Thanks
Richard
commit message to help review. Nothing else has changed.
Thanks
Richard
Attachment
On Mon, Mar 18, 2024 at 4:36 PM Richard Guo <guofenglinux@gmail.com> wrote: > Here is another rebase over master so it applies again. I also added a > commit message to help review. Nothing else has changed. AFAIU currently we do not add Memoize nodes on top of join relation paths. This is because the ParamPathInfos for join relation paths do not maintain ppi_clauses, as the set of relevant clauses varies depending on how the join is formed. In addition, joinrels do not maintain lateral_vars. So we do not have a way to extract cache keys from joinrels. (Besides, there are places where the code doesn't cope with Memoize path on top of a joinrel path, such as in get_param_path_clause_serials.) Therefore, when extracting lateral references within PlaceHolderVars, there is no need to consider those that are due to be evaluated at joinrels. Hence, here is v7 patch for that. In passing, this patch also includes a comment explaining that Memoize nodes are currently not added on top of join relation paths (maybe we should have a separate patch for this?). Thanks Richard
Attachment
On 6/18/24 08:47, Richard Guo wrote: > On Mon, Mar 18, 2024 at 4:36 PM Richard Guo <guofenglinux@gmail.com> wrote: >> Here is another rebase over master so it applies again. I also added a >> commit message to help review. Nothing else has changed. > > AFAIU currently we do not add Memoize nodes on top of join relation > paths. This is because the ParamPathInfos for join relation paths do > not maintain ppi_clauses, as the set of relevant clauses varies > depending on how the join is formed. In addition, joinrels do not > maintain lateral_vars. So we do not have a way to extract cache keys > from joinrels. > > (Besides, there are places where the code doesn't cope with Memoize path > on top of a joinrel path, such as in get_param_path_clause_serials.) > > Therefore, when extracting lateral references within PlaceHolderVars, > there is no need to consider those that are due to be evaluated at > joinrels. > > Hence, here is v7 patch for that. In passing, this patch also includes > a comment explaining that Memoize nodes are currently not added on top > of join relation paths (maybe we should have a separate patch for this?). Hi, I have reviewed v7 of the patch. This improvement is good enough to be applied, thought. Here is some notes: Comment may be rewritten for clarity: "Determine if the clauses in param_info and innerrel's lateral_vars" - I'd replace lateral_vars with 'lateral references' to combine in one phrase PHV from rel and root->placeholder_list sources. I wonder if we can add whole PHV expression instead of the Var (as discussed above) just under some condition: if (!bms_intersect(pull_varnos(root, (Node *) phinfo->ph_var->phexpr), innerrelids)) { // Add whole PHV } else { // Add only pulled vars } I got the point about Memoize over join, but as a join still calls replace_nestloop_params to replace parameters in its clauses, why not to invent something similar to find Memoize keys inside specific JoinPath node? It is not the issue of this patch, though - but is it doable? IMO, the code: if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids)) cache_purge_all(node); is a good place to check an assertion: is it really the parent query parameters that make a difference between memoize keys and node list of parameters? Generally, this patch looks good for me to be committed. -- regards, Andrei Lepikhov
On Fri, Jun 28, 2024 at 10:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote: > I have reviewed v7 of the patch. This improvement is good enough to be > applied, thought. Thank you for reviewing this patch! > Comment may be rewritten for clarity: > "Determine if the clauses in param_info and innerrel's lateral_vars" - > I'd replace lateral_vars with 'lateral references' to combine in one > phrase PHV from rel and root->placeholder_list sources. Makes sense. I ended up using 'innerrel's lateral vars' to include both the lateral Vars/PHVs found in innerrel->lateral_vars and those extracted from within PlaceHolderVars that are due to be evaluated at innerrel. > I wonder if we can add whole PHV expression instead of the Var (as > discussed above) just under some condition: > if (!bms_intersect(pull_varnos(root, (Node *) phinfo->ph_var->phexpr), > innerrelids)) > { > // Add whole PHV > } > else > { > // Add only pulled vars > } Good point. After considering it further, I think we should do this. As David explained, this can be beneficial in cases where the whole expression results in fewer distinct values to cache tuples for. > I got the point about Memoize over join, but as a join still calls > replace_nestloop_params to replace parameters in its clauses, why not to > invent something similar to find Memoize keys inside specific JoinPath > node? It is not the issue of this patch, though - but is it doable? I don't think it's impossible to do, but I'm skeptical that there's an easy way to identify all the cache keys for joinrels, without having available ppi_clauses and lateral_vars. > IMO, the code: > if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids)) > cache_purge_all(node); > > is a good place to check an assertion: is it really the parent query > parameters that make a difference between memoize keys and node list of > parameters? I don't think we have enough info available here to identify which params within outerPlan->chgParam are from outer levels. Maybe we can store root->outer_params in the MemoizeState node to help with this assertion, but I'm not sure if it's worth the trouble. Attached is an updated version of this patch. Thanks Richard
Attachment
On 7/11/24 16:18, Richard Guo wrote: > On Fri, Jun 28, 2024 at 10:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote: >> I got the point about Memoize over join, but as a join still calls >> replace_nestloop_params to replace parameters in its clauses, why not to >> invent something similar to find Memoize keys inside specific JoinPath >> node? It is not the issue of this patch, though - but is it doable? > > I don't think it's impossible to do, but I'm skeptical that there's an > easy way to identify all the cache keys for joinrels, without having > available ppi_clauses and lateral_vars. Ok > >> IMO, the code: >> if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids)) >> cache_purge_all(node); >> >> is a good place to check an assertion: is it really the parent query >> parameters that make a difference between memoize keys and node list of >> parameters? > > I don't think we have enough info available here to identify which > params within outerPlan->chgParam are from outer levels. Maybe we can > store root->outer_params in the MemoizeState node to help with this > assertion, but I'm not sure if it's worth the trouble. Got it > > Attached is an updated version of this patch. I'm not sure about stability of output format of AVG aggregate across different platforms. Maybe better to return the result of comparison between the AVG() and expected value? -- regards, Andrei Lepikhov
On Fri, Jul 12, 2024 at 11:18 AM Andrei Lepikhov <lepihov@gmail.com> wrote: > I'm not sure about stability of output format of AVG aggregate across > different platforms. Maybe better to return the result of comparison > between the AVG() and expected value? I don't think this is a problem. AFAIK we use AVG() a lot in the existing test cases. Thanks Richard
On Thu, Jul 11, 2024 at 5:18 PM Richard Guo <guofenglinux@gmail.com> wrote: > Attached is an updated version of this patch. I've pushed this patch. Thanks for all the reviews. Thanks Richard