Thread: Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Tom Lane
Date:
[ switching to -hackers list ] David Rowley <dgrowleyml@gmail.com> writes: > In case it saves you a bit of time, I stripped as much of the > unrelated stuff out as I could and got: > create table t (a name, b int); > explain select * from (select a::varchar,b from (select distinct a,b > from t) st) t right join t t2 on t.b=t2.b where t.a='test'; > getting rid of the cast or swapping to INNER JOIN rather than RIGHT > JOIN means that qual_is_pushdown_safe() gets a Var rather than a > PlaceHolderVar. Thanks. So it seems that what's happening is that we stick a PlaceHolderVar on the intermediate subquery's output ("a::varchar"), and then later when we realize that the RIGHT JOIN can be reduced to an inner join we run around and remove the right join from the PlaceHolderVar's nullingrels, leaving a useless PHV with no nullingrels. remove_nulling_relids explains * Note: it might seem desirable to remove the PHV altogether if * phnullingrels goes to empty. Currently we dare not do that * because we use PHVs in some cases to enforce separate identity * of subexpressions; see wrap_non_vars usages in prepjointree.c. However, then when we consider whether the upper WHERE condition can be pushed down into the unflattened lower subquery, qual_is_pushdown_safe punts: * XXX Punt if we find any PlaceHolderVars in the restriction clause. * It's not clear whether a PHV could safely be pushed down, and even * less clear whether such a situation could arise in any cases of * practical interest anyway. So for the moment, just refuse to push * down. We didn't see this particular behavior before 2489d76c49 because pullup_replace_vars avoided inserting a PHV: * If it contains a Var of the subquery being pulled up, and * does not contain any non-strict constructs, then it's * certainly nullable so we don't need to insert a * PlaceHolderVar. I dropped that case in 2489d76c49 because now we need to attach nullingrels to the expression. You could imagine attaching the nullingrels to the contained Var(s) instead of putting a PHV on top, but that seems like a mess and I'm not quite sure it's semantically the same. In any case it wouldn't fix adjacent cases where there is a non-strict construct in the subquery output expression. So it seems like we need to fix one or the other of these implementation shortcuts to restore the previous behavior. I'm wondering if it'd be okay for qual_is_pushdown_safe to accept PHVs that have no nullingrels. I'm not really thrilled about trying to back-patch any such fix though --- the odds of introducing new bugs seem nontrivial, and the problem case seems rather narrow. If we are willing to accept a HEAD-only fix, it'd likely be better to attack the other end and make it possible to remove no-op PHVs. I think that'd require marking PHVs that need to be kept because they are serving to isolate subexpressions. regards, tom lane
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Tom Lane
Date:
I wrote: > We didn't see this particular behavior before 2489d76c49 because > pullup_replace_vars avoided inserting a PHV: > * If it contains a Var of the subquery being pulled up, and > * does not contain any non-strict constructs, then it's > * certainly nullable so we don't need to insert a > * PlaceHolderVar. > I dropped that case in 2489d76c49 because now we need to attach > nullingrels to the expression. You could imagine attaching the > nullingrels to the contained Var(s) instead of putting a PHV on top, > but that seems like a mess and I'm not quite sure it's semantically > the same. In any case it wouldn't fix adjacent cases where there is > a non-strict construct in the subquery output expression. I realized that actually we do have the mechanism for making that work: we could apply add_nulling_relids to the expression, if it meets those same conditions. This is a kluge really, but it would restore the status quo ante in a fairly localized fashion that seems like it might be safe enough to back-patch into v16. Here's a WIP patch that does it like that. One problem with it is that it requires rcon->relids to be calculated in cases where we didn't need that before, which is probably not *that* expensive but it's annoying. If we go forward with this, I'm thinking about changing add_nulling_relids' API contract to say "if target_relid is NULL then all level-zero Vars/PHVs are modified", so that we don't need that relid set in non-LATERAL cases. The other problem with this is that it breaks one test case in memoize.sql: a query that formerly generated a memoize plan now does not use memoize. I am not sure why not --- does that mean anything to you? regards, tom lane diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 969e257f70..3a12a52440 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -48,7 +48,7 @@ typedef struct pullup_replace_vars_context List *targetlist; /* tlist of subquery being pulled up */ RangeTblEntry *target_rte; /* RTE of subquery */ Relids relids; /* relids within subquery, as numbered after - * pullup (set only if target_rte->lateral) */ + * pullup */ bool *outer_hasSubLinks; /* -> outer query's hasSubLinks */ int varno; /* varno of subquery */ bool wrap_non_vars; /* do we need all non-Var outputs to be PHVs? */ @@ -1163,11 +1163,8 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, rvcontext.root = root; rvcontext.targetlist = subquery->targetList; rvcontext.target_rte = rte; - if (rte->lateral) - rvcontext.relids = get_relids_in_jointree((Node *) subquery->jointree, - true, true); - else /* won't need relids */ - rvcontext.relids = NULL; + rvcontext.relids = get_relids_in_jointree((Node *) subquery->jointree, + true, true); rvcontext.outer_hasSubLinks = &parse->hasSubLinks; rvcontext.varno = varno; /* this flag will be set below, if needed */ @@ -1713,7 +1710,7 @@ pull_up_simple_values(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte) rvcontext.root = root; rvcontext.targetlist = tlist; rvcontext.target_rte = rte; - rvcontext.relids = NULL; + rvcontext.relids = NULL; /* XXX */ rvcontext.outer_hasSubLinks = &parse->hasSubLinks; rvcontext.varno = varno; rvcontext.wrap_non_vars = false; @@ -1877,7 +1874,7 @@ pull_up_constant_function(PlannerInfo *root, Node *jtnode, * lateral references, even if it's marked as LATERAL. This means we * don't need to fill relids. */ - rvcontext.relids = NULL; + rvcontext.relids = NULL; /* XXX */ rvcontext.outer_hasSubLinks = &parse->hasSubLinks; rvcontext.varno = ((RangeTblRef *) jtnode)->rtindex; @@ -2490,14 +2487,48 @@ pullup_replace_vars_callback(Var *var, else wrap = false; } + else if (rcon->wrap_non_vars) + { + /* Caller told us to wrap all non-Vars in a PlaceHolderVar */ + wrap = true; + } else { /* - * Must wrap, either because we need a place to insert - * varnullingrels or because caller told us to wrap - * everything. + * If the node contains Var(s) or PlaceHolderVar(s) of the + * subquery being pulled up, and does not contain any + * non-strict constructs, then instead of adding a PHV on top + * we can add the required nullingrels to those Vars/PHVs. + * (This is fundamentally a generalization of the above cases + * for bare Vars and PHVs.) + * + * This test is somewhat expensive, but it avoids pessimizing + * the plan in cases where the nullingrels get removed again + * later by outer join reduction. + * + * This analysis could be tighter: in particular, a non-strict + * construct hidden within a lower-level PlaceHolderVar is not + * reason to add another PHV. But for now it doesn't seem + * worth the code to be more exact. + * + * For a LATERAL subquery, we have to check the actual var + * membership of the node, but if it's non-lateral then any + * level-zero var must belong to the subquery. */ - wrap = true; + if ((rcon->target_rte->lateral ? + bms_overlap(pull_varnos(rcon->root, newnode), + rcon->relids) : + contain_vars_of_level(newnode, 0)) && + !contain_nonstrict_functions(newnode)) + { + /* No wrap needed */ + wrap = false; + } + else + { + /* Else wrap it in a PlaceHolderVar */ + wrap = true; + } } if (wrap) @@ -2522,7 +2553,7 @@ pullup_replace_vars_callback(Var *var, if (var->varlevelsup > 0) IncrementVarSublevelsUp(newnode, var->varlevelsup, 0); - /* Propagate any varnullingrels into the replacement Var or PHV */ + /* Propagate any varnullingrels into the replacement expression */ if (var->varnullingrels != NULL) { if (IsA(newnode, Var)) @@ -2542,7 +2573,15 @@ pullup_replace_vars_callback(Var *var, var->varnullingrels); } else - elog(ERROR, "failed to wrap a non-Var"); + { + /* There should be lower-level Vars/PHVs we can modify */ + newnode = add_nulling_relids(newnode, + rcon->relids, + var->varnullingrels); + /* Assert we did put the varnullingrels into the expression */ + Assert(bms_is_subset(var->varnullingrels, + pull_varnos(rcon->root, newnode))); + } } return newnode;
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
David Rowley
Date:
On Wed, 28 Aug 2024 at 09:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The other problem with this is that it breaks one test case in > memoize.sql: a query that formerly generated a memoize plan > now does not use memoize. I am not sure why not --- does that > mean anything to you? The reason it works in master is that get_memoize_path() calls extract_lateral_vars_from_PHVs() and finds PlaceHolderVars to use as the Memoize keys. With your patch PlannerInfo.placeholder_list is empty. The commit that made this work is 069d0ff02. Richard might be able to explain better. I don't quite understand why RelOptInfo.lateral_vars don't contain these in the first place. David
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 28 Aug 2024 at 09:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The other problem with this is that it breaks one test case in >> memoize.sql: a query that formerly generated a memoize plan >> now does not use memoize. I am not sure why not --- does that >> mean anything to you? > The reason it works in master is that get_memoize_path() calls > extract_lateral_vars_from_PHVs() and finds PlaceHolderVars to use as > the Memoize keys. With your patch PlannerInfo.placeholder_list is > empty. That seems like a pretty fishy way to do it. Are you saying that Memoize is never applicable if there aren't outer joins in the query? Without OJs there probably won't be any PHVs. regards, tom lane
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Tom Lane
Date:
I wrote: > That seems like a pretty fishy way to do it. Are you saying that > Memoize is never applicable if there aren't outer joins in the > query? Without OJs there probably won't be any PHVs. Oh, scratch that, I see you mean this is an additional way to do it not the only way to do it. But I'm confused why it works for t1.two+1 AS c1 but not t1.two+t2.two AS c1 Those ought to look pretty much the same for this purpose. regards, tom lane
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
David Rowley
Date:
On Wed, 28 Aug 2024 at 11:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Oh, scratch that, I see you mean this is an additional way to do it > not the only way to do it. But I'm confused why it works for > t1.two+1 AS c1 > but not > t1.two+t2.two AS c1 > Those ought to look pretty much the same for this purpose. The bms_overlap(pull_varnos(rcon->root, newnode), rcon->relids) test is false with t1.two+1. Looks like there needs to be a Var from t2 for the bms_overlap to be true David
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Richard Guo
Date:
On Wed, Aug 28, 2024 at 5:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I realized that actually we do have the mechanism for making that > work: we could apply add_nulling_relids to the expression, if it > meets those same conditions. I think this should work, as long as we apply add_nulling_relids only to Vars/PHVs that belong to the subquery in this case, because only those Vars/PHVs would be nulled by the outer joins contained in the nullingrels. > If we go forward with this, I'm thinking about > changing add_nulling_relids' API contract to say "if target_relid > is NULL then all level-zero Vars/PHVs are modified", so that we > don't need that relid set in non-LATERAL cases. +1. In LATERAL case, we can always find the subquery's relids in rcon->relids. In non-lateral case, any level-zero Vars/PHVs must belong to the subquery - so if we change add_nulling_relids' API to be so, we do not need to have rcon->relids set. Thanks Richard
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Richard Guo
Date:
On Wed, Aug 28, 2024 at 11:30 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Wed, Aug 28, 2024 at 5:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I realized that actually we do have the mechanism for making that > > work: we could apply add_nulling_relids to the expression, if it > > meets those same conditions. > > I think this should work, as long as we apply add_nulling_relids only > to Vars/PHVs that belong to the subquery in this case, because only > those Vars/PHVs would be nulled by the outer joins contained in the > nullingrels. To be more concrete, I know theoretically it is the whole expression that is nullable by the outer joins, not its individual vars. But in this case if the contained vars (that belong to the subquery) become NULL, the whole expression would be NULL too, because it does not contain any non-strict constructs. That's why I think this approach should work. Thanks Richard
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Richard Guo
Date:
On Wed, Aug 28, 2024 at 12:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > If we > are willing to accept a HEAD-only fix, it'd likely be better to > attack the other end and make it possible to remove no-op PHVs. > I think that'd require marking PHVs that need to be kept because > they are serving to isolate subexpressions. I think it's always desirable to remove no-op PHVs, even if we end up with a different approach to fix the issue discussed here. Doing that could potentially open up opportunities for optimization in other cases. For example: explain (costs off) select * from t t1 left join lateral (select t1.a as x, * from t t2) s on true where t1.a = s.a; QUERY PLAN ---------------------------- Nested Loop -> Seq Scan on t t1 -> Seq Scan on t t2 Filter: (t1.a = a) (4 rows) The target entry s.x is wrapped in a PHV that contains lateral reference to t1, which forces us to resort to nestloop join. However, since the left join has been reduced to an inner join, and it is removed from the PHV's nullingrels, leaving the nullingrels being empty, we should be able to remove this PHV and use merge or hash joins, depending on which is cheaper. I think there may be more cases where no-op PHVs constrain optimization opportunities. In [1] when working on the fix-grouping-sets patch, I included a mechanism in 0003 to remove no-op PHVs by including a flag in PlaceHolderVar to indicate whether it is safe to remove the PHV when its phnullingrels becomes empty. In that patch this flag is only set in cases where the PHV is used to carry the nullingrel bit that represents the grouping step. Maybe we can extend its use to remove all no-op PHVs, except those that are serving to isolate subexpressions. Any thoughts on this? [1] https://postgr.es/m/CAMbWs4_2t2pqqCFdS3NYJLwMMkAzYQKBOhKweFt-wE3YOi7rGg@mail.gmail.com Thanks Richard
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Richard Guo
Date:
On Wed, Aug 28, 2024 at 7:58 AM David Rowley <dgrowleyml@gmail.com> wrote: > On Wed, 28 Aug 2024 at 11:37, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Oh, scratch that, I see you mean this is an additional way to do it > > not the only way to do it. But I'm confused why it works for > > t1.two+1 AS c1 > > but not > > t1.two+t2.two AS c1 > > Those ought to look pretty much the same for this purpose. > > The bms_overlap(pull_varnos(rcon->root, newnode), rcon->relids) test > is false with t1.two+1. Looks like there needs to be a Var from t2 > for the bms_overlap to be true Exactly. What Tom's patch does is that if the expression contains Vars/PHVs that belong to the subquery, and does not contain any non-strict constructs, then it can escape being wrapped. In expression 't1.two+t2.two', 't2.two' is a Var that belongs to the subquery, and '+' is strict, so it can escape being wrapped. The expression 't1.two+1' does not meet these conditions, so it is wrapped into a PHV, and the PHV contains lateral reference to t1, which results in a nestloop join with a parameterized inner path. That's why Memoize can work in this query. Thanks Richard
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes: > Exactly. What Tom's patch does is that if the expression contains > Vars/PHVs that belong to the subquery, and does not contain any > non-strict constructs, then it can escape being wrapped. > In expression 't1.two+t2.two', 't2.two' is a Var that belongs to the > subquery, and '+' is strict, so it can escape being wrapped. > The expression 't1.two+1' does not meet these conditions, so it is > wrapped into a PHV, and the PHV contains lateral reference to t1, > which results in a nestloop join with a parameterized inner path. > That's why Memoize can work in this query. Yeah. (I'd missed that t1.two is a lateral reference and t2.two is not; sorry for the noise.) What happens as of HEAD is that, because we wrap this subquery output in a PHV marked as due to be evaluated at t2, the entire clause (t1.two+t2.two) = t2.unique1 becomes a base restriction clause for t2, so that when we generate a path for t2 it will include this as a path qual (forcing the path to be laterally dependent on t1). Without the PHV, it's just an ordinary join clause and it will not be evaluated at scan level unless it can be turned into an indexqual --- which it can't. The preceding regression-test case with "t1.two+1 = t2.unique1" can be made into a parameterized indexscan on t2.unique1, so it is, and then memoize can trigger off that. I'm inclined to think that treating such a clause as a join clause is strictly better than what happens now, so I'm not going to apologize for the PHV not being there. If you wanted to cast blame, you could look to set_plain_rel_pathlist, where it says * We don't support pushing join clauses into the quals of a seqscan, but * it could still have required parameterization due to LATERAL refs in * its tlist. (This comment could stand some work, as it fails to note that labeling the path with required parameterization can result in "join clauses" being evaluated there anyway.) In the normal course of things I'd be dubious about the value of pushing join clauses into a seqscan, but maybe the possibility of a memoize'd join has moved the goalposts enough that we should consider that. Alternatively, maybe get_memoized_path should take more responsibility for devising plausible subpaths rather than assuming they'll be handed to it on a platter. (I don't remember all the conditions checked in add_path, but I wonder if we are missing some potential memoize applications because suitable paths fail to survive the scan rel's add_path tournament.) In the meantime, I think this test case is mighty artificial, and it wouldn't bother me any to just take it out again for the time being. regards, tom lane
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Richard Guo
Date:
On Thu, Aug 29, 2024 at 4:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > In the meantime, I think this test case is mighty artificial, > and it wouldn't bother me any to just take it out again for the > time being. Yeah, I think we can remove the 't1.two+t2.two' test case if we go with your proposed patch to address this performance regression. Thanks Richard
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes: > On Thu, Aug 29, 2024 at 4:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In the meantime, I think this test case is mighty artificial, >> and it wouldn't bother me any to just take it out again for the >> time being. > Yeah, I think we can remove the 't1.two+t2.two' test case if we go > with your proposed patch to address this performance regression. Here's a polished-up patchset for that. I made the memoize test removal a separate patch because (a) it only applies to master and (b) it seems worth calling out as something we might be able to revert later. I found one bug in the draft patch: add_nulling_relids only processes Vars of level zero, so we have to apply it before not after adjusting the Vars' levelsup. An alternative could be to add a levelsup parameter to add_nulling_relids, but that seemed like unnecessary complication. regards, tom lane From 8fafdc4852b8f2164286e6863219eb6b4d267639 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Thu, 29 Aug 2024 16:25:23 -0400 Subject: [PATCH v1 1/2] Remove one memoize test case added by commit 069d0ff02. This test case turns out to depend on the assumption that a non-Var subquery output that's underneath an outer join will always get wrapped in a PlaceHolderVar. But that behavior causes performance regressions in some cases compared to what happened before v16. The next commit will avoid inserting a PHV in the same cases where pre-v16 did, and that causes get_memoized_path to not detect that a memoize plan could be used. Commit this separately, in hopes that we can restore the test after making get_memoized_path smarter. (It's failing to find memoize plans in adjacent cases where no PHV was ever inserted, so there is definitely room for improvement there.) Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com --- src/test/regress/expected/memoize.out | 30 --------------------------- src/test/regress/sql/memoize.sql | 11 ---------- 2 files changed, 41 deletions(-) diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out index df2ca5ba4e..9ee09fe2f5 100644 --- a/src/test/regress/expected/memoize.out +++ b/src/test/regress/expected/memoize.out @@ -160,36 +160,6 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000; 1000 | 9.5000000000000000 (1 row) --- Try with LATERAL references within PlaceHolderVars -SELECT explain_memoize(' -SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN -LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE -WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false); - explain_memoize --------------------------------------------------------------------------------------- - Aggregate (actual rows=1 loops=N) - -> Nested Loop (actual rows=1000 loops=N) - -> Seq Scan on tenk1 t1 (actual rows=1000 loops=N) - Filter: (unique1 < 1000) - Rows Removed by Filter: 9000 - -> Memoize (actual rows=1 loops=N) - Cache Key: t1.two - Cache Mode: binary - Hits: 998 Misses: 2 Evictions: Zero Overflows: 0 Memory Usage: NkB - -> Seq Scan on tenk1 t2 (actual rows=1 loops=N) - Filter: ((t1.two + two) = unique1) - Rows Removed by Filter: 9999 -(12 rows) - --- And check we get the expected results. -SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN -LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE -WHERE s.c1 = s.c2 AND t1.unique1 < 1000; - count | avg --------+-------------------- - 1000 | 9.0000000000000000 -(1 row) - -- Ensure we do not omit the cache keys from PlaceHolderVars SELECT explain_memoize(' SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN diff --git a/src/test/regress/sql/memoize.sql b/src/test/regress/sql/memoize.sql index 059bec5f4f..2eaeb1477a 100644 --- a/src/test/regress/sql/memoize.sql +++ b/src/test/regress/sql/memoize.sql @@ -85,17 +85,6 @@ SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN LATERAL (SELECT t1.two+1 AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE WHERE s.c1 = s.c2 AND t1.unique1 < 1000; --- Try with LATERAL references within PlaceHolderVars -SELECT explain_memoize(' -SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN -LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE -WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false); - --- And check we get the expected results. -SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN -LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE -WHERE s.c1 = s.c2 AND t1.unique1 < 1000; - -- Ensure we do not omit the cache keys from PlaceHolderVars SELECT explain_memoize(' SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN -- 2.43.5 From 7b48394838797489e7ab869f97ca06449fdbcee3 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Thu, 29 Aug 2024 16:43:35 -0400 Subject: [PATCH v1 2/2] Avoid inserting PlaceHolderVars in cases where pre-v16 PG did not. Commit 2489d76c4 removed some logic from pullup_replace_vars() that avoided wrapping a PlaceHolderVar around a pulled-up subquery output expression if the expression could be proven to go to NULL anyway (because it contained Vars or PHVs of the pulled-up relation and did not contain non-strict constructs). But removing that logic turns out to cause performance regressions in some cases, because the extra PHV prevents subexpression folding, and will do so even if outer-join reduction later turns it into a no-op with no phnullingrels bits. The reason for always adding a PHV was to ensure we had someplace to put the varnullingrels marker bits of the Var being replaced. However, it turns out we can optimize in exactly the same cases that the previous code did, because we can attach the needed varnullingrels bits to the contained Var(s)/PHV(s). This is not a complete solution --- it would be even better if we could remove PHVs after reducing them to no-ops. It doesn't look practical to back-patch such an improvement, but this change seems safe and at least gets rid of the performance-regression cases. Per complaint from Nikhil Raj. Back-patch to v16 where the problem appeared. Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com --- src/backend/optimizer/prep/prepjointree.c | 66 ++++++++++++++++++----- src/backend/rewrite/rewriteManip.c | 9 ++-- src/test/regress/expected/subselect.out | 29 ++++++++++ src/test/regress/sql/subselect.sql | 18 +++++++ 4 files changed, 107 insertions(+), 15 deletions(-) diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 969e257f70..34fbf8ee23 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -2490,14 +2490,48 @@ pullup_replace_vars_callback(Var *var, else wrap = false; } + else if (rcon->wrap_non_vars) + { + /* Caller told us to wrap all non-Vars in a PlaceHolderVar */ + wrap = true; + } else { /* - * Must wrap, either because we need a place to insert - * varnullingrels or because caller told us to wrap - * everything. + * If the node contains Var(s) or PlaceHolderVar(s) of the + * subquery being pulled up, and does not contain any + * non-strict constructs, then instead of adding a PHV on top + * we can add the required nullingrels to those Vars/PHVs. + * (This is fundamentally a generalization of the above cases + * for bare Vars and PHVs.) + * + * This test is somewhat expensive, but it avoids pessimizing + * the plan in cases where the nullingrels get removed again + * later by outer join reduction. + * + * This analysis could be tighter: in particular, a non-strict + * construct hidden within a lower-level PlaceHolderVar is not + * reason to add another PHV. But for now it doesn't seem + * worth the code to be more exact. + * + * For a LATERAL subquery, we have to check the actual var + * membership of the node, but if it's non-lateral then any + * level-zero var must belong to the subquery. */ - wrap = true; + if ((rcon->target_rte->lateral ? + bms_overlap(pull_varnos(rcon->root, newnode), + rcon->relids) : + contain_vars_of_level(newnode, 0)) && + !contain_nonstrict_functions(newnode)) + { + /* No wrap needed */ + wrap = false; + } + else + { + /* Else wrap it in a PlaceHolderVar */ + wrap = true; + } } if (wrap) @@ -2518,18 +2552,14 @@ pullup_replace_vars_callback(Var *var, } } - /* Must adjust varlevelsup if replaced Var is within a subquery */ - if (var->varlevelsup > 0) - IncrementVarSublevelsUp(newnode, var->varlevelsup, 0); - - /* Propagate any varnullingrels into the replacement Var or PHV */ + /* Propagate any varnullingrels into the replacement expression */ if (var->varnullingrels != NULL) { if (IsA(newnode, Var)) { Var *newvar = (Var *) newnode; - Assert(newvar->varlevelsup == var->varlevelsup); + Assert(newvar->varlevelsup == 0); newvar->varnullingrels = bms_add_members(newvar->varnullingrels, var->varnullingrels); } @@ -2537,14 +2567,26 @@ pullup_replace_vars_callback(Var *var, { PlaceHolderVar *newphv = (PlaceHolderVar *) newnode; - Assert(newphv->phlevelsup == var->varlevelsup); + Assert(newphv->phlevelsup == 0); newphv->phnullingrels = bms_add_members(newphv->phnullingrels, var->varnullingrels); } else - elog(ERROR, "failed to wrap a non-Var"); + { + /* There should be lower-level Vars/PHVs we can modify */ + newnode = add_nulling_relids(newnode, + NULL, /* modify all Vars/PHVs */ + var->varnullingrels); + /* Assert we did put the varnullingrels into the expression */ + Assert(bms_is_subset(var->varnullingrels, + pull_varnos(rcon->root, newnode))); + } } + /* Must adjust varlevelsup if replaced Var is within a subquery */ + if (var->varlevelsup > 0) + IncrementVarSublevelsUp(newnode, var->varlevelsup, 0); + return newnode; } diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c index 191f2dc0b1..b20625fbd2 100644 --- a/src/backend/rewrite/rewriteManip.c +++ b/src/backend/rewrite/rewriteManip.c @@ -1141,7 +1141,8 @@ AddInvertedQual(Query *parsetree, Node *qual) /* * add_nulling_relids() finds Vars and PlaceHolderVars that belong to any * of the target_relids, and adds added_relids to their varnullingrels - * and phnullingrels fields. + * and phnullingrels fields. If target_relids is NULL, all level-zero + * Vars and PHVs are modified. */ Node * add_nulling_relids(Node *node, @@ -1170,7 +1171,8 @@ add_nulling_relids_mutator(Node *node, Var *var = (Var *) node; if (var->varlevelsup == context->sublevels_up && - bms_is_member(var->varno, context->target_relids)) + (context->target_relids == NULL || + bms_is_member(var->varno, context->target_relids))) { Relids newnullingrels = bms_union(var->varnullingrels, context->added_relids); @@ -1188,7 +1190,8 @@ add_nulling_relids_mutator(Node *node, PlaceHolderVar *phv = (PlaceHolderVar *) node; if (phv->phlevelsup == context->sublevels_up && - bms_overlap(phv->phrels, context->target_relids)) + (context->target_relids == NULL || + bms_overlap(phv->phrels, context->target_relids))) { Relids newnullingrels = bms_union(phv->phnullingrels, context->added_relids); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 9eecdc1e92..2d35de3fad 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -1721,6 +1721,35 @@ fetch backward all in c1; (2 rows) commit; +-- +-- Verify that we correctly flatten cases involving a subquery output +-- expression that doesn't need to be wrapped in a PlaceHolderVar +-- +explain (costs off) +select tname, attname from ( +select relname::information_schema.sql_identifier as tname, * from + (select * from pg_class c) ss1) ss2 + right join pg_attribute a on a.attrelid = ss2.oid +where tname = 'tenk1' and attnum = 1; + QUERY PLAN +-------------------------------------------------------------------------- + Nested Loop + -> Index Scan using pg_class_relname_nsp_index on pg_class c + Index Cond: (relname = 'tenk1'::name) + -> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a + Index Cond: ((attrelid = c.oid) AND (attnum = 1)) +(5 rows) + +select tname, attname from ( +select relname::information_schema.sql_identifier as tname, * from + (select * from pg_class c) ss1) ss2 + right join pg_attribute a on a.attrelid = ss2.oid +where tname = 'tenk1' and attnum = 1; + tname | attname +-------+--------- + tenk1 | unique1 +(1 row) + -- -- Tests for CTE inlining behavior -- diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index 75a9b718b2..af6e157aca 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -890,6 +890,24 @@ fetch backward all in c1; commit; +-- +-- Verify that we correctly flatten cases involving a subquery output +-- expression that doesn't need to be wrapped in a PlaceHolderVar +-- + +explain (costs off) +select tname, attname from ( +select relname::information_schema.sql_identifier as tname, * from + (select * from pg_class c) ss1) ss2 + right join pg_attribute a on a.attrelid = ss2.oid +where tname = 'tenk1' and attnum = 1; + +select tname, attname from ( +select relname::information_schema.sql_identifier as tname, * from + (select * from pg_class c) ss1) ss2 + right join pg_attribute a on a.attrelid = ss2.oid +where tname = 'tenk1' and attnum = 1; + -- -- Tests for CTE inlining behavior -- -- 2.43.5
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
From
nikhil raj
Date:
Hi All,
I hope you're doing well.
I'm writing to kindly requesting if there is a bug tracker ID or any reference number associated with this issue, I would appreciate it if you could share it with me.
Thank you for your time and assistance. Please let me know if there's any additional information you need from me.
Best regards,
Nikhil
On Fri, 30 Aug, 2024, 2:23 am Tom Lane, <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> On Thu, Aug 29, 2024 at 4:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In the meantime, I think this test case is mighty artificial,
>> and it wouldn't bother me any to just take it out again for the
>> time being.
> Yeah, I think we can remove the 't1.two+t2.two' test case if we go
> with your proposed patch to address this performance regression.
Here's a polished-up patchset for that. I made the memoize test
removal a separate patch because (a) it only applies to master
and (b) it seems worth calling out as something we might be able
to revert later.
I found one bug in the draft patch: add_nulling_relids only processes
Vars of level zero, so we have to apply it before not after adjusting
the Vars' levelsup. An alternative could be to add a levelsup
parameter to add_nulling_relids, but that seemed like unnecessary
complication.
regards, tom lane