Thread: Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Tom Lane

Date:

27 August 2024, 19:15:41

[ switching to -hackers list ]

David Rowley <dgrowleyml@gmail.com> writes:
> In case it saves you a bit of time, I stripped as much of the
> unrelated stuff out as I could and got:

> create table t (a name, b int);
> explain select * from (select a::varchar,b from (select distinct a,b
> from t) st) t right join t t2 on t.b=t2.b where t.a='test';

> getting rid of the cast or swapping to INNER JOIN rather than RIGHT
> JOIN means that qual_is_pushdown_safe() gets a Var rather than a
> PlaceHolderVar.

Thanks.  So it seems that what's happening is that we stick a
PlaceHolderVar on the intermediate subquery's output ("a::varchar"),
and then later when we realize that the RIGHT JOIN can be reduced to
an inner join we run around and remove the right join from the
PlaceHolderVar's nullingrels, leaving a useless PHV with no
nullingrels.  remove_nulling_relids explains

             * Note: it might seem desirable to remove the PHV altogether if
             * phnullingrels goes to empty.  Currently we dare not do that
             * because we use PHVs in some cases to enforce separate identity
             * of subexpressions; see wrap_non_vars usages in prepjointree.c.

However, then when we consider whether the upper WHERE condition
can be pushed down into the unflattened lower subquery,
qual_is_pushdown_safe punts:

         * XXX Punt if we find any PlaceHolderVars in the restriction clause.
         * It's not clear whether a PHV could safely be pushed down, and even
         * less clear whether such a situation could arise in any cases of
         * practical interest anyway.  So for the moment, just refuse to push
         * down.

We didn't see this particular behavior before 2489d76c49 because
pullup_replace_vars avoided inserting a PHV:

                 * If it contains a Var of the subquery being pulled up, and
                 * does not contain any non-strict constructs, then it's
                 * certainly nullable so we don't need to insert a
                 * PlaceHolderVar.

I dropped that case in 2489d76c49 because now we need to attach
nullingrels to the expression.  You could imagine attaching the
nullingrels to the contained Var(s) instead of putting a PHV on top,
but that seems like a mess and I'm not quite sure it's semantically
the same.  In any case it wouldn't fix adjacent cases where there is
a non-strict construct in the subquery output expression.

So it seems like we need to fix one or the other of these
implementation shortcuts to restore the previous behavior.
I'm wondering if it'd be okay for qual_is_pushdown_safe to accept
PHVs that have no nullingrels.  I'm not really thrilled about trying
to back-patch any such fix though --- the odds of introducing new bugs
seem nontrivial, and the problem case seems rather narrow.  If we
are willing to accept a HEAD-only fix, it'd likely be better to
attack the other end and make it possible to remove no-op PHVs.
I think that'd require marking PHVs that need to be kept because
they are serving to isolate subexpressions.

            regards, tom lane

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Tom Lane

Date:

28 August 2024, 00:52:12

I wrote:
> We didn't see this particular behavior before 2489d76c49 because
> pullup_replace_vars avoided inserting a PHV:
>                  * If it contains a Var of the subquery being pulled up, and
>                  * does not contain any non-strict constructs, then it's
>                  * certainly nullable so we don't need to insert a
>                  * PlaceHolderVar.
> I dropped that case in 2489d76c49 because now we need to attach
> nullingrels to the expression.  You could imagine attaching the
> nullingrels to the contained Var(s) instead of putting a PHV on top,
> but that seems like a mess and I'm not quite sure it's semantically
> the same.  In any case it wouldn't fix adjacent cases where there is
> a non-strict construct in the subquery output expression.

I realized that actually we do have the mechanism for making that
work: we could apply add_nulling_relids to the expression, if it
meets those same conditions.  This is a kluge really, but it would
restore the status quo ante in a fairly localized fashion that
seems like it might be safe enough to back-patch into v16.

Here's a WIP patch that does it like that.  One problem with it
is that it requires rcon->relids to be calculated in cases where
we didn't need that before, which is probably not *that* expensive
but it's annoying.  If we go forward with this, I'm thinking about
changing add_nulling_relids' API contract to say "if target_relid
is NULL then all level-zero Vars/PHVs are modified", so that we
don't need that relid set in non-LATERAL cases.

The other problem with this is that it breaks one test case in
memoize.sql: a query that formerly generated a memoize plan
now does not use memoize.  I am not sure why not --- does that
mean anything to you?

            regards, tom lane

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 969e257f70..3a12a52440 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -48,7 +48,7 @@ typedef struct pullup_replace_vars_context
     List       *targetlist;        /* tlist of subquery being pulled up */
     RangeTblEntry *target_rte;    /* RTE of subquery */
     Relids        relids;            /* relids within subquery, as numbered after
-                                 * pullup (set only if target_rte->lateral) */
+                                 * pullup */
     bool       *outer_hasSubLinks;    /* -> outer query's hasSubLinks */
     int            varno;            /* varno of subquery */
     bool        wrap_non_vars;    /* do we need all non-Var outputs to be PHVs? */
@@ -1163,11 +1163,8 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     rvcontext.root = root;
     rvcontext.targetlist = subquery->targetList;
     rvcontext.target_rte = rte;
-    if (rte->lateral)
-        rvcontext.relids = get_relids_in_jointree((Node *) subquery->jointree,
-                                                  true, true);
-    else                        /* won't need relids */
-        rvcontext.relids = NULL;
+    rvcontext.relids = get_relids_in_jointree((Node *) subquery->jointree,
+                                              true, true);
     rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
     rvcontext.varno = varno;
     /* this flag will be set below, if needed */
@@ -1713,7 +1710,7 @@ pull_up_simple_values(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte)
     rvcontext.root = root;
     rvcontext.targetlist = tlist;
     rvcontext.target_rte = rte;
-    rvcontext.relids = NULL;
+    rvcontext.relids = NULL;    /* XXX */
     rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
     rvcontext.varno = varno;
     rvcontext.wrap_non_vars = false;
@@ -1877,7 +1874,7 @@ pull_up_constant_function(PlannerInfo *root, Node *jtnode,
      * lateral references, even if it's marked as LATERAL.  This means we
      * don't need to fill relids.
      */
-    rvcontext.relids = NULL;
+    rvcontext.relids = NULL;    /* XXX */

     rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
     rvcontext.varno = ((RangeTblRef *) jtnode)->rtindex;
@@ -2490,14 +2487,48 @@ pullup_replace_vars_callback(Var *var,
                 else
                     wrap = false;
             }
+            else if (rcon->wrap_non_vars)
+            {
+                /* Caller told us to wrap all non-Vars in a PlaceHolderVar */
+                wrap = true;
+            }
             else
             {
                 /*
-                 * Must wrap, either because we need a place to insert
-                 * varnullingrels or because caller told us to wrap
-                 * everything.
+                 * If the node contains Var(s) or PlaceHolderVar(s) of the
+                 * subquery being pulled up, and does not contain any
+                 * non-strict constructs, then instead of adding a PHV on top
+                 * we can add the required nullingrels to those Vars/PHVs.
+                 * (This is fundamentally a generalization of the above cases
+                 * for bare Vars and PHVs.)
+                 *
+                 * This test is somewhat expensive, but it avoids pessimizing
+                 * the plan in cases where the nullingrels get removed again
+                 * later by outer join reduction.
+                 *
+                 * This analysis could be tighter: in particular, a non-strict
+                 * construct hidden within a lower-level PlaceHolderVar is not
+                 * reason to add another PHV.  But for now it doesn't seem
+                 * worth the code to be more exact.
+                 *
+                 * For a LATERAL subquery, we have to check the actual var
+                 * membership of the node, but if it's non-lateral then any
+                 * level-zero var must belong to the subquery.
                  */
-                wrap = true;
+                if ((rcon->target_rte->lateral ?
+                     bms_overlap(pull_varnos(rcon->root, newnode),
+                                 rcon->relids) :
+                     contain_vars_of_level(newnode, 0)) &&
+                    !contain_nonstrict_functions(newnode))
+                {
+                    /* No wrap needed */
+                    wrap = false;
+                }
+                else
+                {
+                    /* Else wrap it in a PlaceHolderVar */
+                    wrap = true;
+                }
             }

             if (wrap)
@@ -2522,7 +2553,7 @@ pullup_replace_vars_callback(Var *var,
     if (var->varlevelsup > 0)
         IncrementVarSublevelsUp(newnode, var->varlevelsup, 0);

-    /* Propagate any varnullingrels into the replacement Var or PHV */
+    /* Propagate any varnullingrels into the replacement expression */
     if (var->varnullingrels != NULL)
     {
         if (IsA(newnode, Var))
@@ -2542,7 +2573,15 @@ pullup_replace_vars_callback(Var *var,
                                                     var->varnullingrels);
         }
         else
-            elog(ERROR, "failed to wrap a non-Var");
+        {
+            /* There should be lower-level Vars/PHVs we can modify */
+            newnode = add_nulling_relids(newnode,
+                                         rcon->relids,
+                                         var->varnullingrels);
+            /* Assert we did put the varnullingrels into the expression */
+            Assert(bms_is_subset(var->varnullingrels,
+                                 pull_varnos(rcon->root, newnode)));
+        }
     }

     return newnode;

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

David Rowley

Date:

28 August 2024, 02:03:08

On Wed, 28 Aug 2024 at 09:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The other problem with this is that it breaks one test case in
> memoize.sql: a query that formerly generated a memoize plan
> now does not use memoize.  I am not sure why not --- does that
> mean anything to you?

The reason it works in master is that get_memoize_path() calls
extract_lateral_vars_from_PHVs() and finds PlaceHolderVars to use as
the Memoize keys. With your patch PlannerInfo.placeholder_list is
empty.

The commit that made this work is 069d0ff02. Richard might be able to
explain better. I don't quite understand why RelOptInfo.lateral_vars
don't contain these in the first place.

David

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Tom Lane

Date:

28 August 2024, 02:15:21

David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 28 Aug 2024 at 09:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The other problem with this is that it breaks one test case in
>> memoize.sql: a query that formerly generated a memoize plan
>> now does not use memoize.  I am not sure why not --- does that
>> mean anything to you?

> The reason it works in master is that get_memoize_path() calls
> extract_lateral_vars_from_PHVs() and finds PlaceHolderVars to use as
> the Memoize keys. With your patch PlannerInfo.placeholder_list is
> empty.

That seems like a pretty fishy way to do it.  Are you saying that
Memoize is never applicable if there aren't outer joins in the
query?  Without OJs there probably won't be any PHVs.

            regards, tom lane

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Tom Lane

Date:

28 August 2024, 02:37:56

I wrote:
> That seems like a pretty fishy way to do it.  Are you saying that
> Memoize is never applicable if there aren't outer joins in the
> query?  Without OJs there probably won't be any PHVs.

Oh, scratch that, I see you mean this is an additional way to do it
not the only way to do it.  But I'm confused why it works for
    t1.two+1 AS c1
but not
    t1.two+t2.two AS c1
Those ought to look pretty much the same for this purpose.

            regards, tom lane

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

David Rowley

Date:

28 August 2024, 02:57:53

On Wed, 28 Aug 2024 at 11:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Oh, scratch that, I see you mean this is an additional way to do it
> not the only way to do it.  But I'm confused why it works for
>         t1.two+1 AS c1
> but not
>         t1.two+t2.two AS c1
> Those ought to look pretty much the same for this purpose.

The bms_overlap(pull_varnos(rcon->root, newnode), rcon->relids) test
is false with t1.two+1.  Looks like there needs to be a Var from t2
for the bms_overlap to be true

David

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Richard Guo

Date:

28 August 2024, 06:30:45

On Wed, Aug 28, 2024 at 5:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I realized that actually we do have the mechanism for making that
> work: we could apply add_nulling_relids to the expression, if it
> meets those same conditions.

I think this should work, as long as we apply add_nulling_relids only
to Vars/PHVs that belong to the subquery in this case, because only
those Vars/PHVs would be nulled by the outer joins contained in the
nullingrels.

> If we go forward with this, I'm thinking about
> changing add_nulling_relids' API contract to say "if target_relid
> is NULL then all level-zero Vars/PHVs are modified", so that we
> don't need that relid set in non-LATERAL cases.

+1.  In LATERAL case, we can always find the subquery's relids in
rcon->relids.  In non-lateral case, any level-zero Vars/PHVs must
belong to the subquery - so if we change add_nulling_relids' API to be
so, we do not need to have rcon->relids set.

Thanks
Richard

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Richard Guo

Date:

28 August 2024, 06:52:46

On Wed, Aug 28, 2024 at 11:30 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Wed, Aug 28, 2024 at 5:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I realized that actually we do have the mechanism for making that
> > work: we could apply add_nulling_relids to the expression, if it
> > meets those same conditions.
>
> I think this should work, as long as we apply add_nulling_relids only
> to Vars/PHVs that belong to the subquery in this case, because only
> those Vars/PHVs would be nulled by the outer joins contained in the
> nullingrels.

To be more concrete, I know theoretically it is the whole expression
that is nullable by the outer joins, not its individual vars.  But in
this case if the contained vars (that belong to the subquery) become
NULL, the whole expression would be NULL too, because it does not
contain any non-strict constructs.  That's why I think this approach
should work.

Thanks
Richard

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Richard Guo

Date:

28 August 2024, 10:08:16

On Wed, Aug 28, 2024 at 12:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If we
> are willing to accept a HEAD-only fix, it'd likely be better to
> attack the other end and make it possible to remove no-op PHVs.
> I think that'd require marking PHVs that need to be kept because
> they are serving to isolate subexpressions.

I think it's always desirable to remove no-op PHVs, even if we end up
with a different approach to fix the issue discussed here.  Doing that
could potentially open up opportunities for optimization in other
cases.  For example:

explain (costs off)
select * from t t1 left join
    lateral (select t1.a as x, * from t t2) s on true
where t1.a = s.a;
         QUERY PLAN
----------------------------
 Nested Loop
   ->  Seq Scan on t t1
   ->  Seq Scan on t t2
         Filter: (t1.a = a)
(4 rows)

The target entry s.x is wrapped in a PHV that contains lateral
reference to t1, which forces us to resort to nestloop join.  However,
since the left join has been reduced to an inner join, and it is
removed from the PHV's nullingrels, leaving the nullingrels being
empty, we should be able to remove this PHV and use merge or hash
joins, depending on which is cheaper.

I think there may be more cases where no-op PHVs constrain
optimization opportunities.

In [1] when working on the fix-grouping-sets patch, I included a
mechanism in 0003 to remove no-op PHVs by including a flag in
PlaceHolderVar to indicate whether it is safe to remove the PHV when
its phnullingrels becomes empty.  In that patch this flag is only set
in cases where the PHV is used to carry the nullingrel bit that
represents the grouping step.  Maybe we can extend its use to remove
all no-op PHVs, except those that are serving to isolate
subexpressions.

Any thoughts on this?

[1] https://postgr.es/m/CAMbWs4_2t2pqqCFdS3NYJLwMMkAzYQKBOhKweFt-wE3YOi7rGg@mail.gmail.com

Thanks
Richard

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Richard Guo

Date:

28 August 2024, 10:31:27

On Wed, Aug 28, 2024 at 7:58 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 28 Aug 2024 at 11:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Oh, scratch that, I see you mean this is an additional way to do it
> > not the only way to do it.  But I'm confused why it works for
> >         t1.two+1 AS c1
> > but not
> >         t1.two+t2.two AS c1
> > Those ought to look pretty much the same for this purpose.
>
> The bms_overlap(pull_varnos(rcon->root, newnode), rcon->relids) test
> is false with t1.two+1.  Looks like there needs to be a Var from t2
> for the bms_overlap to be true

Exactly.  What Tom's patch does is that if the expression contains
Vars/PHVs that belong to the subquery, and does not contain any
non-strict constructs, then it can escape being wrapped.

In expression 't1.two+t2.two', 't2.two' is a Var that belongs to the
subquery, and '+' is strict, so it can escape being wrapped.

The expression 't1.two+1' does not meet these conditions, so it is
wrapped into a PHV, and the PHV contains lateral reference to t1,
which results in a nestloop join with a parameterized inner path.
That's why Memoize can work in this query.

Thanks
Richard

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Tom Lane

Date:

28 August 2024, 23:47:47

Richard Guo <guofenglinux@gmail.com> writes:
> Exactly.  What Tom's patch does is that if the expression contains
> Vars/PHVs that belong to the subquery, and does not contain any
> non-strict constructs, then it can escape being wrapped.

> In expression 't1.two+t2.two', 't2.two' is a Var that belongs to the
> subquery, and '+' is strict, so it can escape being wrapped.

> The expression 't1.two+1' does not meet these conditions, so it is
> wrapped into a PHV, and the PHV contains lateral reference to t1,
> which results in a nestloop join with a parameterized inner path.
> That's why Memoize can work in this query.

Yeah.  (I'd missed that t1.two is a lateral reference and t2.two is
not; sorry for the noise.)

What happens as of HEAD is that, because we wrap this subquery output
in a PHV marked as due to be evaluated at t2, the entire clause

    (t1.two+t2.two) = t2.unique1

becomes a base restriction clause for t2, so that when we generate
a path for t2 it will include this as a path qual (forcing the path
to be laterally dependent on t1).  Without the PHV, it's just an
ordinary join clause and it will not be evaluated at scan level
unless it can be turned into an indexqual --- which it can't.

The preceding regression-test case with "t1.two+1 = t2.unique1"
can be made into a parameterized indexscan on t2.unique1, so it is,
and then memoize can trigger off that.

I'm inclined to think that treating such a clause as a join clause
is strictly better than what happens now, so I'm not going to
apologize for the PHV not being there.  If you wanted to cast
blame, you could look to set_plain_rel_pathlist, where it says

     * We don't support pushing join clauses into the quals of a seqscan, but
     * it could still have required parameterization due to LATERAL refs in
     * its tlist.

(This comment could stand some work, as it fails to note that
labeling the path with required parameterization can result in
"join clauses" being evaluated there anyway.)

In the normal course of things I'd be dubious about the value of
pushing join clauses into a seqscan, but maybe the possibility of a
memoize'd join has moved the goalposts enough that we should
consider that.  Alternatively, maybe get_memoized_path should take
more responsibility for devising plausible subpaths rather than
assuming they'll be handed to it on a platter.  (I don't remember
all the conditions checked in add_path, but I wonder if we are
missing some potential memoize applications because suitable paths
fail to survive the scan rel's add_path tournament.)

In the meantime, I think this test case is mighty artificial,
and it wouldn't bother me any to just take it out again for the
time being.

            regards, tom lane

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Richard Guo

Date:

29 August 2024, 04:54:57

On Thu, Aug 29, 2024 at 4:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In the meantime, I think this test case is mighty artificial,
> and it wouldn't bother me any to just take it out again for the
> time being.

Yeah, I think we can remove the 't1.two+t2.two' test case if we go
with your proposed patch to address this performance regression.

Thanks
Richard

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

Tom Lane

Date:

29 August 2024, 23:53:03

Richard Guo <guofenglinux@gmail.com> writes:
> On Thu, Aug 29, 2024 at 4:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In the meantime, I think this test case is mighty artificial,
>> and it wouldn't bother me any to just take it out again for the
>> time being.

> Yeah, I think we can remove the 't1.two+t2.two' test case if we go
> with your proposed patch to address this performance regression.

Here's a polished-up patchset for that.  I made the memoize test
removal a separate patch because (a) it only applies to master
and (b) it seems worth calling out as something we might be able
to revert later.

I found one bug in the draft patch: add_nulling_relids only processes
Vars of level zero, so we have to apply it before not after adjusting
the Vars' levelsup.  An alternative could be to add a levelsup
parameter to add_nulling_relids, but that seemed like unnecessary
complication.

            regards, tom lane

From 8fafdc4852b8f2164286e6863219eb6b4d267639 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 29 Aug 2024 16:25:23 -0400
Subject: [PATCH v1 1/2] Remove one memoize test case added by commit
 069d0ff02.

This test case turns out to depend on the assumption that a non-Var
subquery output that's underneath an outer join will always get
wrapped in a PlaceHolderVar.  But that behavior causes performance
regressions in some cases compared to what happened before v16.
The next commit will avoid inserting a PHV in the same cases where
pre-v16 did, and that causes get_memoized_path to not detect that
a memoize plan could be used.

Commit this separately, in hopes that we can restore the test after
making get_memoized_path smarter.  (It's failing to find memoize
plans in adjacent cases where no PHV was ever inserted, so there
is definitely room for improvement there.)

Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com
---
 src/test/regress/expected/memoize.out | 30 ---------------------------
 src/test/regress/sql/memoize.sql      | 11 ----------
 2 files changed, 41 deletions(-)

diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index df2ca5ba4e..9ee09fe2f5 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -160,36 +160,6 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
   1000 | 9.5000000000000000
 (1 row)

--- Try with LATERAL references within PlaceHolderVars
-SELECT explain_memoize('
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
-                                   explain_memoize
---------------------------------------------------------------------------------------
- Aggregate (actual rows=1 loops=N)
-   ->  Nested Loop (actual rows=1000 loops=N)
-         ->  Seq Scan on tenk1 t1 (actual rows=1000 loops=N)
-               Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
-         ->  Memoize (actual rows=1 loops=N)
-               Cache Key: t1.two
-               Cache Mode: binary
-               Hits: 998  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
-               ->  Seq Scan on tenk1 t2 (actual rows=1 loops=N)
-                     Filter: ((t1.two + two) = unique1)
-                     Rows Removed by Filter: 9999
-(12 rows)
-
--- And check we get the expected results.
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
- count |        avg
--------+--------------------
-  1000 | 9.0000000000000000
-(1 row)
-
 -- Ensure we do not omit the cache keys from PlaceHolderVars
 SELECT explain_memoize('
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
diff --git a/src/test/regress/sql/memoize.sql b/src/test/regress/sql/memoize.sql
index 059bec5f4f..2eaeb1477a 100644
--- a/src/test/regress/sql/memoize.sql
+++ b/src/test/regress/sql/memoize.sql
@@ -85,17 +85,6 @@ SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
 LATERAL (SELECT t1.two+1 AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
 WHERE s.c1 = s.c2 AND t1.unique1 < 1000;

--- Try with LATERAL references within PlaceHolderVars
-SELECT explain_memoize('
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
-
--- And check we get the expected results.
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
-
 -- Ensure we do not omit the cache keys from PlaceHolderVars
 SELECT explain_memoize('
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
--
2.43.5

From 7b48394838797489e7ab869f97ca06449fdbcee3 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 29 Aug 2024 16:43:35 -0400
Subject: [PATCH v1 2/2] Avoid inserting PlaceHolderVars in cases where pre-v16
 PG did not.

Commit 2489d76c4 removed some logic from pullup_replace_vars()
that avoided wrapping a PlaceHolderVar around a pulled-up
subquery output expression if the expression could be proven
to go to NULL anyway (because it contained Vars or PHVs of the
pulled-up relation and did not contain non-strict constructs).
But removing that logic turns out to cause performance regressions
in some cases, because the extra PHV prevents subexpression folding,
and will do so even if outer-join reduction later turns it into a
no-op with no phnullingrels bits.

The reason for always adding a PHV was to ensure we had someplace
to put the varnullingrels marker bits of the Var being replaced.
However, it turns out we can optimize in exactly the same cases that
the previous code did, because we can attach the needed varnullingrels
bits to the contained Var(s)/PHV(s).

This is not a complete solution --- it would be even better if we
could remove PHVs after reducing them to no-ops.  It doesn't look
practical to back-patch such an improvement, but this change seems
safe and at least gets rid of the performance-regression cases.

Per complaint from Nikhil Raj.  Back-patch to v16 where the
problem appeared.

Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com
---
 src/backend/optimizer/prep/prepjointree.c | 66 ++++++++++++++++++-----
 src/backend/rewrite/rewriteManip.c        |  9 ++--
 src/test/regress/expected/subselect.out   | 29 ++++++++++
 src/test/regress/sql/subselect.sql        | 18 +++++++
 4 files changed, 107 insertions(+), 15 deletions(-)

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 969e257f70..34fbf8ee23 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -2490,14 +2490,48 @@ pullup_replace_vars_callback(Var *var,
                 else
                     wrap = false;
             }
+            else if (rcon->wrap_non_vars)
+            {
+                /* Caller told us to wrap all non-Vars in a PlaceHolderVar */
+                wrap = true;
+            }
             else
             {
                 /*
-                 * Must wrap, either because we need a place to insert
-                 * varnullingrels or because caller told us to wrap
-                 * everything.
+                 * If the node contains Var(s) or PlaceHolderVar(s) of the
+                 * subquery being pulled up, and does not contain any
+                 * non-strict constructs, then instead of adding a PHV on top
+                 * we can add the required nullingrels to those Vars/PHVs.
+                 * (This is fundamentally a generalization of the above cases
+                 * for bare Vars and PHVs.)
+                 *
+                 * This test is somewhat expensive, but it avoids pessimizing
+                 * the plan in cases where the nullingrels get removed again
+                 * later by outer join reduction.
+                 *
+                 * This analysis could be tighter: in particular, a non-strict
+                 * construct hidden within a lower-level PlaceHolderVar is not
+                 * reason to add another PHV.  But for now it doesn't seem
+                 * worth the code to be more exact.
+                 *
+                 * For a LATERAL subquery, we have to check the actual var
+                 * membership of the node, but if it's non-lateral then any
+                 * level-zero var must belong to the subquery.
                  */
-                wrap = true;
+                if ((rcon->target_rte->lateral ?
+                     bms_overlap(pull_varnos(rcon->root, newnode),
+                                 rcon->relids) :
+                     contain_vars_of_level(newnode, 0)) &&
+                    !contain_nonstrict_functions(newnode))
+                {
+                    /* No wrap needed */
+                    wrap = false;
+                }
+                else
+                {
+                    /* Else wrap it in a PlaceHolderVar */
+                    wrap = true;
+                }
             }

             if (wrap)
@@ -2518,18 +2552,14 @@ pullup_replace_vars_callback(Var *var,
         }
     }

-    /* Must adjust varlevelsup if replaced Var is within a subquery */
-    if (var->varlevelsup > 0)
-        IncrementVarSublevelsUp(newnode, var->varlevelsup, 0);
-
-    /* Propagate any varnullingrels into the replacement Var or PHV */
+    /* Propagate any varnullingrels into the replacement expression */
     if (var->varnullingrels != NULL)
     {
         if (IsA(newnode, Var))
         {
             Var           *newvar = (Var *) newnode;

-            Assert(newvar->varlevelsup == var->varlevelsup);
+            Assert(newvar->varlevelsup == 0);
             newvar->varnullingrels = bms_add_members(newvar->varnullingrels,
                                                      var->varnullingrels);
         }
@@ -2537,14 +2567,26 @@ pullup_replace_vars_callback(Var *var,
         {
             PlaceHolderVar *newphv = (PlaceHolderVar *) newnode;

-            Assert(newphv->phlevelsup == var->varlevelsup);
+            Assert(newphv->phlevelsup == 0);
             newphv->phnullingrels = bms_add_members(newphv->phnullingrels,
                                                     var->varnullingrels);
         }
         else
-            elog(ERROR, "failed to wrap a non-Var");
+        {
+            /* There should be lower-level Vars/PHVs we can modify */
+            newnode = add_nulling_relids(newnode,
+                                         NULL,    /* modify all Vars/PHVs */
+                                         var->varnullingrels);
+            /* Assert we did put the varnullingrels into the expression */
+            Assert(bms_is_subset(var->varnullingrels,
+                                 pull_varnos(rcon->root, newnode)));
+        }
     }

+    /* Must adjust varlevelsup if replaced Var is within a subquery */
+    if (var->varlevelsup > 0)
+        IncrementVarSublevelsUp(newnode, var->varlevelsup, 0);
+
     return newnode;
 }

diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 191f2dc0b1..b20625fbd2 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -1141,7 +1141,8 @@ AddInvertedQual(Query *parsetree, Node *qual)
 /*
  * add_nulling_relids() finds Vars and PlaceHolderVars that belong to any
  * of the target_relids, and adds added_relids to their varnullingrels
- * and phnullingrels fields.
+ * and phnullingrels fields.  If target_relids is NULL, all level-zero
+ * Vars and PHVs are modified.
  */
 Node *
 add_nulling_relids(Node *node,
@@ -1170,7 +1171,8 @@ add_nulling_relids_mutator(Node *node,
         Var           *var = (Var *) node;

         if (var->varlevelsup == context->sublevels_up &&
-            bms_is_member(var->varno, context->target_relids))
+            (context->target_relids == NULL ||
+             bms_is_member(var->varno, context->target_relids)))
         {
             Relids        newnullingrels = bms_union(var->varnullingrels,
                                                    context->added_relids);
@@ -1188,7 +1190,8 @@ add_nulling_relids_mutator(Node *node,
         PlaceHolderVar *phv = (PlaceHolderVar *) node;

         if (phv->phlevelsup == context->sublevels_up &&
-            bms_overlap(phv->phrels, context->target_relids))
+            (context->target_relids == NULL ||
+             bms_overlap(phv->phrels, context->target_relids)))
         {
             Relids        newnullingrels = bms_union(phv->phnullingrels,
                                                    context->added_relids);
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 9eecdc1e92..2d35de3fad 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1721,6 +1721,35 @@ fetch backward all in c1;
 (2 rows)

 commit;
+--
+-- Verify that we correctly flatten cases involving a subquery output
+-- expression that doesn't need to be wrapped in a PlaceHolderVar
+--
+explain (costs off)
+select tname, attname from (
+select relname::information_schema.sql_identifier as tname, * from
+  (select * from pg_class c) ss1) ss2
+  right join pg_attribute a on a.attrelid = ss2.oid
+where tname = 'tenk1' and attnum = 1;
+                                QUERY PLAN
+--------------------------------------------------------------------------
+ Nested Loop
+   ->  Index Scan using pg_class_relname_nsp_index on pg_class c
+         Index Cond: (relname = 'tenk1'::name)
+   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a
+         Index Cond: ((attrelid = c.oid) AND (attnum = 1))
+(5 rows)
+
+select tname, attname from (
+select relname::information_schema.sql_identifier as tname, * from
+  (select * from pg_class c) ss1) ss2
+  right join pg_attribute a on a.attrelid = ss2.oid
+where tname = 'tenk1' and attnum = 1;
+ tname | attname
+-------+---------
+ tenk1 | unique1
+(1 row)
+
 --
 -- Tests for CTE inlining behavior
 --
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 75a9b718b2..af6e157aca 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -890,6 +890,24 @@ fetch backward all in c1;

 commit;

+--
+-- Verify that we correctly flatten cases involving a subquery output
+-- expression that doesn't need to be wrapped in a PlaceHolderVar
+--
+
+explain (costs off)
+select tname, attname from (
+select relname::information_schema.sql_identifier as tname, * from
+  (select * from pg_class c) ss1) ss2
+  right join pg_attribute a on a.attrelid = ss2.oid
+where tname = 'tenk1' and attnum = 1;
+
+select tname, attname from (
+select relname::information_schema.sql_identifier as tname, * from
+  (select * from pg_class c) ss1) ss2
+  right join pg_attribute a on a.attrelid = ss2.oid
+where tname = 'tenk1' and attnum = 1;
+
 --
 -- Tests for CTE inlining behavior
 --
--
2.43.5

Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

From

nikhil raj

Date:

18 September 2024, 09:19:30

Hi All,

I hope you're doing well.

I'm writing to kindly requesting if there is a bug tracker ID or any reference number associated with this issue, I would appreciate it if you could share it with me.

Thank you for your time and assistance. Please let me know if there's any additional information you need from me.

Best regards,

Nikhil

On Fri, 30 Aug, 2024, 2:23 am Tom Lane, <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:
> On Thu, Aug 29, 2024 at 4:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In the meantime, I think this test case is mighty artificial,
>> and it wouldn't bother me any to just take it out again for the
>> time being.

> Yeah, I think we can remove the 't1.two+t2.two' test case if we go
> with your proposed patch to address this performance regression.

Here's a polished-up patchset for that. I made the memoize test
removal a separate patch because (a) it only applies to master
and (b) it seems worth calling out as something we might be able
to revert later.

I found one bug in the draft patch: add_nulling_relids only processes
Vars of level zero, so we have to apply it before not after adjusting
the Vars' levelsup. An alternative could be to add a levelsup
parameter to add_nulling_relids, but that seemed like unnecessary
complication.

regards, tom lane