Thread: An inefficient query caused by unnecessary PlaceHolderVar

An inefficient query caused by unnecessary PlaceHolderVar

From
Richard Guo
Date:
I happened to notice that the query below can be inefficient.

# explain (costs off)
select * from
  int8_tbl a left join
  (int8_tbl b inner join
   lateral (select *, b.q2 as x from int8_tbl c) ss on b.q2 = ss.q1)
  on a.q1 = b.q1;
             QUERY PLAN
------------------------------------
 Hash Right Join
   Hash Cond: (b.q1 = a.q1)
   ->  Nested Loop
         ->  Seq Scan on int8_tbl b
         ->  Seq Scan on int8_tbl c
               Filter: (b.q2 = q1)
   ->  Hash
         ->  Seq Scan on int8_tbl a
(8 rows)

For B/C join, currently we only have one option, i.e., nestloop with
parameterized inner path.  This could be extremely inefficient in some
cases, such as when C does not have any indexes, or when B is very
large.  I believe the B/C join can actually be performed with hashjoin
or mergejoin here, as it is an inner join.

This happens because when we pull up the lateral subquery, we notice
that Var 'x' is a lateral reference to 'b.q2' which is outside the
subquery.  So we wrap it in a PlaceHolderVar.  This is necessary for
correctness if it is a lateral reference from nullable side to
non-nullable item.  But here in this case, the referenced item is also
nullable, so actually we can omit the PlaceHolderVar with no harm.  The
comment in pullup_replace_vars_callback() also explains this point.

   * (Even then, we could omit the PlaceHolderVar if
   * the referenced rel is under the same lowest outer join, but
   * it doesn't seem worth the trouble to check that.)

All such PHVs would imply lateral dependencies which would make us have
no choice but nestloop.  I think we should avoid such PHVs as much as
possible.  So IMO it may 'worth the trouble to check that'.

Attached is a patch to check that for simple Vars.  Maybe we can extend
it to avoid PHVs for more complex expressions, but that requires some
codes because for now we always wrap non-var expressions to PHVs in
order to have a place to insert nulling bitmap.  As a first step, let's
do it for simple Vars only.

Any thoughts?

Thanks
Richard
Attachment

Re: An inefficient query caused by unnecessary PlaceHolderVar

From
James Coleman
Date:
On Fri, May 12, 2023 at 2:35 AM Richard Guo <guofenglinux@gmail.com> wrote:
>
> I happened to notice that the query below can be inefficient.
>
> # explain (costs off)
> select * from
>   int8_tbl a left join
>   (int8_tbl b inner join
>    lateral (select *, b.q2 as x from int8_tbl c) ss on b.q2 = ss.q1)
>   on a.q1 = b.q1;
>              QUERY PLAN
> ------------------------------------
>  Hash Right Join
>    Hash Cond: (b.q1 = a.q1)
>    ->  Nested Loop
>          ->  Seq Scan on int8_tbl b
>          ->  Seq Scan on int8_tbl c
>                Filter: (b.q2 = q1)
>    ->  Hash
>          ->  Seq Scan on int8_tbl a
> (8 rows)
>
> For B/C join, currently we only have one option, i.e., nestloop with
> parameterized inner path.  This could be extremely inefficient in some
> cases, such as when C does not have any indexes, or when B is very
> large.  I believe the B/C join can actually be performed with hashjoin
> or mergejoin here, as it is an inner join.
>
> This happens because when we pull up the lateral subquery, we notice
> that Var 'x' is a lateral reference to 'b.q2' which is outside the
> subquery.  So we wrap it in a PlaceHolderVar.  This is necessary for
> correctness if it is a lateral reference from nullable side to
> non-nullable item.  But here in this case, the referenced item is also
> nullable, so actually we can omit the PlaceHolderVar with no harm.  The
> comment in pullup_replace_vars_callback() also explains this point.
>
>    * (Even then, we could omit the PlaceHolderVar if
>    * the referenced rel is under the same lowest outer join, but
>    * it doesn't seem worth the trouble to check that.)

It's nice that someone already thought about this and left us this comment :)

> All such PHVs would imply lateral dependencies which would make us have
> no choice but nestloop.  I think we should avoid such PHVs as much as
> possible.  So IMO it may 'worth the trouble to check that'.
>
> Attached is a patch to check that for simple Vars.  Maybe we can extend
> it to avoid PHVs for more complex expressions, but that requires some
> codes because for now we always wrap non-var expressions to PHVs in
> order to have a place to insert nulling bitmap.  As a first step, let's
> do it for simple Vars only.
>
> Any thoughts?

This looks good to me.

A few small tweaks suggested to comment wording:

+-- lateral reference for simple Var can escape PlaceHolderVar if the
+-- referenced rel is under the same lowest nulling outer join
+explain (verbose, costs off)

I think this is clearer: "lateral references to simple Vars do not
need a PlaceHolderVar when the referenced rel is part of the same
lowest nulling outer join"?

                  * lateral references to something outside the subquery being
-                 * pulled up.  (Even then, we could omit the PlaceHolderVar if
-                 * the referenced rel is under the same lowest outer join, but
-                 * it doesn't seem worth the trouble to check that.)
+                 * pulled up.  Even then, we could omit the PlaceHolderVar if
+                 * the referenced rel is under the same lowest nulling outer
+                 * join.

I think this is clearer: "references something outside the subquery
being pulled up and is not under the same lowest outer join."

One other thing: it would be helpful to have the test query output be
stable between HEAD and this patch; perhaps add:

order by 1, 2, 3, 4, 5, 6, 7

to ensure stability?

Thanks,
James Coleman



Re: An inefficient query caused by unnecessary PlaceHolderVar

From
Richard Guo
Date:

On Wed, May 31, 2023 at 1:27 AM James Coleman <jtc331@gmail.com> wrote:
This looks good to me.

Thanks for the review!
 
A few small tweaks suggested to comment wording:

+-- lateral reference for simple Var can escape PlaceHolderVar if the
+-- referenced rel is under the same lowest nulling outer join
+explain (verbose, costs off)

I think this is clearer: "lateral references to simple Vars do not
need a PlaceHolderVar when the referenced rel is part of the same
lowest nulling outer join"?

Thanks for the suggestion!  How about we go with "lateral references to
simple Vars do not need a PlaceHolderVar when the referenced rel is
under the same lowest nulling outer join"?  This seems a little more
consistent with the comment in prepjointree.c.
 
                  * lateral references to something outside the subquery being
-                 * pulled up.  (Even then, we could omit the PlaceHolderVar if
-                 * the referenced rel is under the same lowest outer join, but
-                 * it doesn't seem worth the trouble to check that.)
+                 * pulled up.  Even then, we could omit the PlaceHolderVar if
+                 * the referenced rel is under the same lowest nulling outer
+                 * join.

I think this is clearer: "references something outside the subquery
being pulled up and is not under the same lowest outer join."

Agreed.  Will use this one.
 
One other thing: it would be helpful to have the test query output be
stable between HEAD and this patch; perhaps add:

order by 1, 2, 3, 4, 5, 6, 7

to ensure stability?

Thanks for the suggestion!  I wondered about that too but I'm a bit
confused about whether we should add ORDER BY in test case.  I checked
'sql/join.sql' and found that some queries are using ORDER BY but some
are not.  Not sure what the criteria are.

Thanks
Richard

Re: An inefficient query caused by unnecessary PlaceHolderVar

From
James Coleman
Date:
On Wed, May 31, 2023 at 10:30 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> On Wed, May 31, 2023 at 1:27 AM James Coleman <jtc331@gmail.com> wrote:
>>
>> This looks good to me.
>
>
> Thanks for the review!

Sure thing!

>>
>> A few small tweaks suggested to comment wording:
>>
>> +-- lateral reference for simple Var can escape PlaceHolderVar if the
>> +-- referenced rel is under the same lowest nulling outer join
>> +explain (verbose, costs off)
>>
>> I think this is clearer: "lateral references to simple Vars do not
>> need a PlaceHolderVar when the referenced rel is part of the same
>> lowest nulling outer join"?
>
>
> Thanks for the suggestion!  How about we go with "lateral references to
> simple Vars do not need a PlaceHolderVar when the referenced rel is
> under the same lowest nulling outer join"?  This seems a little more
> consistent with the comment in prepjointree.c.

That sounds good to me.

>>
>>                   * lateral references to something outside the subquery being
>> -                 * pulled up.  (Even then, we could omit the PlaceHolderVar if
>> -                 * the referenced rel is under the same lowest outer join, but
>> -                 * it doesn't seem worth the trouble to check that.)
>> +                 * pulled up.  Even then, we could omit the PlaceHolderVar if
>> +                 * the referenced rel is under the same lowest nulling outer
>> +                 * join.
>>
>> I think this is clearer: "references something outside the subquery
>> being pulled up and is not under the same lowest outer join."
>
>
> Agreed.  Will use this one.
>
>>
>> One other thing: it would be helpful to have the test query output be
>> stable between HEAD and this patch; perhaps add:
>>
>> order by 1, 2, 3, 4, 5, 6, 7
>>
>> to ensure stability?
>
>
> Thanks for the suggestion!  I wondered about that too but I'm a bit
> confused about whether we should add ORDER BY in test case.  I checked
> 'sql/join.sql' and found that some queries are using ORDER BY but some
> are not.  Not sure what the criteria are.

I think it's just "is this helpful in this test". Obviously we don't
need it for correctness of this particular check, but as long as the
plan change still occurs as desired (i.e., the ORDER BY doesn't change
the plan from what you're testing) I think it's fine to consider it
author's choice.

James



Re: An inefficient query caused by unnecessary PlaceHolderVar

From
Richard Guo
Date:

On Fri, Jun 2, 2023 at 1:33 AM James Coleman <jtc331@gmail.com> wrote:
On Wed, May 31, 2023 at 10:30 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Thanks for the review!

Sure thing!

I've updated the patch according to the reviews as attached.  But I did
not add ORDER BY clause in the test, as we don't need it for correctness
for this test query and the surrounding queries in join.sql don't have
ORDER BY either.

Thanks
Richard
Attachment

Re: An inefficient query caused by unnecessary PlaceHolderVar

From
Richard Guo
Date:
Updated this patch over 29f114b6ff, which indicates that we should apply
the same rules for PHVs.

Thanks
Richard
Attachment

Re: An inefficient query caused by unnecessary PlaceHolderVar

From
Richard Guo
Date:
On Mon, Jan 15, 2024 at 1:50 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Updated this patch over 29f114b6ff, which indicates that we should apply
> the same rules for PHVs.

Here is a new rebase of this patch, with some tweaks to comments.  I've
also updated the commit message to better explain the context.

To recap, this patch tries to avoid wrapping Vars and PHVs from subquery
output that are lateral references to something outside the subquery.
Typically this kind of wrapping is necessary when the Var/PHV references
the non-nullable side of the outer join from the nullable side, because
we need to ensure that it is evaluated at the right place and hence is
forced to null when the outer join should do so.  But if the referenced
rel is under the same lowest nulling outer join, we can actually omit
the wrapping.  The PHVs generated from such kind of wrapping imply
lateral dependencies and force us to resort to nestloop joins, so we'd
better get rid of them.

This patch performs this check by remembering the relids of the nullable
side of the lowest outer join the subquery is within.  So I think it
improves the overall plan in the related cases with very little extra
cost.

Thanks
Richard

Attachment

Re: An inefficient query caused by unnecessary PlaceHolderVar

From
Dmitry Dolgov
Date:
> On Fri, Jun 21, 2024 at 10:35:30AM GMT, Richard Guo wrote:
> On Mon, Jan 15, 2024 at 1:50 PM Richard Guo <guofenglinux@gmail.com> wrote:
> > Updated this patch over 29f114b6ff, which indicates that we should apply
> > the same rules for PHVs.
>
> Here is a new rebase of this patch, with some tweaks to comments.  I've
> also updated the commit message to better explain the context.
>
> To recap, this patch tries to avoid wrapping Vars and PHVs from subquery
> output that are lateral references to something outside the subquery.
> Typically this kind of wrapping is necessary when the Var/PHV references
> the non-nullable side of the outer join from the nullable side, because
> we need to ensure that it is evaluated at the right place and hence is
> forced to null when the outer join should do so.  But if the referenced
> rel is under the same lowest nulling outer join, we can actually omit
> the wrapping.  The PHVs generated from such kind of wrapping imply
> lateral dependencies and force us to resort to nestloop joins, so we'd
> better get rid of them.
>
> This patch performs this check by remembering the relids of the nullable
> side of the lowest outer join the subquery is within.  So I think it
> improves the overall plan in the related cases with very little extra
> cost.

The patch looks good to me, the implementation is concise and clear. I can't
imagine any visible overhead due to storing lowest_nullable_relids in this
context. The only nitpick I have is about this commentary:

      /*
       * Simple Vars always escape being wrapped, unless they are
       * lateral references to something outside the subquery being
    -  * pulled up.  (Even then, we could omit the PlaceHolderVar if
    -  * the referenced rel is under the same lowest outer join, but
    -  * it doesn't seem worth the trouble to check that.)
    +  * pulled up and the referenced rel is not under the same
    +  * lowest outer join.
       */

It mentions "lowest outer join", as in the original version of the text. Would
it be more precise to mention nullable side of the outer join as well?

I'm going to switch it to RFC.



Re: An inefficient query caused by unnecessary PlaceHolderVar

From
wenhui qiu
Date:
Hi Richard
> BTW, since commit cb8e50a4a, we've chosen not to wrap a non-var
>  expression if it contains Vars/PHVs of the pulled-up subquery and does
> not contain non-strict constructs.  I wonder if we can apply the same
> optimization from this patch to non-var expressions: for a LATERAL
> subquery, if a non-var expression contains Vars/PHVs of the
> lowest_nullable_relids and does not contain non-strict constructs, it
>could also escape being wrapped.  Any thoughts?
agree , The path tested  and looks good to me,In addition, can the test case have a multi-layer nested subquery(maybe) ?



Thanks

On Wed, Nov 27, 2024 at 4:46 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Nov 22, 2024 at 5:08 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> The patch looks good to me, the implementation is concise and clear. I can't
> imagine any visible overhead due to storing lowest_nullable_relids in this
> context. The only nitpick I have is about this commentary:
>
>       /*
>        * Simple Vars always escape being wrapped, unless they are
>        * lateral references to something outside the subquery being
>     -  * pulled up.  (Even then, we could omit the PlaceHolderVar if
>     -  * the referenced rel is under the same lowest outer join, but
>     -  * it doesn't seem worth the trouble to check that.)
>     +  * pulled up and the referenced rel is not under the same
>     +  * lowest outer join.
>        */
>
> It mentions "lowest outer join", as in the original version of the text. Would
> it be more precise to mention nullable side of the outer join as well?

Thank you for your review.

I ended up using 'under the same lowest nulling outer join' to
keep consistent with the wording used elsewhere.  Please see the
updated patch attached.

BTW, since commit cb8e50a4a, we've chosen not to wrap a non-var
expression if it contains Vars/PHVs of the pulled-up subquery and does
not contain non-strict constructs.  I wonder if we can apply the same
optimization from this patch to non-var expressions: for a LATERAL
subquery, if a non-var expression contains Vars/PHVs of the
lowest_nullable_relids and does not contain non-strict constructs, it
could also escape being wrapped.  Any thoughts?

Thanks
Richard

Re: An inefficient query caused by unnecessary PlaceHolderVar

From
Andrei Lepikhov
Date:
On 12/2/24 10:46, Richard Guo wrote:
> On Wed, Nov 27, 2024 at 5:45 PM Richard Guo <guofenglinux@gmail.com> wrote:
>> I ended up using 'under the same lowest nulling outer join' to
>> keep consistent with the wording used elsewhere.  Please see the
>> updated patch attached.
> 
> Commit e032e4c7d computes the nullingrel data for each leaf RTE, and
> we can leverage that to determine if the referenced rel is under the
> same lowest nulling outer join: we just need to check if the
> nullingrels of the subquery RTE are a subset of those of the lateral
> referenced rel.  This eliminates the need to introduce
> lowest_nullable_side.  Please see attached.
Thanks for drawing attention to e032e4c7d. It is a really helpful 
structure. I remember last year, we discussed [1] one sophisticated 
subquery pull-up technique, and we needed exactly the same data - it was 
too invasive to commit, and we committed only a small part of it. The 
nullingrel_info structure may give this feature one more chance.

A couple of words about your patch. These few lines of code caused a lot 
of discoveries, but in my opinion, they look fine. But I didn't find 
negative tests, where we need to wrap a Var with PHV like the following:

explain (verbose, costs off)
select t1.q1, x from
   int8_tbl t1 left join
   (int8_tbl t2 left join
    lateral (select t2.q2 as x, * from int8_tbl t3) ss on t2.q2 = ss.q1)
   on t1.q1 = t2.q1
order by 1, 2;

If regression tests doesn't contain such check it would be nice to add.

[1] 
https://www.postgresql.org/message-id/35c8a3e8-d080-dfa8-2be3-cf5fe702010a%40postgrespro.ru

-- 
regards, Andrei Lepikhov



Re: An inefficient query caused by unnecessary PlaceHolderVar

From
Richard Guo
Date:
On Tue, Dec 3, 2024 at 5:33 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> A couple of words about your patch. These few lines of code caused a lot
> of discoveries, but in my opinion, they look fine. But I didn't find
> negative tests, where we need to wrap a Var with PHV like the following:

Pushed after adding two negative tests.  Thank you for your review.

Thanks
Richard