Re: An inefficient query caused by unnecessary PlaceHolderVar - Mailing list pgsql-hackers

From James Coleman
Subject Re: An inefficient query caused by unnecessary PlaceHolderVar
Date
Msg-id CAAaqYe9S1KxiYpX+RmbMMotSSwSZcHpuAnv9Hq+DZ8rkDacNtw@mail.gmail.com
Whole thread Raw
In response to An inefficient query caused by unnecessary PlaceHolderVar  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: An inefficient query caused by unnecessary PlaceHolderVar
List pgsql-hackers
On Fri, May 12, 2023 at 2:35 AM Richard Guo <guofenglinux@gmail.com> wrote:
>
> I happened to notice that the query below can be inefficient.
>
> # explain (costs off)
> select * from
>   int8_tbl a left join
>   (int8_tbl b inner join
>    lateral (select *, b.q2 as x from int8_tbl c) ss on b.q2 = ss.q1)
>   on a.q1 = b.q1;
>              QUERY PLAN
> ------------------------------------
>  Hash Right Join
>    Hash Cond: (b.q1 = a.q1)
>    ->  Nested Loop
>          ->  Seq Scan on int8_tbl b
>          ->  Seq Scan on int8_tbl c
>                Filter: (b.q2 = q1)
>    ->  Hash
>          ->  Seq Scan on int8_tbl a
> (8 rows)
>
> For B/C join, currently we only have one option, i.e., nestloop with
> parameterized inner path.  This could be extremely inefficient in some
> cases, such as when C does not have any indexes, or when B is very
> large.  I believe the B/C join can actually be performed with hashjoin
> or mergejoin here, as it is an inner join.
>
> This happens because when we pull up the lateral subquery, we notice
> that Var 'x' is a lateral reference to 'b.q2' which is outside the
> subquery.  So we wrap it in a PlaceHolderVar.  This is necessary for
> correctness if it is a lateral reference from nullable side to
> non-nullable item.  But here in this case, the referenced item is also
> nullable, so actually we can omit the PlaceHolderVar with no harm.  The
> comment in pullup_replace_vars_callback() also explains this point.
>
>    * (Even then, we could omit the PlaceHolderVar if
>    * the referenced rel is under the same lowest outer join, but
>    * it doesn't seem worth the trouble to check that.)

It's nice that someone already thought about this and left us this comment :)

> All such PHVs would imply lateral dependencies which would make us have
> no choice but nestloop.  I think we should avoid such PHVs as much as
> possible.  So IMO it may 'worth the trouble to check that'.
>
> Attached is a patch to check that for simple Vars.  Maybe we can extend
> it to avoid PHVs for more complex expressions, but that requires some
> codes because for now we always wrap non-var expressions to PHVs in
> order to have a place to insert nulling bitmap.  As a first step, let's
> do it for simple Vars only.
>
> Any thoughts?

This looks good to me.

A few small tweaks suggested to comment wording:

+-- lateral reference for simple Var can escape PlaceHolderVar if the
+-- referenced rel is under the same lowest nulling outer join
+explain (verbose, costs off)

I think this is clearer: "lateral references to simple Vars do not
need a PlaceHolderVar when the referenced rel is part of the same
lowest nulling outer join"?

                  * lateral references to something outside the subquery being
-                 * pulled up.  (Even then, we could omit the PlaceHolderVar if
-                 * the referenced rel is under the same lowest outer join, but
-                 * it doesn't seem worth the trouble to check that.)
+                 * pulled up.  Even then, we could omit the PlaceHolderVar if
+                 * the referenced rel is under the same lowest nulling outer
+                 * join.

I think this is clearer: "references something outside the subquery
being pulled up and is not under the same lowest outer join."

One other thing: it would be helpful to have the test query output be
stable between HEAD and this patch; perhaps add:

order by 1, 2, 3, 4, 5, 6, 7

to ensure stability?

Thanks,
James Coleman



pgsql-hackers by date:

Previous
From: MARK CALLAGHAN
Date:
Subject: Re: benchmark results comparing versions 15.2 and 16
Next
From: David Christensen
Date:
Subject: Re: [PATCHES] Post-special page storage TDE support