Re: An inefficient query caused by unnecessary PlaceHolderVar - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: An inefficient query caused by unnecessary PlaceHolderVar |
Date | |
Msg-id | CAAaqYe9S1KxiYpX+RmbMMotSSwSZcHpuAnv9Hq+DZ8rkDacNtw@mail.gmail.com Whole thread Raw |
In response to | An inefficient query caused by unnecessary PlaceHolderVar (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: An inefficient query caused by unnecessary PlaceHolderVar
|
List | pgsql-hackers |
On Fri, May 12, 2023 at 2:35 AM Richard Guo <guofenglinux@gmail.com> wrote: > > I happened to notice that the query below can be inefficient. > > # explain (costs off) > select * from > int8_tbl a left join > (int8_tbl b inner join > lateral (select *, b.q2 as x from int8_tbl c) ss on b.q2 = ss.q1) > on a.q1 = b.q1; > QUERY PLAN > ------------------------------------ > Hash Right Join > Hash Cond: (b.q1 = a.q1) > -> Nested Loop > -> Seq Scan on int8_tbl b > -> Seq Scan on int8_tbl c > Filter: (b.q2 = q1) > -> Hash > -> Seq Scan on int8_tbl a > (8 rows) > > For B/C join, currently we only have one option, i.e., nestloop with > parameterized inner path. This could be extremely inefficient in some > cases, such as when C does not have any indexes, or when B is very > large. I believe the B/C join can actually be performed with hashjoin > or mergejoin here, as it is an inner join. > > This happens because when we pull up the lateral subquery, we notice > that Var 'x' is a lateral reference to 'b.q2' which is outside the > subquery. So we wrap it in a PlaceHolderVar. This is necessary for > correctness if it is a lateral reference from nullable side to > non-nullable item. But here in this case, the referenced item is also > nullable, so actually we can omit the PlaceHolderVar with no harm. The > comment in pullup_replace_vars_callback() also explains this point. > > * (Even then, we could omit the PlaceHolderVar if > * the referenced rel is under the same lowest outer join, but > * it doesn't seem worth the trouble to check that.) It's nice that someone already thought about this and left us this comment :) > All such PHVs would imply lateral dependencies which would make us have > no choice but nestloop. I think we should avoid such PHVs as much as > possible. So IMO it may 'worth the trouble to check that'. > > Attached is a patch to check that for simple Vars. Maybe we can extend > it to avoid PHVs for more complex expressions, but that requires some > codes because for now we always wrap non-var expressions to PHVs in > order to have a place to insert nulling bitmap. As a first step, let's > do it for simple Vars only. > > Any thoughts? This looks good to me. A few small tweaks suggested to comment wording: +-- lateral reference for simple Var can escape PlaceHolderVar if the +-- referenced rel is under the same lowest nulling outer join +explain (verbose, costs off) I think this is clearer: "lateral references to simple Vars do not need a PlaceHolderVar when the referenced rel is part of the same lowest nulling outer join"? * lateral references to something outside the subquery being - * pulled up. (Even then, we could omit the PlaceHolderVar if - * the referenced rel is under the same lowest outer join, but - * it doesn't seem worth the trouble to check that.) + * pulled up. Even then, we could omit the PlaceHolderVar if + * the referenced rel is under the same lowest nulling outer + * join. I think this is clearer: "references something outside the subquery being pulled up and is not under the same lowest outer join." One other thing: it would be helpful to have the test query output be stable between HEAD and this patch; perhaps add: order by 1, 2, 3, 4, 5, 6, 7 to ensure stability? Thanks, James Coleman
pgsql-hackers by date: