Re: Check lateral references within PHVs for memoize cache keys - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Check lateral references within PHVs for memoize cache keys
Date
Msg-id CAMbWs49+Cjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg@mail.gmail.com
Whole thread Raw
In response to Re: Check lateral references within PHVs for memoize cache keys  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Check lateral references within PHVs for memoize cache keys
List pgsql-hackers

On Sun, Jul 9, 2023 at 1:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> Rebase the patch on HEAD as cfbot reminds.

This fix seems like a mess.  The function that is in charge of filling
RelOptInfo.lateral_vars is extract_lateral_references; or at least
that was how it was done up to now.  Can't we compute these additional
references there?  If not, maybe we ought to just merge
extract_lateral_references into create_lateral_join_info, rather than
having the responsibility split.  I also wonder whether this change
isn't creating hidden dependencies on RTE order (which would likely be
bugs), since create_lateral_join_info itself examines the lateral_vars
lists as it walks the rtable.

Yeah, you're right.  Currently all RelOptInfo.lateral_vars are filled in
extract_lateral_references.  And then in create_lateral_join_info these
lateral_vars, together with all PlaceHolderInfos, are examined to
compute the per-base-relation lateral refs relids.  However, in the
process of extract_lateral_references, it's possible that we create new
PlaceHolderInfos.  So I think it may not be a good idea to extract the
lateral references within PHVs there.  But I agree with you that it's
also not a good idea to compute these additional lateral Vars within
PHVs in create_lateral_join_info as the patch does.  Actually with the
patch I find that with PHVs that are due to be evaluated at a join we
may get a problematic plan.  For instance

explain (costs off)
select * from t t1 left join lateral
(select t1.a as t1a, t2.a as t2a from t t2 join t t3 on true) s on true
where s.t1a = s.t2a;
             QUERY PLAN
------------------------------------
 Nested Loop
   ->  Seq Scan on t t1
   ->  Nested Loop
         Join Filter: (t1.a = t2.a)
         ->  Seq Scan on t t2
         ->  Memoize
               Cache Key: t1.a
               Cache Mode: binary
               ->  Seq Scan on t t3
(9 rows)

There are no lateral refs in the subtree of the Memoize node, so it
should be a Materialize node rather than a Memoize node.  This is caused
by that for a PHV that is due to be evaluated at a join, we fill its
lateral refs in each baserel in the join, which is wrong.

So I'm wondering if it'd be better that we move all this logic of
computing additional lateral references within PHVs to get_memoize_path,
where we can examine only PHVs that are evaluated at innerrel.  And
considering that these lateral refs are only used by Memoize, it seems
more sensible to compute them there.  But I'm a little worried that
doing this would make get_memoize_path too expensive.

Please see v4 patch for this change.

Thanks
Richard
Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Consistent coding for the naming of LR workers
Next
From: フブキダイスキ
Date:
Subject: \di+ cannot show the same name indexes