Re: Add semi-join pushdown to postgres_fdw - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Add semi-join pushdown to postgres_fdw
Date
Msg-id CAPpHfdt3m8DdyMfLK_mMcikno3ZA0DWsca+2JD=kVyF1DBNGWA@mail.gmail.com
Whole thread Raw
In response to Re: Add semi-join pushdown to postgres_fdw  (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
List pgsql-hackers
On Mon, Mar 24, 2025 at 6:56 PM Alexander Pyhalov
<a.pyhalov@postgrespro.ru> wrote:
> Alexander Korotkov писал(а) 2025-03-24 11:49:
> > On Mon, Mar 24, 2025 at 9:07 AM Alexander Pyhalov
> > <a.pyhalov@postgrespro.ru> wrote:
> >> Alexander Korotkov писал(а) 2025-03-24 04:21:
> >> > Hi, Alexander!
> >> >
> >> > On Tue, Mar 18, 2025 at 6:04 PM Alexander Pyhalov
> >> > <a.pyhalov@postgrespro.ru> wrote:
> >> >> This shouldn't. When semi-join is found below left/right join, it's
> >> >> deparsed as subquery.
> >> >> Interesting enough, this mechanics (deparsing as subquery) is used
> >> >> 1) for semi-joins under left/right join,
> >> >> 2) for full outer joins when inner or outer part has some
> >> >> remote_conds.
> >> >>
> >> >> The issue here is that after subquery is deparsed, we don't consider
> >> >> if
> >> >> its target attributes are available to the upper level
> >> >> join . As for semi-join itself, all conditions are still deparsed on
> >> >> left/right join boundary, they are just not propagated further.
> >> >> This shouldn't be a problem, as they are evaluated in subquery. As for
> >> >> left/right join without semi-join beneath it - its behavior is not
> >> >> affected
> >> >> (as hidden_subquery_rels is empty).
> >> >
> >> > Thank you for the explanation.  But I have another question.  Aren't
> >> > the checks you've proposed too strict?  hidden_subquery_rels are
> >> > propagated all the way to the join tree.  So, pulling conditions would
> >> > be disables all the way to the join tree too.  Is it enough to just
> >> > disable pulling conditions directly from semi-joins, then their
> >> > further pulls will be disabled automatically?  See the attached patch.
> >> > It also contains other (mostly cosmetic improvements).
> >> >
> >> > ------
> >> > Regards,
> >> > Alexander Korotkov
> >> > Supabase
> >>
> >> Hi. No, they are not too strict. Look at the following example
> >>
> >> EXPLAIN (verbose, costs off)
> >> SELECT x1.c1 FROM
> >>                 (SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4
> >> WHERE
> >> ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x1
> >>         RIGHT JOIN
> >>                 (SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4
> >> WHERE
> >> ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x2
> >>         ON (x1.c1 = x2.c1)
> >>         LEFT JOIN
> >>                 (SELECT * FROM ft2 WHERE c2 < 11) x3
> >>         ON (x1.c1 = x3.c1)
> >> ORDER BY x1.c1 LIMIT 10;
> >>
> >> With patch which you suggest, we'll deparse left part of left join as
> >> subquery, but will try to pop c2 < 10 condition from
> >> (8) LEFT JOIN ((6) SEMI JOIN (7)) subquery. When we look at left join
> >> of
> >> this subquery and ft2, we still deparse left part as
> >> subquery, so can't pop up conditions from it.
> >
> > I've checked, this query seems to result in the exactly same remote
> > SQLs with your and mine patches.  Could you elaborate more on the
> > difference?  Do you think foreign_join_ok() can give different results
> > on this query?
>
> Hi.
> With your patch this example gives the same
> ERROR:  unexpected expression in subquery output
>
> This happens, because we don't keep knowledge that we have deparsed all
> semi-joins below this left join. As long as left/right join has
> semi-join in its left or right part, this part will be deparsed as
> subquery (look at the following lines in foreign_join_ok()):
>
>          else if (jointype == JOIN_LEFT || jointype == JOIN_RIGHT ||
> jointype == JOIN_FULL)
>          {
>                  /*
>                   * Conditions, generated from semi-joins, should be
> evaluated before
>                   * LEFT/RIGHT/FULL join.
>                   */
>                  if (!bms_is_empty(fpinfo_o->hidden_subquery_rels))
>                  {
>                          fpinfo->make_outerrel_subquery = true;
>                          fpinfo->lower_subquery_rels =
> bms_add_members(fpinfo->lower_subquery_rels, outerrel->relids);
>                  }
>
>                  if (!bms_is_empty(fpinfo_i->hidden_subquery_rels))
>                  {
>                          fpinfo->make_innerrel_subquery = true;
>                          fpinfo->lower_subquery_rels =
> bms_add_members(fpinfo->lower_subquery_rels, innerrel->relids);
>                  }
>          }
>
>
> So, we still can't refer to its remote_conds from upper level queries
> (as not all Vars are available from subquery after subquery is created
> in one part of left or right join). It's not necessary to have semi-join
> for this as immediate left/right join inner or outer for inner/outer to
> be deparsed as subquery. But it shouldn't be an issue - we've already
> used remote_conds when created this subquery.
> What I'm trying to say - logic of 'making subquery' and extracting
> conditions should match (or we need more sophisticated way of forming
> subquery targetlist, so that extracted conditions could be used above
> subqueries).

Thank you for the explanation.  Pushed.  However, it would be nice in
future to rework this in a way that semi-joins on lower levels of join
tree don't have negative impact on optimization of upper levels of
join tree.

------
Regards,
Alexander Korotkov
Supabase



pgsql-hackers by date:

Previous
From: Yura Sokolov
Date:
Subject: Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum
Next
From: Alexander Korotkov
Date:
Subject: Re: Remove an unnecessary check on semijoin_target_ok() on postgres_fdw.c