Re: Add semi-join pushdown to postgres_fdw - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Add semi-join pushdown to postgres_fdw
Date
Msg-id CAPpHfdtLSp3YnmbTMpUZvBa=kJ3qLosUQVjNfW3__=oWUP8dkg@mail.gmail.com
Whole thread Raw
In response to Re: Add semi-join pushdown to postgres_fdw  (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
Responses Re: Add semi-join pushdown to postgres_fdw
List pgsql-hackers
Hi, Alexander!

Thank you for working on this.  I believe this is a very interesting patch, which significantly improves our FDW-based distributed facilities.  This is why I decided to review this.

On Fri, Jan 20, 2023 at 11:00 AM Alexander Pyhalov <a.pyhalov@postgrespro.ru> wrote:
> Tomas Vondra писал 2023-01-19 20:49:
> > I took a quick look at the patch. It needs a rebase, although it
> > applies
> > fine using patch.
> >
> > A couple minor comments:
> >
> > 1) addl_conds seems a bit hard to understand, I'd use either the full
> > wording (additional_conds) or maybe extra_conds
>
> Renamed to additional_conds.
>
> >
> > 2) some of the lines got quite long, and need a wrap
> Splitted some of them. Not sure if it's enough.
>
> >
> > 3) unknown_subquery_rels name is a bit misleading - AFAIK it's the rels
> > that can't be referenced from upper rels (per what the .h says). So
> > they
> > are known, but hidden. Is there a better name?
>
> Renamed to hidden_subquery_rels. These are rels, which can't be referred
> to from upper join levels.
>
> >
> > 4) joinrel_target_ok() needs a better comment, explaining *when* the
> > reltarget is safe for pushdown. The conditions are on the same row, but
> > the project style is to break after '&&'.
>
> Added comment. It seems to be a rephrasing of lower comment in
> joinrel_target_ok().

+   /*
+    * We can't push down join if its reltarget is not safe
+    */
+   if (!joinrel_target_ok(root, joinrel, jointype, outerrel, innerrel))
        return false;

As I get joinrel_target_ok() function do meaningful checks only for semi join and always return false for all other kinds of joins.  I think we should call this only for semi join and name the function accordingly.

+   fpinfo->unknown_subquery_rels = bms_union(fpinfo_o->unknown_subquery_rels,
+                                           fpinfo_i->unknown_subquery_rels);

Should the comment before this code block be revised?

+       case JOIN_SEMI:
+           fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
+                                             fpinfo_i->remote_conds);
+           fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
+                                              fpinfo->remote_conds);
+           fpinfo->remote_conds = list_copy(fpinfo_o->remote_conds);
+           fpinfo->unknown_subquery_rels = bms_union(fpinfo->unknown_subquery_rels,
+                                           innerrel->relids);
+           break;

I think that comment before switch() should be definitely revised.

+ Relids hidden_subquery_rels; /* relids, which can't be referred to
+ * from upper relations */

Could this definition contain the positive part?  Can't be referred to from upper relations, but used internally for semi joins (or something like that)?

Also, I think the machinery around the append_conds could be somewhat simpler if we turn them into a list (list of strings).  I think that should make code clearer and also save us some memory allocations.

In [1] you've referenced the cases, when your patch can't push down semi-joins.  It doesn't seem impossible to handle these cases, but that would make the patch much more complicated.  I'm OK to continue with a simpler patch to handle the majority of cases.  Could you please add the cases, which can't be pushed down with the current patch, to the test suite?

Links

------
Regards,
Alexander Korotkov

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: POC, WIP: OR-clause support for indexes
Next
From: Nathan Bossart
Date:
Subject: always use runtime checks for CRC-32C instructions