Re: BUG #15613: Bug in PG Planner for Foreign Data Wrappers - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #15613: Bug in PG Planner for Foreign Data Wrappers
Date
Msg-id 11975.1549480555@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #15613: Bug in PG Planner for Foreign Data Wrappers  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: BUG #15613: Bug in PG Planner for Foreign Data Wrappers
List pgsql-bugs
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
>>> We could either split the function into two or three functions, or add
>>> still more overhead to it to notice what kind of relation has been
>>> passed and adjust its behavior for that. I'm not really thrilled with
>>> the latter: the fact that it's called create_foreignSCAN_path means,
>>> to me, that it's not supposed to be used for anything but baserel
>>> cases.

>> I don't have any strong opinion on that.

> On second thoughts, I think it would be a good idea to split that 
> function, because we can minimize the parameters list passed to each 
> function, making it easy to call that function; as you mentioned, 
> 'required_outer' isn't required for upperrels, and 'fdw_outerpath' isn't 
> required for baserels and upperrels.  Not sure we should do that for PG12.

Yeah, I agree.  Attached is a draft of that.  I've not thought very hard
about how we would want to manage parameterization of foreign joins, so
for now create_foreign_join_path doesn't support that.  We'll have to
change its API whenever we do want to support that, which is a good reason
why it should be separate from create_foreignscan_path: that way we don't
break API for FDWs that are only implementing plain scans.

I propose that we apply this or something much like it to HEAD, and leave
the problem of actually working out parameterized foreign joins for later.

Not sure whether we should do the same in the back branches.  It might be
fine to decide that we're never going to support parameterized foreign
joins in the back branches, in which case I think just adding the Assert
to create_foreignscan_path would be enough there.

            regards, tom lane

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index be626be..391cd0d 100644
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
*************** fileGetForeignPaths(PlannerInfo *root,
*** 556,561 ****
--- 556,565 ----
       * Create a ForeignPath node and add it as only possible path.  We use the
       * fdw_private list of the path to carry the convert_selectively option;
       * it will be propagated into the fdw_private list of the Plan node.
+      *
+      * We don't support pushing join clauses into the quals of this path, but
+      * it could still have required parameterization due to LATERAL refs in
+      * its tlist.
       */
      add_path(baserel, (Path *)
               create_foreignscan_path(root, baserel,
*************** fileGetForeignPaths(PlannerInfo *root,
*** 564,570 ****
                                       startup_cost,
                                       total_cost,
                                       NIL,    /* no pathkeys */
!                                      NULL,    /* no outer rel either */
                                       NULL,    /* no extra plan */
                                       coptions));

--- 568,574 ----
                                       startup_cost,
                                       total_cost,
                                       NIL,    /* no pathkeys */
!                                      baserel->lateral_relids,
                                       NULL,    /* no extra plan */
                                       coptions));

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 7fcac81..994cec5 100644
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
*************** postgresGetForeignPaths(PlannerInfo *roo
*** 937,942 ****
--- 937,945 ----
       * baserestrict conditions we were able to send to remote, there might
       * actually be an indexscan happening there).  We already did all the work
       * to estimate cost and size of this path.
+      *
+      * Although this path uses no join clauses, it could still have required
+      * parameterization due to LATERAL refs in its tlist.
       */
      path = create_foreignscan_path(root, baserel,
                                     NULL,    /* default pathtarget */
*************** postgresGetForeignPaths(PlannerInfo *roo
*** 944,950 ****
                                     fpinfo->startup_cost,
                                     fpinfo->total_cost,
                                     NIL, /* no pathkeys */
!                                    NULL,    /* no outer rel either */
                                     NULL,    /* no extra plan */
                                     NIL);    /* no fdw_private list */
      add_path(baserel, (Path *) path);
--- 947,953 ----
                                     fpinfo->startup_cost,
                                     fpinfo->total_cost,
                                     NIL, /* no pathkeys */
!                                    baserel->lateral_relids,
                                     NULL,    /* no extra plan */
                                     NIL);    /* no fdw_private list */
      add_path(baserel, (Path *) path);
*************** add_paths_with_pathkeys_for_rel(PlannerI
*** 4943,4958 ****
                                   useful_pathkeys,
                                   -1.0);

!         add_path(rel, (Path *)
!                  create_foreignscan_path(root, rel,
!                                          NULL,
!                                          rows,
!                                          startup_cost,
!                                          total_cost,
!                                          useful_pathkeys,
!                                          NULL,
!                                          sorted_epq_path,
!                                          NIL));
      }
  }

--- 4947,4974 ----
                                   useful_pathkeys,
                                   -1.0);

!         if (IS_SIMPLE_REL(rel))
!             add_path(rel, (Path *)
!                      create_foreignscan_path(root, rel,
!                                              NULL,
!                                              rows,
!                                              startup_cost,
!                                              total_cost,
!                                              useful_pathkeys,
!                                              rel->lateral_relids,
!                                              sorted_epq_path,
!                                              NIL));
!         else
!             add_path(rel, (Path *)
!                      create_foreign_join_path(root, rel,
!                                               NULL,
!                                               rows,
!                                               startup_cost,
!                                               total_cost,
!                                               useful_pathkeys,
!                                               rel->lateral_relids,
!                                               sorted_epq_path,
!                                               NIL));
      }
  }

*************** postgresGetForeignJoinPaths(PlannerInfo
*** 5088,5093 ****
--- 5104,5116 ----
          return;

      /*
+      * This code does not work for joins with lateral references, since those
+      * must have parameterized paths, which we don't generate yet.
+      */
+     if (!bms_is_empty(joinrel->lateral_relids))
+         return;
+
+     /*
       * Create unfinished PgFdwRelationInfo entry which is used to indicate
       * that the join relation is already considered, so that we won't waste
       * time in judging safety of join pushdown and adding the same paths again
*************** postgresGetForeignJoinPaths(PlannerInfo
*** 5171,5186 ****
       * Create a new join path and add it to the joinrel which represents a
       * join between foreign tables.
       */
!     joinpath = create_foreignscan_path(root,
!                                        joinrel,
!                                        NULL,    /* default pathtarget */
!                                        rows,
!                                        startup_cost,
!                                        total_cost,
!                                        NIL, /* no pathkeys */
!                                        NULL,    /* no required_outer */
!                                        epq_path,
!                                        NIL);    /* no fdw_private */

      /* Add generated path into joinrel by add_path(). */
      add_path(joinrel, (Path *) joinpath);
--- 5194,5209 ----
       * Create a new join path and add it to the joinrel which represents a
       * join between foreign tables.
       */
!     joinpath = create_foreign_join_path(root,
!                                         joinrel,
!                                         NULL,    /* default pathtarget */
!                                         rows,
!                                         startup_cost,
!                                         total_cost,
!                                         NIL,    /* no pathkeys */
!                                         joinrel->lateral_relids,
!                                         epq_path,
!                                         NIL);    /* no fdw_private */

      /* Add generated path into joinrel by add_path(). */
      add_path(joinrel, (Path *) joinpath);
*************** add_foreign_grouping_paths(PlannerInfo *
*** 5515,5530 ****
      fpinfo->total_cost = total_cost;

      /* Create and add foreign path to the grouping relation. */
!     grouppath = create_foreignscan_path(root,
!                                         grouped_rel,
!                                         grouped_rel->reltarget,
!                                         rows,
!                                         startup_cost,
!                                         total_cost,
!                                         NIL,    /* no pathkeys */
!                                         NULL,    /* no required_outer */
!                                         NULL,
!                                         NIL);    /* no fdw_private */

      /* Add generated path into grouped_rel by add_path(). */
      add_path(grouped_rel, (Path *) grouppath);
--- 5538,5552 ----
      fpinfo->total_cost = total_cost;

      /* Create and add foreign path to the grouping relation. */
!     grouppath = create_foreign_upper_path(root,
!                                           grouped_rel,
!                                           grouped_rel->reltarget,
!                                           rows,
!                                           startup_cost,
!                                           total_cost,
!                                           NIL,    /* no pathkeys */
!                                           NULL,
!                                           NIL); /* no fdw_private */

      /* Add generated path into grouped_rel by add_path(). */
      add_path(grouped_rel, (Path *) grouppath);
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 452b776..77038f5 100644
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
*************** GetForeignJoinPaths(PlannerInfo *root,
*** 309,315 ****
       function is called during query planning.  As
       with <function>GetForeignPaths</function>, this function should
       generate <structname>ForeignPath</structname> path(s) for the
!      supplied <literal>joinrel</literal>, and call <function>add_path</function> to add these
       paths to the set of paths considered for the join.  But unlike
       <function>GetForeignPaths</function>, it is not necessary that this function
       succeed in creating at least one path, since paths involving local
--- 309,317 ----
       function is called during query planning.  As
       with <function>GetForeignPaths</function>, this function should
       generate <structname>ForeignPath</structname> path(s) for the
!      supplied <literal>joinrel</literal>
!      (use <function>create_foreign_join_path</function> to build them),
!      and call <function>add_path</function> to add these
       paths to the set of paths considered for the join.  But unlike
       <function>GetForeignPaths</function>, it is not necessary that this function
       succeed in creating at least one path, since paths involving local
*************** GetForeignUpperPaths(PlannerInfo *root,
*** 369,375 ****
       called only if all base relation(s) involved in the query belong to the
       same FDW.  This function should generate <structname>ForeignPath</structname>
       path(s) for any post-scan/join processing that the FDW knows how to
!      perform remotely, and call <function>add_path</function> to add these paths to
       the indicated upper relation.  As with <function>GetForeignJoinPaths</function>,
       it is not necessary that this function succeed in creating any paths,
       since paths involving local processing are always possible.
--- 371,379 ----
       called only if all base relation(s) involved in the query belong to the
       same FDW.  This function should generate <structname>ForeignPath</structname>
       path(s) for any post-scan/join processing that the FDW knows how to
!      perform remotely
!      (use <function>create_foreign_upper_path</function> to build them),
!      and call <function>add_path</function> to add these paths to
       the indicated upper relation.  As with <function>GetForeignJoinPaths</function>,
       it is not necessary that this function succeed in creating any paths,
       since paths involving local processing are always possible.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b57de6b..9a53e04 100644
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*************** create_worktablescan_path(PlannerInfo *r
*** 2079,2093 ****

  /*
   * create_foreignscan_path
!  *      Creates a path corresponding to a scan of a foreign table, foreign join,
!  *      or foreign upper-relation processing, returning the pathnode.
   *
   * This function is never called from core Postgres; rather, it's expected
!  * to be called by the GetForeignPaths, GetForeignJoinPaths, or
!  * GetForeignUpperPaths function of a foreign data wrapper.  We make the FDW
!  * supply all fields of the path, since we do not have any way to calculate
!  * them in core.  However, there is a usually-sane default for the pathtarget
!  * (rel->reltarget), so we let a NULL for "target" select that.
   */
  ForeignPath *
  create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
--- 2079,2092 ----

  /*
   * create_foreignscan_path
!  *      Creates a path corresponding to a scan of a foreign base table,
!  *      returning the pathnode.
   *
   * This function is never called from core Postgres; rather, it's expected
!  * to be called by the GetForeignPaths function of a foreign data wrapper.
!  * We make the FDW supply all fields of the path, since we do not have any way
!  * to calculate them in core.  However, there is a usually-sane default for
!  * the pathtarget (rel->reltarget), so we let a NULL for "target" select that.
   */
  ForeignPath *
  create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
*************** create_foreignscan_path(PlannerInfo *roo
*** 2100,2105 ****
--- 2099,2125 ----
  {
      ForeignPath *pathnode = makeNode(ForeignPath);

+     /*
+      * Since the path's required_outer should always include all the rel's
+      * lateral_relids, forcibly add those if necessary.  This is a bit of a
+      * hack, but up till early 2019 the contrib FDWs failed to ensure that,
+      * and it's likely that the same error has propagated into many external
+      * FDWs.  Don't risk modifying the passed-in relid set here.
+      */
+     if (rel->lateral_relids && !bms_is_subset(rel->lateral_relids,
+                                               required_outer))
+         required_outer = bms_union(required_outer, rel->lateral_relids);
+
+     /*
+      * Although this function is only meant to be used for scans of baserels,
+      * up till early 2019 postgres_fdw abused it to make paths for join and
+      * upper rels, and the same error may have propagated elsewhere.  It will
+      * work for such cases as long as required_outer is empty (otherwise
+      * get_baserel_parampathinfo does the wrong thing), so this assertion lets
+      * that past.
+      */
+     Assert(bms_is_empty(required_outer) || IS_SIMPLE_REL(rel));
+
      pathnode->path.pathtype = T_ForeignScan;
      pathnode->path.parent = rel;
      pathnode->path.pathtarget = target ? target : rel->reltarget;
*************** create_foreignscan_path(PlannerInfo *roo
*** 2120,2125 ****
--- 2140,2251 ----
  }

  /*
+  * create_foreign_join_path
+  *      Creates a path corresponding to a scan of a foreign join,
+  *      returning the pathnode.
+  *
+  * This function is never called from core Postgres; rather, it's expected
+  * to be called by the GetForeignJoinPaths function of a foreign data wrapper.
+  * We make the FDW supply all fields of the path, since we do not have any way
+  * to calculate them in core.  However, there is a usually-sane default for
+  * the pathtarget (rel->reltarget), so we let a NULL for "target" select that.
+  */
+ ForeignPath *
+ create_foreign_join_path(PlannerInfo *root, RelOptInfo *rel,
+                          PathTarget *target,
+                          double rows, Cost startup_cost, Cost total_cost,
+                          List *pathkeys,
+                          Relids required_outer,
+                          Path *fdw_outerpath,
+                          List *fdw_private)
+ {
+     ForeignPath *pathnode = makeNode(ForeignPath);
+
+     /*
+      * Since the path's required_outer should always include all the rel's
+      * lateral_relids, forcibly add those if necessary.  This is a bit of a
+      * hack, but up till early 2019 the contrib FDWs failed to ensure that,
+      * and it's likely that the same error has propagated into many external
+      * FDWs.  Don't risk modifying the passed-in relid set here.
+      */
+     if (rel->lateral_relids && !bms_is_subset(rel->lateral_relids,
+                                               required_outer))
+         required_outer = bms_union(required_outer, rel->lateral_relids);
+
+     /*
+      * We should use get_joinrel_parampathinfo to handle parameterized paths,
+      * but the API of this function doesn't support it, and existing
+      * extensions aren't trying to build such paths yet anyway.  For the
+      * moment just throw an error if someone tries it; eventually we should
+      * revisit this.
+      */
+     if (!bms_is_empty(required_outer))
+         elog(ERROR, "parameterized foreign joins are not supported yet");
+
+     pathnode->path.pathtype = T_ForeignScan;
+     pathnode->path.parent = rel;
+     pathnode->path.pathtarget = target ? target : rel->reltarget;
+     pathnode->path.param_info = NULL;    /* XXX see above */
+     pathnode->path.parallel_aware = false;
+     pathnode->path.parallel_safe = rel->consider_parallel;
+     pathnode->path.parallel_workers = 0;
+     pathnode->path.rows = rows;
+     pathnode->path.startup_cost = startup_cost;
+     pathnode->path.total_cost = total_cost;
+     pathnode->path.pathkeys = pathkeys;
+
+     pathnode->fdw_outerpath = fdw_outerpath;
+     pathnode->fdw_private = fdw_private;
+
+     return pathnode;
+ }
+
+ /*
+  * create_foreign_upper_path
+  *      Creates a path corresponding to an upper relation that's computed
+  *      directly by an FDW, returning the pathnode.
+  *
+  * This function is never called from core Postgres; rather, it's expected to
+  * be called by the GetForeignUpperPaths function of a foreign data wrapper.
+  * We make the FDW supply all fields of the path, since we do not have any way
+  * to calculate them in core.  However, there is a usually-sane default for
+  * the pathtarget (rel->reltarget), so we let a NULL for "target" select that.
+  */
+ ForeignPath *
+ create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel,
+                           PathTarget *target,
+                           double rows, Cost startup_cost, Cost total_cost,
+                           List *pathkeys,
+                           Path *fdw_outerpath,
+                           List *fdw_private)
+ {
+     ForeignPath *pathnode = makeNode(ForeignPath);
+
+     /*
+      * Upper relations should never have any lateral references, since joining
+      * is complete.
+      */
+     Assert(bms_is_empty(rel->lateral_relids));
+
+     pathnode->path.pathtype = T_ForeignScan;
+     pathnode->path.parent = rel;
+     pathnode->path.pathtarget = target ? target : rel->reltarget;
+     pathnode->path.param_info = NULL;
+     pathnode->path.parallel_aware = false;
+     pathnode->path.parallel_safe = rel->consider_parallel;
+     pathnode->path.parallel_workers = 0;
+     pathnode->path.rows = rows;
+     pathnode->path.startup_cost = startup_cost;
+     pathnode->path.total_cost = total_cost;
+     pathnode->path.pathkeys = pathkeys;
+
+     pathnode->fdw_outerpath = fdw_outerpath;
+     pathnode->fdw_private = fdw_private;
+
+     return pathnode;
+ }
+
+ /*
   * calc_nestloop_required_outer
   *      Compute the required_outer set for a nestloop join path
   *
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f04c6b7..4130514 100644
*** a/src/backend/optimizer/util/relnode.c
--- b/src/backend/optimizer/util/relnode.c
*************** get_baserel_parampathinfo(PlannerInfo *r
*** 1225,1230 ****
--- 1225,1233 ----
      double        rows;
      ListCell   *lc;

+     /* If rel has LATERAL refs, every path for it should account for them */
+     Assert(bms_is_subset(baserel->lateral_relids, required_outer));
+
      /* Unparameterized paths have no ParamPathInfo */
      if (bms_is_empty(required_outer))
          return NULL;
*************** get_joinrel_parampathinfo(PlannerInfo *r
*** 1320,1325 ****
--- 1323,1331 ----
      double        rows;
      ListCell   *lc;

+     /* If rel has LATERAL refs, every path for it should account for them */
+     Assert(bms_is_subset(joinrel->lateral_relids, required_outer));
+
      /* Unparameterized paths have no ParamPathInfo or extra join clauses */
      if (bms_is_empty(required_outer))
          return NULL;
*************** get_appendrel_parampathinfo(RelOptInfo *
*** 1511,1516 ****
--- 1517,1525 ----
  {
      ParamPathInfo *ppi;

+     /* If rel has LATERAL refs, every path for it should account for them */
+     Assert(bms_is_subset(appendrel->lateral_relids, required_outer));
+
      /* Unparameterized paths have no ParamPathInfo */
      if (bms_is_empty(required_outer))
          return NULL;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d0c8f99..ef2c9b4 100644
*** a/src/include/optimizer/pathnode.h
--- b/src/include/optimizer/pathnode.h
*************** extern ForeignPath *create_foreignscan_p
*** 118,123 ****
--- 118,136 ----
                          Relids required_outer,
                          Path *fdw_outerpath,
                          List *fdw_private);
+ extern ForeignPath *create_foreign_join_path(PlannerInfo *root, RelOptInfo *rel,
+                          PathTarget *target,
+                          double rows, Cost startup_cost, Cost total_cost,
+                          List *pathkeys,
+                          Relids required_outer,
+                          Path *fdw_outerpath,
+                          List *fdw_private);
+ extern ForeignPath *create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel,
+                           PathTarget *target,
+                           double rows, Cost startup_cost, Cost total_cost,
+                           List *pathkeys,
+                           Path *fdw_outerpath,
+                           List *fdw_private);

  extern Relids calc_nestloop_required_outer(Relids outerrelids,
                               Relids outer_paramrels,

pgsql-bugs by date:

Previous
From: "shmv"
Date:
Subject: Query issues on Foreign tables
Next
From: Peter Geoghegan
Date:
Subject: Re: BUG #15597: possible bug in amcheck/amcheck_next (or corrupted index?)