Re: Problems with plan estimates in postgres_fdw - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Problems with plan estimates in postgres_fdw
Date
Msg-id 10994.1549559088@localhost
Whole thread Raw
In response to Re: Problems with plan estimates in postgres_fdw  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: Problems with plan estimates in postgres_fdw
List pgsql-hackers
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:

> (2018/12/28 15:50), Etsuro Fujita wrote:
> > Attached is a new version of the
> > patch.
>
> Here is an updated version of the patch set.  Changes are:

I've spent some time reviewing the patches.

First, I wonder if the information on LIMIT needs to be passed to the
FDW. Cannot the FDW routines use root->tuple_fraction? I think (although have
not verified experimentally) that the problem reported in [1] is not limited
to the LIMIT clause. I think the same problem can happen if client uses cursor
and sets cursor_tuple_fraction very low. Since the remote node is not aware of
this setting when running EXPLAIN, the low fraction can also make postgres_fdw
think that retrieval of the whole set is cheaper than it will appear to be
during the actual execution.

Some comments on coding:

0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch
-----------------------------------------------------------------

* I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. Note that
grouping_planner() does not call create_limit_path() until it's done with
create_ordered_paths(), and create_ordered_paths() is where the FDW is
requested to add its paths to UPPERREL_ORDERED relation.

* add_foreign_ordered_paths()

Exprssion root->upper_targets[UPPERREL_FINAL] is used to access the target.

I think create_ordered_paths() should actually set the target so that
postgresGetForeignUpperPaths() can simply find it in
output_rel->reltarget. This is how postgres_fdw already retrieves the target
for grouped paths. Small patch is attached that makes this possible.

* regression tests: I think test(s) should be added for queries that have
  ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET)
  clause. I haven't noticed such tests.


0002-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely-v3.patch
---------------------------------------------------------------

* add_foreign_final_paths()

Similar problem I reported for add_foreign_ordered_paths() above.

* adjust_limit_rows_costs()

Doesn't this function address more generic problem than the use of
postgres_fdw? If so, then it should be submitted as a separate patch. BTW, the
function is only used in pathnode.c, so it should be declared static.

Both patches:
------------

One thing that makes me confused when I try to understand details is that the
new functions add_foreign_ordered_paths() and add_foreign_final_paths() both
pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size():

estimate_path_cost_size(root, input_rel, ...)

whereas the existing add_foreign_grouping_paths() passes "grouped_rel":

estimate_path_cost_size(root, grouped_rel, ...)

Can you please make this consistent? If you do, you'll probably need to
reconsider your changes to estimate_path_cost_size().

And maybe related problem: why should FDW care about the effect of
apply_scanjoin_target_to_paths() like you claim to do here?

    /*
     * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the
     * final scan/join relation, the costs obtained from the cache
     * wouldn't yet contain the eval cost for the final scan/join target,
     * which would have been updated by apply_scanjoin_target_to_paths();
     * add the eval cost now.
     */
    if (fpextra && !IS_UPPER_REL(foreignrel))
    {
        /* The costs should have been obtained from the cache. */
        Assert(fpinfo->rel_startup_cost >= 0 &&
               fpinfo->rel_total_cost >= 0);

        startup_cost += foreignrel->reltarget->cost.startup;
        run_cost += foreignrel->reltarget->cost.per_tuple * rows;
    }

I think it should not, whether "foreignrel" means "input_rel" or "output_rel"
from the perspective of postgresGetForeignUpperPaths().


[1] https://www.postgresql.org/message-id/87pnz1aby9.fsf@news-spur.riddles.org.uk

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b2239728cf..17e983f11e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2035,6 +2035,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
          * of the corresponding upperrels might not be needed for this query.
          */
         root->upper_targets[UPPERREL_FINAL] = final_target;
+        root->upper_targets[UPPERREL_ORDERED] = final_target;
         root->upper_targets[UPPERREL_WINDOW] = sort_input_target;
         root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target;
 
@@ -2118,6 +2119,12 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
     final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
 
     /*
+     * Set reltarget so that it's consistent with the paths. Also it's more
+     * convenient for FDW to find the target here.
+     */
+    final_rel->reltarget = final_target;
+
+    /*
      * If the input rel is marked consider_parallel and there's nothing that's
      * not parallel-safe in the LIMIT clause, then the final_rel can be marked
      * consider_parallel as well.  Note that if the query has rowMarks or is
@@ -4865,6 +4872,12 @@ create_ordered_paths(PlannerInfo *root,
     ordered_rel = fetch_upper_rel(root, UPPERREL_ORDERED, NULL);
 
     /*
+     * Set reltarget so that it's consistent with the paths. Also it's more
+     * convenient for FDW to find the target here.
+     */
+    ordered_rel->reltarget = target;
+
+    /*
      * If the input relation is not parallel-safe, then the ordered relation
      * can't be parallel-safe, either.  Otherwise, it's parallel-safe if the
      * target list is parallel-safe.

pgsql-hackers by date:

Previous
From: Nathan Wagner
Date:
Subject: Re: bug tracking system
Next
From: David Fetter
Date:
Subject: Re: Tighten up a few overly lax regexes in pg_dump's tap tests