Re: Problems with plan estimates in postgres_fdw - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: Problems with plan estimates in postgres_fdw |
Date | |
Msg-id | 19433.1549962210@localhost Whole thread Raw |
In response to | Re: Problems with plan estimates in postgres_fdw (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: Problems with plan estimates in postgres_fdw
|
List | pgsql-hackers |
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/02/08 2:04), Antonin Houska wrote: > > > First, I wonder if the information on LIMIT needs to be passed to the > > FDW. Cannot the FDW routines use root->tuple_fraction? > > I think it's OK for the FDW to use the tuple_fraction, but I'm not sure the > tuple_fraction should be enough. For example, I don't think we can re-compute > from the tuple_fraction the LIMIT/OFFSET values. > > > I think (although have > > not verified experimentally) that the problem reported in [1] is not limited > > to the LIMIT clause. I think the same problem can happen if client uses cursor > > and sets cursor_tuple_fraction very low. Since the remote node is not aware of > > this setting when running EXPLAIN, the low fraction can also make postgres_fdw > > think that retrieval of the whole set is cheaper than it will appear to be > > during the actual execution. > > I think it would be better to get a fast-start plan on the remote side in such > a situation, but I'm not sure we can do that as well with this patch, because > in the cursor case, the FDW couldn't know in advance exactly how may rows will > be fetched from the remote side using that cursor, so the FDW couldn't push > down LIMIT. So I'd like to leave that for another patch. root->tuple_fraction (possibly derived from cursor_tuple_fraction) should be enough to decide whether fast-startup plan should be considered. I just failed to realize that your patch primarily aims at the support of UPPERREL_ORDERED and UPPERREL_FINAL relations by postgres_fdw. Yes, another patch would be needed to completely resolve the problem reported by Andrew Gierth upthread. > > Some comments on coding: > > > > 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch > > ----------------------------------------------------------------- > > > > * I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. Note that > > grouping_planner() does not call create_limit_path() until it's done with > > create_ordered_paths(), and create_ordered_paths() is where the FDW is > > requested to add its paths to UPPERREL_ORDERED relation. > > Maybe I'm missing something, but the 0001 patch doesn't deal with LIMIT at > all. I added the parameter limit_tuples to PgFdwPathExtraData so that we can > pass limit_tuples=-1, which means no LIMIT, to cost_sort(), though. Yes, the limit_tuples field made me confused. But if cost_sort() is the only reason, why not simply pass -1 in the 0001 patch, and use the limit_tuples argument only in 0002? I see several other calls of cost_sort() in the core where -1 is hard-coded. > > Both patches: > > ------------ > > > > One thing that makes me confused when I try to understand details is that the > > new functions add_foreign_ordered_paths() and add_foreign_final_paths() both > > pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size(): > > > > estimate_path_cost_size(root, input_rel, ...) > > > > whereas the existing add_foreign_grouping_paths() passes "grouped_rel": > > > > estimate_path_cost_size(root, grouped_rel, ...) > > > > Can you please make this consistent? If you do, you'll probably need to > > reconsider your changes to estimate_path_cost_size(). > > This is intended and I think I should add more comments on that. Let me > explain. These patches modified estimate_path_cost_size() so that we can > estimate the costs of a foreign path for the UPPERREL_ORDERED or > UPPERREL_FINAL rel, by adding the costs of the upper-relation processing (ie, > ORDER BY and/or LIMIT/OFFSET, specified by PgFdwPathExtraData) to the costs of > implementing the *underlying* relation for the upper relation (ie, scan, join > or grouping rel, specified by the input_rel). So those functions call > estimate_path_cost_size() with the input_rel, not the ordered_rel/final_rel, > along with PgFdwPathExtraData. I think the same already happens for the UPPERREL_GROUP_AGG relation: estimate_path_cost_size() ... else if (IS_UPPER_REL(foreignrel)) { ... ofpinfo = (PgFdwRelationInfo *) fpinfo->outerrel->fdw_private; Here "outerrel" actually means input relation from the grouping perspective. The input relation (i.e. result of scan / join) estimates are retrieved from "ofpinfo" and the costs of aggregation are added. Why should this be different for other kinds of upper relations? > > And maybe related problem: why should FDW care about the effect of > > apply_scanjoin_target_to_paths() like you claim to do here? > > > > /* > > * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the > > * final scan/join relation, the costs obtained from the cache > > * wouldn't yet contain the eval cost for the final scan/join target, > > * which would have been updated by apply_scanjoin_target_to_paths(); > > * add the eval cost now. > > */ > > if (fpextra&& !IS_UPPER_REL(foreignrel)) > > { > > /* The costs should have been obtained from the cache. */ > > Assert(fpinfo->rel_startup_cost>= 0&& > > fpinfo->rel_total_cost>= 0); > > > > startup_cost += foreignrel->reltarget->cost.startup; > > run_cost += foreignrel->reltarget->cost.per_tuple * rows; > > } > > > > I think it should not, whether "foreignrel" means "input_rel" or "output_rel" > > from the perspective of postgresGetForeignUpperPaths(). > > I added this before costing the sort operation below, because 1) the cost of > evaluating the scan/join target might not be zero (consider the case where > sort columns are not simple Vars, for example) and 2) the cost of sorting > takes into account the underlying path's startup/total costs. Maybe I'm > missing something, though. My understanding is that the "input_rel" argument of postgresGetForeignUpperPaths() should already have been processed by postgresGetForeignPaths() or postgresGetForeignJoinPaths() (or actually by postgresGetForeignUpperPaths() called with different "stage" too). Therefore it should already have the "fdw_private" field initialized. The estimates in the corresponding PgFdwRelationInfo should already include the reltarget costs, as set previously by estimate_path_cost_size(). -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
pgsql-hackers by date: