Thread: Problems with plan estimates in postgres_fdw
This analysis comes from investigating a report from an IRC user. A summary of the initial report is: Using PG 9.6.9 and postgres_fdw, a query of the form "select * from foreign_table order by col limit 1" is getting a local Sort plan, not pushing the ORDER BY to the remote. Turning off use_remote_estimates changes the plan to use a remote sort, with a 10000x speedup. I don't think this can be called a bug, exactly, and I don't have an immediate fix, so I'm putting this analysis up for the benefit of anyone working on this in future. The cause of the misplan seems to be this: postgres_fdw with use_remote_estimates on does not attempt to obtain fast-start plans from the remote. In this case what happens is this: 1. postgres_fdw gets the cost estimate from the plain remote fetch, by doing "EXPLAIN select * from table". This produces a plan with a low startup cost (just the constant overhead) and a high total cost (on the order of 1.2e6 in this case). 2. postgres_fdw gets the cost estimate for the ordered fetch, by doing "EXPLAIN select * from table order by col". Note that there is no LIMIT nor any cursor_tuple_fraction in effect, so the plan returned in this case is a seqscan+sort plan (in spite of the presence of an index on "col"), with a very high (order of 8e6) startup and total cost. So when the local side tries to generate paths, it has the choice of using a remote-ordered path with startup cost 8e6, or a local top-1 sort on top of an unordered remote path, which has a total cost on the order of 1.5e6 in this case; cheaper than the remote sort because this only needs to do top-1, while the remote is sorting millions of rows and would probably spill to disk. However, when it comes to actual execution, postgres_fdw opens a cursor for the remote query, which means that cursor_tuple_fraction will come into play. As far as I can tell, this is not set anywhere, so this means that the plan that actually gets run on the remote is likely to have _completely_ different costs from those returned by the EXPLAINs. In particular, in this case the fast-start index-scan plan for the ORDER BY remote query is clearly being chosen when use_remote_estimates is off (since the query completes in 15ms rather than 150 seconds). One possibility: would it be worth adding an option to EXPLAIN that makes it assume cursor_tuple_fraction? -- Andrew (irc:RhodiumToad)
Hello. At Thu, 02 Aug 2018 01:06:41 +0100, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote in <87pnz1aby9.fsf@news-spur.riddles.org.uk> > This analysis comes from investigating a report from an IRC user. A > summary of the initial report is: > > Using PG 9.6.9 and postgres_fdw, a query of the form "select * from > foreign_table order by col limit 1" is getting a local Sort plan, not > pushing the ORDER BY to the remote. Turning off use_remote_estimates > changes the plan to use a remote sort, with a 10000x speedup. > > I don't think this can be called a bug, exactly, and I don't have an > immediate fix, so I'm putting this analysis up for the benefit of anyone > working on this in future. I didn't see the concrete estimates, it seems that the cause is too-small total cost of non-remote-sorted plan compared with the startup cost of remote-sorted one. In other words, tuple cost by the remoteness is estimated as too small. Perhaps setting fdw_tuple_cost to , say 1 as an extreme value, will bring victory to remote sort path for the query. > The cause of the misplan seems to be this: postgres_fdw with > use_remote_estimates on does not attempt to obtain fast-start plans from > the remote. In this case what happens is this: > > 1. postgres_fdw gets the cost estimate from the plain remote fetch, by > doing "EXPLAIN select * from table". This produces a plan with a low > startup cost (just the constant overhead) and a high total cost (on > the order of 1.2e6 in this case). > > 2. postgres_fdw gets the cost estimate for the ordered fetch, by doing > "EXPLAIN select * from table order by col". Note that there is no > LIMIT nor any cursor_tuple_fraction in effect, so the plan returned > in this case is a seqscan+sort plan (in spite of the presence of an > index on "col"), with a very high (order of 8e6) startup and total > cost. > > So when the local side tries to generate paths, it has the choice of > using a remote-ordered path with startup cost 8e6, or a local top-1 > sort on top of an unordered remote path, which has a total cost on the > order of 1.5e6 in this case; cheaper than the remote sort because this > only needs to do top-1, while the remote is sorting millions of rows > and would probably spill to disk. A simple test at hand showed that (on a unix-domain connection): =# explain (verbose on, analyze on) select * from ft1 order by a; > Foreign Scan on public.ft1 (cost=9847.82..17097.82 rows=100000 width=4) > (actual time=195.861..515.747 rows=100000 loops=1) =# explain (verbose on, analyze on) select * from ft1; > Foreign Scan on public.ft1 (cost=100.00..8543.00 rows=100000 width=4) > (actual time=0.659..399.427 rows=100000 loops=1) The cost is apaprently wrong. On my environment fdw_startup_cost = 45 and fdw_tuple_cost = 0.2 gave me an even cost/actual time ratio *for these queries*. (hard coded default is 100 and 0.01. Of course this disucussion is ignoring the accuracy of local-execution estimate.) =# explain (verbose on, analyze on) select * from ft1 order by a; > Foreign Scan on public.ft1 (cost=9792.82..31042.82 rows=100000 width=4) > (actual time=201.493..533.913 rows=100000 loops=1) =# explain (verbose on, analyze on) select * from ft1; > Foreign Scan on public.ft1 (cost=45.00..22488.00 rows=100000 width=4) > (actual time=0.837..484.469 rows=100000 loops=1) This gave me a remote-sorted plan for "select * from ft1 order by a limit 1". (But also gave me a remote-sorted plan without a LIMIT..) > However, when it comes to actual execution, postgres_fdw opens a cursor > for the remote query, which means that cursor_tuple_fraction will come > into play. As far as I can tell, this is not set anywhere, so this means > that the plan that actually gets run on the remote is likely to have > _completely_ different costs from those returned by the EXPLAINs. In > particular, in this case the fast-start index-scan plan for the ORDER BY > remote query is clearly being chosen when use_remote_estimates is off > (since the query completes in 15ms rather than 150 seconds). > > One possibility: would it be worth adding an option to EXPLAIN that > makes it assume cursor_tuple_fraction? Cursor fraction seems working since the foreign scan with remote sort has a cost with different startup and total values. The problem seems to be a too-small tuple cost. So, we might have a room for improvement on DEFAULT_FDW_STARTUP_COST, DEFAULT_FDW_TUPLE_COST and DEFAULT_FDW_SORT_MULTIPLIER settings. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > [ postgres_fdw is not smart about exploiting fast-start plans ] Yeah, that's basically not accounted for at all in the current design. > One possibility: would it be worth adding an option to EXPLAIN that > makes it assume cursor_tuple_fraction? [ handwaving ahead ] I wonder whether it could be done without destroying postgres_fdw's support for old servers, by instead including a LIMIT in the query sent for explaining. The trick would be to know what value to put as the limit, though. It'd be easy to do if we were willing to explain the query twice (the second time with a limit chosen as a fraction of the rowcount seen the first time), but man that's an expensive solution. Another component of any real fix here would be to issue "SET cursor_tuple_fraction" before opening the execution cursor, so as to ensure that we actually get an appropriate plan on the remote side. If we could tell whether there's going to be any use in fast-start plans, it might make sense to build two scan paths for a foreign table, one based on a full-table scan and one based on EXPLAIN ... LIMIT 1. This still means two explain requests, which is why I'm not thrilled about doing it unless there's a high probability of the extra explain being useful. regards, tom lane
(2018/08/02 23:41), Tom Lane wrote: > Andrew Gierth<andrew@tao11.riddles.org.uk> writes: >> [ postgres_fdw is not smart about exploiting fast-start plans ] > > Yeah, that's basically not accounted for at all in the current design. > >> One possibility: would it be worth adding an option to EXPLAIN that >> makes it assume cursor_tuple_fraction? > > [ handwaving ahead ] > > I wonder whether it could be done without destroying postgres_fdw's > support for old servers, by instead including a LIMIT in the query sent > for explaining. The trick would be to know what value to put as the > limit, though. It'd be easy to do if we were willing to explain the query > twice (the second time with a limit chosen as a fraction of the rowcount > seen the first time), but man that's an expensive solution. > > Another component of any real fix here would be to issue "SET > cursor_tuple_fraction" before opening the execution cursor, so as to > ensure that we actually get an appropriate plan on the remote side. > > If we could tell whether there's going to be any use in fast-start plans, > it might make sense to build two scan paths for a foreign table, one based > on a full-table scan and one based on EXPLAIN ... LIMIT 1. This still > means two explain requests, which is why I'm not thrilled about doing it > unless there's a high probability of the extra explain being useful. Agreed, but ISTM that to address the original issue, it would be enough to jsut add LIMIT (or ORDER BY LIMIT) pushdown to postgres_fdw based on the upper-planner-pathification work. I'm sorry that I'm really late for the party. Best regards, Etsuro Fujita
(2018/10/05 19:15), Etsuro Fujita wrote: > (2018/08/02 23:41), Tom Lane wrote: >> Andrew Gierth<andrew@tao11.riddles.org.uk> writes: >>> [ postgres_fdw is not smart about exploiting fast-start plans ] >> >> Yeah, that's basically not accounted for at all in the current design. >> >>> One possibility: would it be worth adding an option to EXPLAIN that >>> makes it assume cursor_tuple_fraction? >> >> [ handwaving ahead ] >> >> I wonder whether it could be done without destroying postgres_fdw's >> support for old servers, by instead including a LIMIT in the query sent >> for explaining. The trick would be to know what value to put as the >> limit, though. It'd be easy to do if we were willing to explain the query >> twice (the second time with a limit chosen as a fraction of the rowcount >> seen the first time), but man that's an expensive solution. >> >> Another component of any real fix here would be to issue "SET >> cursor_tuple_fraction" before opening the execution cursor, so as to >> ensure that we actually get an appropriate plan on the remote side. >> >> If we could tell whether there's going to be any use in fast-start plans, >> it might make sense to build two scan paths for a foreign table, one >> based >> on a full-table scan and one based on EXPLAIN ... LIMIT 1. This still >> means two explain requests, which is why I'm not thrilled about doing it >> unless there's a high probability of the extra explain being useful. > > Agreed, but ISTM that to address the original issue, it would be enough > to jsut add LIMIT (or ORDER BY LIMIT) pushdown to postgres_fdw based on > the upper-planner-pathification work. Will work on it unless somebody else wants to. Best regards, Etsuro Fujita
(2018/10/05 19:15), Etsuro Fujita wrote: > (2018/08/02 23:41), Tom Lane wrote: >> Andrew Gierth<andrew@tao11.riddles.org.uk> writes: >>> [ postgres_fdw is not smart about exploiting fast-start plans ] >> >> Yeah, that's basically not accounted for at all in the current design. >> >>> One possibility: would it be worth adding an option to EXPLAIN that >>> makes it assume cursor_tuple_fraction? >> >> [ handwaving ahead ] >> >> I wonder whether it could be done without destroying postgres_fdw's >> support for old servers, by instead including a LIMIT in the query sent >> for explaining. The trick would be to know what value to put as the >> limit, though. It'd be easy to do if we were willing to explain the query >> twice (the second time with a limit chosen as a fraction of the rowcount >> seen the first time), but man that's an expensive solution. >> >> Another component of any real fix here would be to issue "SET >> cursor_tuple_fraction" before opening the execution cursor, so as to >> ensure that we actually get an appropriate plan on the remote side. >> >> If we could tell whether there's going to be any use in fast-start plans, >> it might make sense to build two scan paths for a foreign table, one >> based >> on a full-table scan and one based on EXPLAIN ... LIMIT 1. This still >> means two explain requests, which is why I'm not thrilled about doing it >> unless there's a high probability of the extra explain being useful. > > Agreed, but ISTM that to address the original issue, it would be enough > to jsut add LIMIT (or ORDER BY LIMIT) pushdown to postgres_fdw based on > the upper-planner-pathification work. Will work on it unless somebody else wants to. Best regards, Etsuro Fujita
(2018/10/09 14:48), Etsuro Fujita wrote: > (2018/10/05 19:15), Etsuro Fujita wrote: >> (2018/08/02 23:41), Tom Lane wrote: >>> Andrew Gierth<andrew@tao11.riddles.org.uk> writes: >>>> [ postgres_fdw is not smart about exploiting fast-start plans ] >>> >>> Yeah, that's basically not accounted for at all in the current design. >>> >>>> One possibility: would it be worth adding an option to EXPLAIN that >>>> makes it assume cursor_tuple_fraction? >>> >>> [ handwaving ahead ] >>> >>> I wonder whether it could be done without destroying postgres_fdw's >>> support for old servers, by instead including a LIMIT in the query sent >>> for explaining. The trick would be to know what value to put as the >>> limit, though. It'd be easy to do if we were willing to explain the >>> query >>> twice (the second time with a limit chosen as a fraction of the rowcount >>> seen the first time), but man that's an expensive solution. >>> >>> Another component of any real fix here would be to issue "SET >>> cursor_tuple_fraction" before opening the execution cursor, so as to >>> ensure that we actually get an appropriate plan on the remote side. >>> >>> If we could tell whether there's going to be any use in fast-start >>> plans, >>> it might make sense to build two scan paths for a foreign table, one >>> based >>> on a full-table scan and one based on EXPLAIN ... LIMIT 1. This still >>> means two explain requests, which is why I'm not thrilled about doing it >>> unless there's a high probability of the extra explain being useful. >> >> Agreed, but ISTM that to address the original issue, it would be enough >> to jsut add LIMIT (or ORDER BY LIMIT) pushdown to postgres_fdw based on >> the upper-planner-pathification work. > > Will work on it unless somebody else wants to. Here is a set of WIP patches for pushing down ORDER BY LIMIT to the remote: * 0001-postgres-fdw-upperrel-ordered-WIP.patch: This patch performs the UPPERREL_ORDERED step remotely. * 0002-postgres-fdw-upperrel-final-WIP.patch: This patch performs the UPPERREL_FINAL step remotely. Currently, this only supports for SELECT commands, and pushes down LIMIT/OFFSET to the remote if possible. This also removes LockRows if there is a FOR UPDATE/SHARE clause, which would be safe because postgres_fdw performs early locking. I'd like to leave INSERT, UPDATE and DELETE cases for future work. It is my long-term todo to rewrite PlanDirectModify using the upper-planner-pathification work. :) For some regression test cases with ORDER BY and/or LIMIT, I noticed that these patches still cannot push down those clause to the remote. I guess it would be needed to tweak the cost/size estimation added by these patches, but I didn't look at that in detail yet. Maybe I'm missing something, though. Comments welcome! Best regards, Etsuro Fujita
Attachment
(2018/12/17 22:09), Etsuro Fujita wrote: > Here is a set of WIP patches for pushing down ORDER BY LIMIT to the remote: > > * 0001-postgres-fdw-upperrel-ordered-WIP.patch: Correction: 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-WIP.patch > * 0002-postgres-fdw-upperrel-final-WIP.patch: Correction: 0002-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely-WIP.patch Best regards, Etsuro Fujita
(2018/12/17 22:09), Etsuro Fujita wrote: > Here is a set of WIP patches for pushing down ORDER BY LIMIT to the remote: > For some regression test cases with ORDER BY and/or LIMIT, I noticed > that these patches still cannot push down those clause to the remote. I > guess it would be needed to tweak the cost/size estimation added by > these patches, but I didn't look at that in detail yet. Maybe I'm > missing something, though. I looked into that. In the previous patch, I estimated costs for performing grouping/aggregation with ORDER BY remotely, using the same heuristic as scan/join cases if appropriate (i.e., I assumed that a remote sort for grouping/aggregation also costs 20% extra), but that seems too large for grouping/aggregation. So I reduced that to 5% extra. The result showed that some test cases can be pushed down, but some still cannot. I think we could push down the ORDER BY in all cases by reducing the extra cost to a much smaller value, but I'm not sure what value is reasonable. Attached is an updated version of the patch. Other changes: * Modified estimate_path_cost_size() further so that it accounts for tlist eval cost as well * Added more comments * Simplified code a little bit I'll add this to the upcoming CF. Best regards, Etsuro Fujita
Attachment
(2018/12/26 16:35), Etsuro Fujita wrote: > Attached is an updated version of the patch. Other changes: While self-reviewing the patch I noticed a thinko in the patch 0001 for pushing down the final sort: I estimated the costs for that using estimate_path_cost_size that was modified so that it factored the limit_tuples limit (if any) into the costs, but I think that was wrong; that should not factor that because the remote query corresponding to the pushdown step won't have LIMIT. So I fixed that. Also, a new data structure I added to include/nodes/relation.h (ie, OrderedPathExtraData) is no longer needed, so I removed that. Attached is a new version of the patch. Best regards, Etsuro Fujita
Attachment
(2018/12/28 15:50), Etsuro Fujita wrote: > Attached is a new version of the > patch. Here is an updated version of the patch set. Changes are: * In the previous version, LIMIT without OFFSET was not performed remotely as the costs was calculated the same as the costs of performing it locally. (Actually, such LIMIT was performed remotely in a case in the postgres_fdw regression test, but that was due to a bug :(.) I think we should prefer performing such LIMIT remotely as that avoids extra row fetches from the remote that performing it locally might cause, so I tweaked the costs estimated in estimate_path_cost_size(), to ensure that we'll prefer performing such LIMIT remotely. (I fixed the mentioned bug as well.) * I fixed another bug in adjusting the costs of pre-sorted foreign-grouping paths. * I tweaked comments, and rebased the patch set against the latest HEAD. Best regards, Etsuro Fujita
Attachment
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/12/28 15:50), Etsuro Fujita wrote: > > Attached is a new version of the > > patch. > > Here is an updated version of the patch set. Changes are: I've spent some time reviewing the patches. First, I wonder if the information on LIMIT needs to be passed to the FDW. Cannot the FDW routines use root->tuple_fraction? I think (although have not verified experimentally) that the problem reported in [1] is not limited to the LIMIT clause. I think the same problem can happen if client uses cursor and sets cursor_tuple_fraction very low. Since the remote node is not aware of this setting when running EXPLAIN, the low fraction can also make postgres_fdw think that retrieval of the whole set is cheaper than it will appear to be during the actual execution. Some comments on coding: 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch ----------------------------------------------------------------- * I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. Note that grouping_planner() does not call create_limit_path() until it's done with create_ordered_paths(), and create_ordered_paths() is where the FDW is requested to add its paths to UPPERREL_ORDERED relation. * add_foreign_ordered_paths() Exprssion root->upper_targets[UPPERREL_FINAL] is used to access the target. I think create_ordered_paths() should actually set the target so that postgresGetForeignUpperPaths() can simply find it in output_rel->reltarget. This is how postgres_fdw already retrieves the target for grouped paths. Small patch is attached that makes this possible. * regression tests: I think test(s) should be added for queries that have ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET) clause. I haven't noticed such tests. 0002-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely-v3.patch --------------------------------------------------------------- * add_foreign_final_paths() Similar problem I reported for add_foreign_ordered_paths() above. * adjust_limit_rows_costs() Doesn't this function address more generic problem than the use of postgres_fdw? If so, then it should be submitted as a separate patch. BTW, the function is only used in pathnode.c, so it should be declared static. Both patches: ------------ One thing that makes me confused when I try to understand details is that the new functions add_foreign_ordered_paths() and add_foreign_final_paths() both pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size(): estimate_path_cost_size(root, input_rel, ...) whereas the existing add_foreign_grouping_paths() passes "grouped_rel": estimate_path_cost_size(root, grouped_rel, ...) Can you please make this consistent? If you do, you'll probably need to reconsider your changes to estimate_path_cost_size(). And maybe related problem: why should FDW care about the effect of apply_scanjoin_target_to_paths() like you claim to do here? /* * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the * final scan/join relation, the costs obtained from the cache * wouldn't yet contain the eval cost for the final scan/join target, * which would have been updated by apply_scanjoin_target_to_paths(); * add the eval cost now. */ if (fpextra && !IS_UPPER_REL(foreignrel)) { /* The costs should have been obtained from the cache. */ Assert(fpinfo->rel_startup_cost >= 0 && fpinfo->rel_total_cost >= 0); startup_cost += foreignrel->reltarget->cost.startup; run_cost += foreignrel->reltarget->cost.per_tuple * rows; } I think it should not, whether "foreignrel" means "input_rel" or "output_rel" from the perspective of postgresGetForeignUpperPaths(). [1] https://www.postgresql.org/message-id/87pnz1aby9.fsf@news-spur.riddles.org.uk -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index b2239728cf..17e983f11e 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -2035,6 +2035,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update, * of the corresponding upperrels might not be needed for this query. */ root->upper_targets[UPPERREL_FINAL] = final_target; + root->upper_targets[UPPERREL_ORDERED] = final_target; root->upper_targets[UPPERREL_WINDOW] = sort_input_target; root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target; @@ -2118,6 +2119,12 @@ grouping_planner(PlannerInfo *root, bool inheritance_update, final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL); /* + * Set reltarget so that it's consistent with the paths. Also it's more + * convenient for FDW to find the target here. + */ + final_rel->reltarget = final_target; + + /* * If the input rel is marked consider_parallel and there's nothing that's * not parallel-safe in the LIMIT clause, then the final_rel can be marked * consider_parallel as well. Note that if the query has rowMarks or is @@ -4865,6 +4872,12 @@ create_ordered_paths(PlannerInfo *root, ordered_rel = fetch_upper_rel(root, UPPERREL_ORDERED, NULL); /* + * Set reltarget so that it's consistent with the paths. Also it's more + * convenient for FDW to find the target here. + */ + ordered_rel->reltarget = target; + + /* * If the input relation is not parallel-safe, then the ordered relation * can't be parallel-safe, either. Otherwise, it's parallel-safe if the * target list is parallel-safe.
Hi Antonin, (2019/02/08 2:04), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> Here is an updated version of the patch set. Changes are: > > I've spent some time reviewing the patches. Thanks for the review! > First, I wonder if the information on LIMIT needs to be passed to the > FDW. Cannot the FDW routines use root->tuple_fraction? I think it's OK for the FDW to use the tuple_fraction, but I'm not sure the tuple_fraction should be enough. For example, I don't think we can re-compute from the tuple_fraction the LIMIT/OFFSET values. > I think (although have > not verified experimentally) that the problem reported in [1] is not limited > to the LIMIT clause. I think the same problem can happen if client uses cursor > and sets cursor_tuple_fraction very low. Since the remote node is not aware of > this setting when running EXPLAIN, the low fraction can also make postgres_fdw > think that retrieval of the whole set is cheaper than it will appear to be > during the actual execution. I think it would be better to get a fast-start plan on the remote side in such a situation, but I'm not sure we can do that as well with this patch, because in the cursor case, the FDW couldn't know in advance exactly how may rows will be fetched from the remote side using that cursor, so the FDW couldn't push down LIMIT. So I'd like to leave that for another patch. > Some comments on coding: > > 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch > ----------------------------------------------------------------- > > * I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. Note that > grouping_planner() does not call create_limit_path() until it's done with > create_ordered_paths(), and create_ordered_paths() is where the FDW is > requested to add its paths to UPPERREL_ORDERED relation. Maybe I'm missing something, but the 0001 patch doesn't deal with LIMIT at all. I added the parameter limit_tuples to PgFdwPathExtraData so that we can pass limit_tuples=-1, which means no LIMIT, to cost_sort(), though. > * add_foreign_ordered_paths() > > Exprssion root->upper_targets[UPPERREL_FINAL] is used to access the target. > > I think create_ordered_paths() should actually set the target so that > postgresGetForeignUpperPaths() can simply find it in > output_rel->reltarget. This is how postgres_fdw already retrieves the target > for grouped paths. Small patch is attached that makes this possible. Seems like a good idea. Thanks for the patch! Will review. > * regression tests: I think test(s) should be added for queries that have > ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET) > clause. I haven't noticed such tests. Will do. > 0002-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely-v3.patch > --------------------------------------------------------------- > > * add_foreign_final_paths() > > Similar problem I reported for add_foreign_ordered_paths() above. > > * adjust_limit_rows_costs() > > Doesn't this function address more generic problem than the use of > postgres_fdw? If so, then it should be submitted as a separate patch. BTW, the > function is only used in pathnode.c, so it should be declared static. Actually, this is simple refactoring for create_limit_path() so that that function can be shared with postgres_fdw. See estimate_path_cost_size(). I'll separate that refactoring in the next version of the patch set. > Both patches: > ------------ > > One thing that makes me confused when I try to understand details is that the > new functions add_foreign_ordered_paths() and add_foreign_final_paths() both > pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size(): > > estimate_path_cost_size(root, input_rel, ...) > > whereas the existing add_foreign_grouping_paths() passes "grouped_rel": > > estimate_path_cost_size(root, grouped_rel, ...) > > Can you please make this consistent? If you do, you'll probably need to > reconsider your changes to estimate_path_cost_size(). This is intended and I think I should add more comments on that. Let me explain. These patches modified estimate_path_cost_size() so that we can estimate the costs of a foreign path for the UPPERREL_ORDERED or UPPERREL_FINAL rel, by adding the costs of the upper-relation processing (ie, ORDER BY and/or LIMIT/OFFSET, specified by PgFdwPathExtraData) to the costs of implementing the *underlying* relation for the upper relation (ie, scan, join or grouping rel, specified by the input_rel). So those functions call estimate_path_cost_size() with the input_rel, not the ordered_rel/final_rel, along with PgFdwPathExtraData. > And maybe related problem: why should FDW care about the effect of > apply_scanjoin_target_to_paths() like you claim to do here? > > /* > * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the > * final scan/join relation, the costs obtained from the cache > * wouldn't yet contain the eval cost for the final scan/join target, > * which would have been updated by apply_scanjoin_target_to_paths(); > * add the eval cost now. > */ > if (fpextra&& !IS_UPPER_REL(foreignrel)) > { > /* The costs should have been obtained from the cache. */ > Assert(fpinfo->rel_startup_cost>= 0&& > fpinfo->rel_total_cost>= 0); > > startup_cost += foreignrel->reltarget->cost.startup; > run_cost += foreignrel->reltarget->cost.per_tuple * rows; > } > > I think it should not, whether "foreignrel" means "input_rel" or "output_rel" > from the perspective of postgresGetForeignUpperPaths(). I added this before costing the sort operation below, because 1) the cost of evaluating the scan/join target might not be zero (consider the case where sort columns are not simple Vars, for example) and 2) the cost of sorting takes into account the underlying path's startup/total costs. Maybe I'm missing something, though. Thanks again! Best regards, Etsuro Fujita
(2019/02/08 21:35), Etsuro Fujita wrote: > (2019/02/08 2:04), Antonin Houska wrote: >> * add_foreign_ordered_paths() >> >> Exprssion root->upper_targets[UPPERREL_FINAL] is used to access the >> target. >> >> I think create_ordered_paths() should actually set the target so that >> postgresGetForeignUpperPaths() can simply find it in >> output_rel->reltarget. This is how postgres_fdw already retrieves the >> target >> for grouped paths. Small patch is attached that makes this possible. > > Seems like a good idea. Thanks for the patch! Will review. Here are my review comments: root->upper_targets[UPPERREL_FINAL] = final_target; + root->upper_targets[UPPERREL_ORDERED] = final_target; root->upper_targets[UPPERREL_WINDOW] = sort_input_target; root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target; I think that's a good idea. I think we'll soon need the PathTarget for UPPERREL_DISTINCT, so how about saving that as well like this? root->upper_targets[UPPERREL_FINAL] = final_target; + root->upper_targets[UPPERREL_ORDERED] = final_target; + root->upper_targets[UPPERREL_DISTINCT] = sort_input_target; root->upper_targets[UPPERREL_WINDOW] = sort_input_target; root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target; Another is about this: /* + * Set reltarget so that it's consistent with the paths. Also it's more + * convenient for FDW to find the target here. + */ + ordered_rel->reltarget = target; I think this is correct if there are no SRFs in the targetlist, but otherwise I'm not sure this is really correct because the relation's reltarget would not match the target of paths added to the relation after adjust_paths_for_srfs(). Having said that, my question here is: do we really need this (and the same for UPPERREL_FINAL you added)? For the purpose of the FDW convenience, the upper_targets[] change seems to me to be enough. Best regards, Etsuro Fujita
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/02/08 2:04), Antonin Houska wrote: > > > First, I wonder if the information on LIMIT needs to be passed to the > > FDW. Cannot the FDW routines use root->tuple_fraction? > > I think it's OK for the FDW to use the tuple_fraction, but I'm not sure the > tuple_fraction should be enough. For example, I don't think we can re-compute > from the tuple_fraction the LIMIT/OFFSET values. > > > I think (although have > > not verified experimentally) that the problem reported in [1] is not limited > > to the LIMIT clause. I think the same problem can happen if client uses cursor > > and sets cursor_tuple_fraction very low. Since the remote node is not aware of > > this setting when running EXPLAIN, the low fraction can also make postgres_fdw > > think that retrieval of the whole set is cheaper than it will appear to be > > during the actual execution. > > I think it would be better to get a fast-start plan on the remote side in such > a situation, but I'm not sure we can do that as well with this patch, because > in the cursor case, the FDW couldn't know in advance exactly how may rows will > be fetched from the remote side using that cursor, so the FDW couldn't push > down LIMIT. So I'd like to leave that for another patch. root->tuple_fraction (possibly derived from cursor_tuple_fraction) should be enough to decide whether fast-startup plan should be considered. I just failed to realize that your patch primarily aims at the support of UPPERREL_ORDERED and UPPERREL_FINAL relations by postgres_fdw. Yes, another patch would be needed to completely resolve the problem reported by Andrew Gierth upthread. > > Some comments on coding: > > > > 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch > > ----------------------------------------------------------------- > > > > * I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. Note that > > grouping_planner() does not call create_limit_path() until it's done with > > create_ordered_paths(), and create_ordered_paths() is where the FDW is > > requested to add its paths to UPPERREL_ORDERED relation. > > Maybe I'm missing something, but the 0001 patch doesn't deal with LIMIT at > all. I added the parameter limit_tuples to PgFdwPathExtraData so that we can > pass limit_tuples=-1, which means no LIMIT, to cost_sort(), though. Yes, the limit_tuples field made me confused. But if cost_sort() is the only reason, why not simply pass -1 in the 0001 patch, and use the limit_tuples argument only in 0002? I see several other calls of cost_sort() in the core where -1 is hard-coded. > > Both patches: > > ------------ > > > > One thing that makes me confused when I try to understand details is that the > > new functions add_foreign_ordered_paths() and add_foreign_final_paths() both > > pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size(): > > > > estimate_path_cost_size(root, input_rel, ...) > > > > whereas the existing add_foreign_grouping_paths() passes "grouped_rel": > > > > estimate_path_cost_size(root, grouped_rel, ...) > > > > Can you please make this consistent? If you do, you'll probably need to > > reconsider your changes to estimate_path_cost_size(). > > This is intended and I think I should add more comments on that. Let me > explain. These patches modified estimate_path_cost_size() so that we can > estimate the costs of a foreign path for the UPPERREL_ORDERED or > UPPERREL_FINAL rel, by adding the costs of the upper-relation processing (ie, > ORDER BY and/or LIMIT/OFFSET, specified by PgFdwPathExtraData) to the costs of > implementing the *underlying* relation for the upper relation (ie, scan, join > or grouping rel, specified by the input_rel). So those functions call > estimate_path_cost_size() with the input_rel, not the ordered_rel/final_rel, > along with PgFdwPathExtraData. I think the same already happens for the UPPERREL_GROUP_AGG relation: estimate_path_cost_size() ... else if (IS_UPPER_REL(foreignrel)) { ... ofpinfo = (PgFdwRelationInfo *) fpinfo->outerrel->fdw_private; Here "outerrel" actually means input relation from the grouping perspective. The input relation (i.e. result of scan / join) estimates are retrieved from "ofpinfo" and the costs of aggregation are added. Why should this be different for other kinds of upper relations? > > And maybe related problem: why should FDW care about the effect of > > apply_scanjoin_target_to_paths() like you claim to do here? > > > > /* > > * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the > > * final scan/join relation, the costs obtained from the cache > > * wouldn't yet contain the eval cost for the final scan/join target, > > * which would have been updated by apply_scanjoin_target_to_paths(); > > * add the eval cost now. > > */ > > if (fpextra&& !IS_UPPER_REL(foreignrel)) > > { > > /* The costs should have been obtained from the cache. */ > > Assert(fpinfo->rel_startup_cost>= 0&& > > fpinfo->rel_total_cost>= 0); > > > > startup_cost += foreignrel->reltarget->cost.startup; > > run_cost += foreignrel->reltarget->cost.per_tuple * rows; > > } > > > > I think it should not, whether "foreignrel" means "input_rel" or "output_rel" > > from the perspective of postgresGetForeignUpperPaths(). > > I added this before costing the sort operation below, because 1) the cost of > evaluating the scan/join target might not be zero (consider the case where > sort columns are not simple Vars, for example) and 2) the cost of sorting > takes into account the underlying path's startup/total costs. Maybe I'm > missing something, though. My understanding is that the "input_rel" argument of postgresGetForeignUpperPaths() should already have been processed by postgresGetForeignPaths() or postgresGetForeignJoinPaths() (or actually by postgresGetForeignUpperPaths() called with different "stage" too). Therefore it should already have the "fdw_private" field initialized. The estimates in the corresponding PgFdwRelationInfo should already include the reltarget costs, as set previously by estimate_path_cost_size(). -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/02/08 21:35), Etsuro Fujita wrote: > > (2019/02/08 2:04), Antonin Houska wrote: > >> * add_foreign_ordered_paths() > >> > >> Exprssion root->upper_targets[UPPERREL_FINAL] is used to access the > >> target. > >> > >> I think create_ordered_paths() should actually set the target so that > >> postgresGetForeignUpperPaths() can simply find it in > >> output_rel->reltarget. This is how postgres_fdw already retrieves the > >> target > >> for grouped paths. Small patch is attached that makes this possible. > > > > Seems like a good idea. Thanks for the patch! Will review. > > Here are my review comments: > > root->upper_targets[UPPERREL_FINAL] = final_target; > + root->upper_targets[UPPERREL_ORDERED] = final_target; > root->upper_targets[UPPERREL_WINDOW] = sort_input_target; > root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target; > > I think that's a good idea. I think we'll soon need the PathTarget for > UPPERREL_DISTINCT, so how about saving that as well like this? > > root->upper_targets[UPPERREL_FINAL] = final_target; > + root->upper_targets[UPPERREL_ORDERED] = final_target; > + root->upper_targets[UPPERREL_DISTINCT] = sort_input_target; > root->upper_targets[UPPERREL_WINDOW] = sort_input_target; > root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target; I see nothing wrong about this. > Another is about this: > > /* > + * Set reltarget so that it's consistent with the paths. Also it's more > + * convenient for FDW to find the target here. > + */ > + ordered_rel->reltarget = target; > > I think this is correct if there are no SRFs in the targetlist, but otherwise > I'm not sure this is really correct because the relation's reltarget would not > match the target of paths added to the relation after adjust_paths_for_srfs(). I wouldn't expect problem here. create_grouping_paths() also sets the reltarget although adjust_paths_for_srfs() can be alled on grouped_rel afterwards. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
(2019/02/12 20:43), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> Here are my review comments: >> >> root->upper_targets[UPPERREL_FINAL] = final_target; >> + root->upper_targets[UPPERREL_ORDERED] = final_target; >> root->upper_targets[UPPERREL_WINDOW] = sort_input_target; >> root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target; >> >> I think that's a good idea. I think we'll soon need the PathTarget for >> UPPERREL_DISTINCT, so how about saving that as well like this? >> >> root->upper_targets[UPPERREL_FINAL] = final_target; >> + root->upper_targets[UPPERREL_ORDERED] = final_target; >> + root->upper_targets[UPPERREL_DISTINCT] = sort_input_target; >> root->upper_targets[UPPERREL_WINDOW] = sort_input_target; >> root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target; > > I see nothing wrong about this. Cool! >> Another is about this: >> >> /* >> + * Set reltarget so that it's consistent with the paths. Also it's more >> + * convenient for FDW to find the target here. >> + */ >> + ordered_rel->reltarget = target; >> >> I think this is correct if there are no SRFs in the targetlist, but otherwise >> I'm not sure this is really correct because the relation's reltarget would not >> match the target of paths added to the relation after adjust_paths_for_srfs(). > > I wouldn't expect problem here. create_grouping_paths() also sets the > reltarget although adjust_paths_for_srfs() can be alled on grouped_rel > afterwards. Yeah, but as I said in a previous email, I think the root->upper_targets change would be enough at least currently for FDWs to consider performing post-UPPERREL_GROUP_AGG steps remotely, as shown in my patches, so I'm inclined to leave the retarget change for future work. Best regards, Etsuro Fujita
(2019/02/12 18:03), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> (2019/02/08 2:04), Antonin Houska wrote: >>> Some comments on coding: >>> >>> 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch >>> ----------------------------------------------------------------- >>> >>> * I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. Note that >>> grouping_planner() does not call create_limit_path() until it's done with >>> create_ordered_paths(), and create_ordered_paths() is where the FDW is >>> requested to add its paths to UPPERREL_ORDERED relation. >> >> Maybe I'm missing something, but the 0001 patch doesn't deal with LIMIT at >> all. I added the parameter limit_tuples to PgFdwPathExtraData so that we can >> pass limit_tuples=-1, which means no LIMIT, to cost_sort(), though. > > Yes, the limit_tuples field made me confused. But if cost_sort() is the only > reason, why not simply pass -1 in the 0001 patch, and use the limit_tuples > argument only in 0002? I see several other calls of cost_sort() in the core > where -1 is hard-coded. Yeah, that would make it easy to review the patch; will do. >>> Both patches: >>> ------------ >>> >>> One thing that makes me confused when I try to understand details is that the >>> new functions add_foreign_ordered_paths() and add_foreign_final_paths() both >>> pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size(): >>> >>> estimate_path_cost_size(root, input_rel, ...) >>> >>> whereas the existing add_foreign_grouping_paths() passes "grouped_rel": >>> >>> estimate_path_cost_size(root, grouped_rel, ...) >>> >>> Can you please make this consistent? If you do, you'll probably need to >>> reconsider your changes to estimate_path_cost_size(). >> >> This is intended and I think I should add more comments on that. Let me >> explain. These patches modified estimate_path_cost_size() so that we can >> estimate the costs of a foreign path for the UPPERREL_ORDERED or >> UPPERREL_FINAL rel, by adding the costs of the upper-relation processing (ie, >> ORDER BY and/or LIMIT/OFFSET, specified by PgFdwPathExtraData) to the costs of >> implementing the *underlying* relation for the upper relation (ie, scan, join >> or grouping rel, specified by the input_rel). So those functions call >> estimate_path_cost_size() with the input_rel, not the ordered_rel/final_rel, >> along with PgFdwPathExtraData. > > I think the same already happens for the UPPERREL_GROUP_AGG relation: > estimate_path_cost_size() > > ... > else if (IS_UPPER_REL(foreignrel)) > { > ... > ofpinfo = (PgFdwRelationInfo *) fpinfo->outerrel->fdw_private; > > Here "outerrel" actually means input relation from the grouping perspective. > The input relation (i.e. result of scan / join) estimates are retrieved from > "ofpinfo" and the costs of aggregation are added. Why should this be different > for other kinds of upper relations? IIUC, I think there is an oversight in that case: the cost estimation doesn't account for evaluating the final scan/join target updated by apply_scanjoin_target_to_paths(), but I think that is another issue, so I'll start a new thread. >>> And maybe related problem: why should FDW care about the effect of >>> apply_scanjoin_target_to_paths() like you claim to do here? >>> >>> /* >>> * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the >>> * final scan/join relation, the costs obtained from the cache >>> * wouldn't yet contain the eval cost for the final scan/join target, >>> * which would have been updated by apply_scanjoin_target_to_paths(); >>> * add the eval cost now. >>> */ >>> if (fpextra&& !IS_UPPER_REL(foreignrel)) >>> { >>> /* The costs should have been obtained from the cache. */ >>> Assert(fpinfo->rel_startup_cost>= 0&& >>> fpinfo->rel_total_cost>= 0); >>> >>> startup_cost += foreignrel->reltarget->cost.startup; >>> run_cost += foreignrel->reltarget->cost.per_tuple * rows; >>> } >>> >>> I think it should not, whether "foreignrel" means "input_rel" or "output_rel" >>> from the perspective of postgresGetForeignUpperPaths(). >> >> I added this before costing the sort operation below, because 1) the cost of >> evaluating the scan/join target might not be zero (consider the case where >> sort columns are not simple Vars, for example) and 2) the cost of sorting >> takes into account the underlying path's startup/total costs. Maybe I'm >> missing something, though. > > My understanding is that the "input_rel" argument of > postgresGetForeignUpperPaths() should already have been processed by > postgresGetForeignPaths() or postgresGetForeignJoinPaths() (or actually by > postgresGetForeignUpperPaths() called with different "stage" too). Therefore > it should already have the "fdw_private" field initialized. The estimates in > the corresponding PgFdwRelationInfo should already include the reltarget > costs, as set previously by estimate_path_cost_size(). Right, but what I'm saying here is: the costs of evaluating the final scan/join target updated by apply_scanjoin_target_to_paths() wouldn't be yet included in the costs we calculated using local statistics when called from postgresGetForeignPaths() or postgresGetForeignJoinPaths(). Let me explain using an example: SELECT a+b FROM foreign_table LIMIT 10; For this query, the reltarget for the foreign_table would be {a, b} when called from postgresGetForeignPaths() (see build_base_rel_tlists()). The costs of evaluating simple Vars are zeroes, so we wouldn't charge any costs for tlist evaluation when estimating the costs of a basic foreign table scan in that callback routine, but the final scan target, with which the reltarget will be replaced later by apply_scanjoin_target_to_paths(), would be {a+b}, so we need to adjust the costs of the basic foreign table scan, cached in the PgFdwRelationInfo for the input_rel (ie, the foreign_table), so that eval costs for the replaced tlist are included when called from postgresGetForeignUpperPaths() with the UPPERREL_FINAL stage, as the costs of evaluating the expression 'a+b' wouldn't be zeroes. Best regards, Etsuro Fujita
(2019/02/14 18:44), Etsuro Fujita wrote: > (2019/02/12 20:43), Antonin Houska wrote: >> Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >>> Here are my review comments: >>> >>> root->upper_targets[UPPERREL_FINAL] = final_target; >>> + root->upper_targets[UPPERREL_ORDERED] = final_target; >>> root->upper_targets[UPPERREL_WINDOW] = sort_input_target; >>> root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target; >>> >>> I think that's a good idea. I think we'll soon need the PathTarget for >>> UPPERREL_DISTINCT, so how about saving that as well like this? >>> >>> root->upper_targets[UPPERREL_FINAL] = final_target; >>> + root->upper_targets[UPPERREL_ORDERED] = final_target; >>> + root->upper_targets[UPPERREL_DISTINCT] = sort_input_target; >>> root->upper_targets[UPPERREL_WINDOW] = sort_input_target; >>> root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target; >> >> I see nothing wrong about this. > > Cool! > >>> Another is about this: >>> >>> /* >>> + * Set reltarget so that it's consistent with the paths. Also it's more >>> + * convenient for FDW to find the target here. >>> + */ >>> + ordered_rel->reltarget = target; >>> >>> I think this is correct if there are no SRFs in the targetlist, but >>> otherwise >>> I'm not sure this is really correct because the relation's reltarget >>> would not >>> match the target of paths added to the relation after >>> adjust_paths_for_srfs(). >> >> I wouldn't expect problem here. create_grouping_paths() also sets the >> reltarget although adjust_paths_for_srfs() can be alled on grouped_rel >> afterwards. > > Yeah, but as I said in a previous email, I think the root->upper_targets > change would be enough at least currently for FDWs to consider > performing post-UPPERREL_GROUP_AGG steps remotely, as shown in my > patches, so I'm inclined to leave the retarget change for future work. So, here is an updated patch. If there are no objections from you or anyone else, I'll commit the patch as a preliminary one for what's proposed in this thread. Best regards, Etsuro Fujita
Attachment
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/02/12 18:03), Antonin Houska wrote: > > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: > >> (2019/02/08 2:04), Antonin Houska wrote: > >>> > >>> And maybe related problem: why should FDW care about the effect of > >>> apply_scanjoin_target_to_paths() like you claim to do here? > >>> > >>> /* > >>> * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the > >>> * final scan/join relation, the costs obtained from the cache > >>> * wouldn't yet contain the eval cost for the final scan/join target, > >>> * which would have been updated by apply_scanjoin_target_to_paths(); > >>> * add the eval cost now. > >>> */ > >>> if (fpextra&& !IS_UPPER_REL(foreignrel)) > >>> { > >>> /* The costs should have been obtained from the cache. */ > >>> Assert(fpinfo->rel_startup_cost>= 0&& > >>> fpinfo->rel_total_cost>= 0); > >>> > >>> startup_cost += foreignrel->reltarget->cost.startup; > >>> run_cost += foreignrel->reltarget->cost.per_tuple * rows; > >>> } > >>> > >>> I think it should not, whether "foreignrel" means "input_rel" or "output_rel" > >>> from the perspective of postgresGetForeignUpperPaths(). > >> > >> I added this before costing the sort operation below, because 1) the cost of > >> evaluating the scan/join target might not be zero (consider the case where > >> sort columns are not simple Vars, for example) and 2) the cost of sorting > >> takes into account the underlying path's startup/total costs. Maybe I'm > >> missing something, though. > > > > My understanding is that the "input_rel" argument of > > postgresGetForeignUpperPaths() should already have been processed by > > postgresGetForeignPaths() or postgresGetForeignJoinPaths() (or actually by > > postgresGetForeignUpperPaths() called with different "stage" too). Therefore > > it should already have the "fdw_private" field initialized. The estimates in > > the corresponding PgFdwRelationInfo should already include the reltarget > > costs, as set previously by estimate_path_cost_size(). > > Right, but what I'm saying here is: the costs of evaluating the final > scan/join target updated by apply_scanjoin_target_to_paths() wouldn't be yet > included in the costs we calculated using local statistics when called from > postgresGetForeignPaths() or postgresGetForeignJoinPaths(). Let me explain > using an example: > > SELECT a+b FROM foreign_table LIMIT 10; > > For this query, the reltarget for the foreign_table would be {a, b} when > called from postgresGetForeignPaths() (see build_base_rel_tlists()). The costs > of evaluating simple Vars are zeroes, so we wouldn't charge any costs for > tlist evaluation when estimating the costs of a basic foreign table scan in > that callback routine, but the final scan target, with which the reltarget > will be replaced later by apply_scanjoin_target_to_paths(), would be {a+b}, so > we need to adjust the costs of the basic foreign table scan, cached in the > PgFdwRelationInfo for the input_rel (ie, the foreign_table), so that eval > costs for the replaced tlist are included when called from > postgresGetForeignUpperPaths() with the UPPERREL_FINAL stage, as the costs of > evaluating the expression 'a+b' wouldn't be zeroes. ok, I understand now. I assume that the patch https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp obsoletes the code snippet we discussed above. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
(2019/02/15 21:19), Etsuro Fujita wrote: > So, here is an updated patch. If there are no objections from you or > anyone else, I'll commit the patch as a preliminary one for what's > proposed in this thread. Pushed, after fiddling with the commit message a bit. Best regards, Etsuro Fujita
(2019/02/15 21:46), Antonin Houska wrote: > ok, I understand now. I assume that the patch > > https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp > > obsoletes the code snippet we discussed above. Sorry, I don't understand this. Could you elaborate on that? Best regards, Etsuro Fujita
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/02/15 21:46), Antonin Houska wrote: > > ok, I understand now. I assume that the patch > > > > https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp > > > > obsoletes the code snippet we discussed above. > > Sorry, I don't understand this. Could you elaborate on that? I thought that the assignments + startup_cost += outerrel->reltarget->cost.startup; and + run_cost += outerrel->reltarget->cost.per_tuple * input_rows; in the patch you posted in https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp do replace + startup_cost += foreignrel->reltarget->cost.startup; and + run_cost += foreignrel->reltarget->cost.per_tuple * rows; respectively, which you originally included in the following part of 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch: @@ -2829,6 +2872,22 @@ estimate_path_cost_size(PlannerInfo *root, run_cost += foreignrel->reltarget->cost.per_tuple * rows; } + /* + * If this is an UPPERREL_ORDERED step on the final scan/join + * relation, the costs obtained from the cache wouldn't yet contain + * the eval cost for the final scan/join target, which would have been + * updated by apply_scanjoin_target_to_paths(); add the eval cost now. + */ + if (fpextra && !IS_UPPER_REL(foreignrel)) + { + /* The costs should have been obtained from the cache. */ + Assert(fpinfo->rel_startup_cost >= 0 && + fpinfo->rel_total_cost >= 0); + + startup_cost += foreignrel->reltarget->cost.startup; + run_cost += foreignrel->reltarget->cost.per_tuple * rows; + } + /* * Without remote estimates, we have no real way to estimate the cost * of generating sorted output. It could be free if the query plan -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
(2019/02/18 23:21), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> (2019/02/15 21:46), Antonin Houska wrote: >>> ok, I understand now. I assume that the patch >>> >>> https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp >>> >>> obsoletes the code snippet we discussed above. >> >> Sorry, I don't understand this. Could you elaborate on that? > > I thought that the assignments > > + startup_cost += outerrel->reltarget->cost.startup; > > and > > + run_cost += outerrel->reltarget->cost.per_tuple * input_rows; > > in the patch you posted in > https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp do > replace > > + startup_cost += foreignrel->reltarget->cost.startup; > > and > > + run_cost += foreignrel->reltarget->cost.per_tuple * rows; > > respectively, which you originally included in the following part of > 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch: > > @@ -2829,6 +2872,22 @@ estimate_path_cost_size(PlannerInfo *root, > run_cost += foreignrel->reltarget->cost.per_tuple * rows; > } > > + /* > + * If this is an UPPERREL_ORDERED step on the final scan/join > + * relation, the costs obtained from the cache wouldn't yet contain > + * the eval cost for the final scan/join target, which would have been > + * updated by apply_scanjoin_target_to_paths(); add the eval cost now. > + */ > + if (fpextra&& !IS_UPPER_REL(foreignrel)) > + { > + /* The costs should have been obtained from the cache. */ > + Assert(fpinfo->rel_startup_cost>= 0&& > + fpinfo->rel_total_cost>= 0); > + > + startup_cost += foreignrel->reltarget->cost.startup; > + run_cost += foreignrel->reltarget->cost.per_tuple * rows; > + } > + Maybe, my explanation in that thread was not enough, but the changes proposed by the patch posted there wouldn't obsolete the above. Let me explain using a foreign-table variant of the example shown in the comments for make_group_input_target(): SELECT a+b, sum(c+d) FROM foreign_table GROUP BY a+b; When called from postgresGetForeignPaths(), the reltarget for the base relation foreign_table would be {a, b, c, d}, for the same reason as the LIMIT example in [1], and the foreign_table's rel_startup_cost and rel_total_cost would be calculated based on this reltarget in that callback routine. (The tlist eval costs would be zeroes, though). BUT: before called from postgresGetForeignUpperPaths() with the UPPERREL_GROUP_AGG stage, the reltarget would be replaced with {a+b, c, d}, which is the final scan target as explained in the comments for make_group_input_target(), and the eval costs of the new reltarget wouldn't be zeros, so the costs of scanning the foreign_table would need to be adjusted to include the eval costs. That's why I propose the assignments in estimate_path_cost_size() shown above. On the other hand, the code bit added by 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch handles the case where a post-scan/join processing step other than grouping/aggregation (ie, the final sort or LIMIT/OFFSET) is performed directly on a base or join relation, as in the LIMIT example. So, the two changes are handling different cases, hence both changes would be required. (I think it might be possible that we can merge the two changes into one by some refactoring of estimate_path_cost_size() or something else, but I haven't considered that hard yet. Should we do that?) Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5C669DFD.30002%40lab.ntt.co.jp
(2019/02/08 21:35), Etsuro Fujita wrote: > (2019/02/08 2:04), Antonin Houska wrote: >> Some comments on coding: >> >> 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch >> ----------------------------------------------------------------- >> >> * I'm not sure if UPPERREL_ORDERD should to deal with LIMIT at all. >> Note that >> grouping_planner() does not call create_limit_path() until it's done with >> create_ordered_paths(), and create_ordered_paths() is where the FDW is >> requested to add its paths to UPPERREL_ORDERED relation. > > Maybe I'm missing something, but the 0001 patch doesn't deal with LIMIT > at all. I added the parameter limit_tuples to PgFdwPathExtraData so that > we can pass limit_tuples=-1, which means no LIMIT, to cost_sort(), though. As proposed by you downthread, I removed the limit_tuples variable at all from that patch. >> * add_foreign_ordered_paths() >> >> Exprssion root->upper_targets[UPPERREL_FINAL] is used to access the >> target. I modified that patch so that we use root->upper_targets[UPPERREL_ORDERED], not root->upper_targets[UPPERREL_FINAL]. >> * regression tests: I think test(s) should be added for queries that have >> ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET) >> clause. I haven't noticed such tests. > > Will do. I noticed that such queries would be processed by what we already have for sort pushdown (ie, add_paths_with_pathkeys_for_rel()). I might be missing something, though. >> 0002-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely-v3.patch >> --------------------------------------------------------------- >> * adjust_limit_rows_costs() >> >> Doesn't this function address more generic problem than the use of >> postgres_fdw? If so, then it should be submitted as a separate patch. >> BTW, the >> function is only used in pathnode.c, so it should be declared static. > > Actually, this is simple refactoring for create_limit_path() so that > that function can be shared with postgres_fdw. See > estimate_path_cost_size(). I'll separate that refactoring in the next > version of the patch set. Done. Other changes: * In add_foreign_ordered_paths() and add_foreign_final_paths(), I replaced create_foreignscan_path() with the new function create_foreign_upper_path() added by the commit [1]. * In 0003-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely.patch, I modified estimate_path_cost_size() so that the costs are adjusted to ensure we'll prefer performing LIMIT remotely, after factoring in some additional cost to account for connection overhead, not before, because that would make the adjustment more robust against changes to such cost. * Revised comments a bit. * Rebased the patchset to the latest HEAD. Attached is an updated version of the patchset. I plan to add more comments in the next version. Best regards, Etsuro Fujita [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=34ea1ab7fd305afe1124a6e73ada0ebae04b6ebb
Attachment
On Wed, Jan 30, 2019 at 6:12 AM Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
(2018/12/28 15:50), Etsuro Fujita wrote:
> Attached is a new version of the
> patch.
Here is an updated version of the patch set. Changes are:
* In the previous version, LIMIT without OFFSET was not performed
remotely as the costs was calculated the same as the costs of performing
it locally. (Actually, such LIMIT was performed remotely in a case in
the postgres_fdw regression test, but that was due to a bug :(.) I
think we should prefer performing such LIMIT remotely as that avoids
extra row fetches from the remote that performing it locally might
cause, so I tweaked the costs estimated in estimate_path_cost_size(), to
ensure that we'll prefer performing such LIMIT remotely.
With your tweaks, I'm still not seeing the ORDER-less LIMIT get pushed down when using use_remote_estimate in a simple test case, either with this set of patches, nor in the V4 set. However, without use_remote_estimate, the LIMIT is now getting pushed with these patches when it does not in HEAD.
See attached test case, to be run in new database named 'foo'.
Cheers,
Jeff
Attachment
Hi Jeff, (2019/02/21 6:19), Jeff Janes wrote: > On Wed, Jan 30, 2019 at 6:12 AM Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote: > > (2018/12/28 15:50), Etsuro Fujita wrote: > > Attached is a new version of the > > patch. > > Here is an updated version of the patch set. Changes are: > > * In the previous version, LIMIT without OFFSET was not performed > remotely as the costs was calculated the same as the costs of > performing > it locally. (Actually, such LIMIT was performed remotely in a case in > the postgres_fdw regression test, but that was due to a bug :(.) I > think we should prefer performing such LIMIT remotely as that avoids > extra row fetches from the remote that performing it locally might > cause, so I tweaked the costs estimated in > estimate_path_cost_size(), to > ensure that we'll prefer performing such LIMIT remotely. > With your tweaks, I'm still not seeing the ORDER-less LIMIT get pushed > down when using use_remote_estimate in a simple test case, either with > this set of patches, nor in the V4 set. However, without > use_remote_estimate, the LIMIT is now getting pushed with these patches > when it does not in HEAD. Good catch! I think the root cause of that is: when using use_remote_estimate, remote estimates obtained through remote EXPLAIN are rounded off to two decimal places, which I completely overlooked. Will fix. I think I can post a new version early next week. > See attached test case, to be run in new database named 'foo'. Thanks for the review and the test case! Best regards, Etsuro Fujita
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > Maybe, my explanation in that thread was not enough, but the changes proposed > by the patch posted there wouldn't obsolete the above. Let me explain using a > foreign-table variant of the example shown in the comments for > make_group_input_target(): > > SELECT a+b, sum(c+d) FROM foreign_table GROUP BY a+b; > > When called from postgresGetForeignPaths(), the reltarget for the base > relation foreign_table would be {a, b, c, d}, for the same reason as the LIMIT > example in [1], and the foreign_table's rel_startup_cost and rel_total_cost > would be calculated based on this reltarget in that callback routine. (The > tlist eval costs would be zeroes, though). BUT: before called from > postgresGetForeignUpperPaths() with the UPPERREL_GROUP_AGG stage, the > reltarget would be replaced with {a+b, c, d}, which is the final scan target > as explained in the comments for make_group_input_target(), and the eval costs > of the new reltarget wouldn't be zeros, so the costs of scanning the > foreign_table would need to be adjusted to include the eval costs. That's why > I propose the assignments in estimate_path_cost_size() shown above. Yes, apply_scanjoin_target_to_paths() can add some more costs, including those introduced by make_group_input_target(), which postgres_fdw needs to account for. This is what you fix in https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp > On the other hand, the code bit added by > 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch handles the > case where a post-scan/join processing step other than grouping/aggregation > (ie, the final sort or LIMIT/OFFSET) is performed directly on a base or join > relation, as in the LIMIT example. So, the two changes are handling different > cases, hence both changes would be required. Do you mean this part? + /* + * If this includes an UPPERREL_ORDERED step, the given target, which + * would be the final target to be applied to the resulting path, might + * have different expressions from the underlying relation's reltarget + * (see make_sort_input_target()); adjust tlist eval costs. + */ + if (fpextra && fpextra->target != foreignrel->reltarget) + { + QualCost oldcost = foreignrel->reltarget->cost; + QualCost newcost = fpextra->target->cost; + + startup_cost += newcost.startup - oldcost.startup; + total_cost += newcost.startup - oldcost.startup; + total_cost += (newcost.per_tuple - oldcost.per_tuple) * rows; + } You should not need this. Consider what grouping_planner() does if (!have_grouping && !activeWindows && parse->sortClause != NIL): sort_input_target = make_sort_input_target(...); ... grouping_target = sort_input_target; ... scanjoin_target = grouping_target; ... apply_scanjoin_target_to_paths(...); So apply_scanjoin_target_to_paths() effectively accounts for the additional costs introduced by make_sort_input_target(). Note about the !activeWindows test: It occurs me now that no paths should be added to UPPERREL_ORDERED relation if the query needs WindowAgg node, because postgres_fdw currently cannot handle UPPERREL_WINDOW relation. Or rather in general, the FDW should not skip any stage just because it doesn't know how to build the paths. -- Antonin Houska https://www.cybertec-postgresql.com
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/02/08 21:35), Etsuro Fujita wrote: > > (2019/02/08 2:04), Antonin Houska wrote: > >> * regression tests: I think test(s) should be added for queries that have > >> ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET) > >> clause. I haven't noticed such tests. > > > > Will do. > > I noticed that such queries would be processed by what we already have for > sort pushdown (ie, add_paths_with_pathkeys_for_rel()). I might be missing > something, though. What about an ORDER BY expression that contains multiple Var nodes? For example SELECT * FROM foo ORDER BY x + y; I think the base relation should not be able to generate paths with the corresponding pathkeys, since its target should (besides PlaceHolderVars, which should not appear in the plan of this simple query at all) only emit individual Vars. -- Antonin Houska https://www.cybertec-postgresql.com
(2019/02/23 0:21), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >>> (2019/02/08 2:04), Antonin Houska wrote: >>>> * regression tests: I think test(s) should be added for queries that have >>>> ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET) >>>> clause. I haven't noticed such tests. >> I noticed that such queries would be processed by what we already have for >> sort pushdown (ie, add_paths_with_pathkeys_for_rel()). I might be missing >> something, though. > > What about an ORDER BY expression that contains multiple Var nodes? For > example > > SELECT * FROM foo ORDER BY x + y; > > I think the base relation should not be able to generate paths with the > corresponding pathkeys, since its target should (besides PlaceHolderVars, > which should not appear in the plan of this simple query at all) only emit > individual Vars. Actually, add_paths_with_pathkeys_for_rel() generates such pre-sorted paths for the base relation, as shown in the below example using HEAD without the patchset proposed in this thread: postgres=# explain verbose select a+b from ft1 order by a+b; QUERY PLAN -------------------------------------------------------------------------- Foreign Scan on public.ft1 (cost=100.00..200.32 rows=2560 width=4) Output: (a + b) Remote SQL: SELECT a, b FROM public.t1 ORDER BY (a + b) ASC NULLS LAST (3 rows) I think it is OK for that function to generate such paths, as tlists for such paths would be adjusted in apply_scanjoin_target_to_paths(), by doing create_projection_path() to them. Conversely, it appears that add_foreign_ordered_paths() added by the patchset would generate such pre-sorted paths *redundantly* when the input_rel is the final scan/join relation. Will look into that. Best regards, Etsuro Fujita
(2019/02/22 22:54), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> Maybe, my explanation in that thread was not enough, but the changes proposed >> by the patch posted there wouldn't obsolete the above. Let me explain using a >> foreign-table variant of the example shown in the comments for >> make_group_input_target(): >> >> SELECT a+b, sum(c+d) FROM foreign_table GROUP BY a+b; >> >> When called from postgresGetForeignPaths(), the reltarget for the base >> relation foreign_table would be {a, b, c, d}, for the same reason as the LIMIT >> example in [1], and the foreign_table's rel_startup_cost and rel_total_cost >> would be calculated based on this reltarget in that callback routine. (The >> tlist eval costs would be zeroes, though). BUT: before called from >> postgresGetForeignUpperPaths() with the UPPERREL_GROUP_AGG stage, the >> reltarget would be replaced with {a+b, c, d}, which is the final scan target >> as explained in the comments for make_group_input_target(), and the eval costs >> of the new reltarget wouldn't be zeros, so the costs of scanning the >> foreign_table would need to be adjusted to include the eval costs. That's why >> I propose the assignments in estimate_path_cost_size() shown above. > > Yes, apply_scanjoin_target_to_paths() can add some more costs, including those > introduced by make_group_input_target(), which postgres_fdw needs to account > for. This is what you fix in > > https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp That's right. (I'll post a new version of the patch to address your comments later.) >> On the other hand, the code bit added by >> 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch handles the >> case where a post-scan/join processing step other than grouping/aggregation >> (ie, the final sort or LIMIT/OFFSET) is performed directly on a base or join >> relation, as in the LIMIT example. So, the two changes are handling different >> cases, hence both changes would be required. > > Do you mean this part? No, I don't. What I meant was this part: + /* + * If this is an UPPERREL_ORDERED step performed on the final + * scan/join relation, the costs obtained from the cache wouldn't yet + * contain the eval costs for the final scan/join target, which would + * have been updated by apply_scanjoin_target_to_paths(); add the eval + * costs now. + */ + if (fpextra && !IS_UPPER_REL(foreignrel)) + { + /* The costs should have been obtained from the cache. */ + Assert(fpinfo->rel_startup_cost >= 0 && + fpinfo->rel_total_cost >= 0); + + startup_cost += foreignrel->reltarget->cost.startup; + run_cost += foreignrel->reltarget->cost.per_tuple * rows; + } The case handled by this would be different from one handled by the fix, but as I said in the nearby thread, this part might be completely redundant. Will re-consider about this. > + /* > + * If this includes an UPPERREL_ORDERED step, the given target, which > + * would be the final target to be applied to the resulting path, might > + * have different expressions from the underlying relation's reltarget > + * (see make_sort_input_target()); adjust tlist eval costs. > + */ > + if (fpextra&& fpextra->target != foreignrel->reltarget) > + { > + QualCost oldcost = foreignrel->reltarget->cost; > + QualCost newcost = fpextra->target->cost; > + > + startup_cost += newcost.startup - oldcost.startup; > + total_cost += newcost.startup - oldcost.startup; > + total_cost += (newcost.per_tuple - oldcost.per_tuple) * rows; > + } > > You should not need this. Consider what grouping_planner() does if > (!have_grouping&& !activeWindows&& parse->sortClause != NIL): > > sort_input_target = make_sort_input_target(...); > ... > grouping_target = sort_input_target; > ... > scanjoin_target = grouping_target; > ... > apply_scanjoin_target_to_paths(...); > > So apply_scanjoin_target_to_paths() effectively accounts for the additional > costs introduced by make_sort_input_target(). Yeah, but I think the code bit cited above is needed. Let me explain using yet another example with grouping and ordering: SELECT a+b, random() FROM foreign_table GROUP BY a+b ORDER BY a+b; For this query, the sort_input_target would be {a+b}, (to which the foreignrel->reltarget would have been set when called from estimate_path_cost_size() called from postgresGetForeignUpperPaths() with the UPPERREL_ORDERED stage), but the final target would be {a+b, random()}, so the sort_input_target isn't the same as the final target in that case; the costs needs to be adjusted so that the costs include the ones of generating the final target. That's the reason why I added the code bit you cited. > Note about the !activeWindows test: It occurs me now that no paths should be > added to UPPERREL_ORDERED relation if the query needs WindowAgg node, because > postgres_fdw currently cannot handle UPPERREL_WINDOW relation. Or rather in > general, the FDW should not skip any stage just because it doesn't know how to > build the paths. That's right. In postgres_fdw, by the following code in postgresGetForeignUpperPaths(), we will give up on doing the UPPERREL_ORDERED step remotely, if the query has the UPPERREL_WINDOW (and/or UPPERREL_DISTINCT) steps, because in that case we would have input_rel->fdw_private=NULL for a call to postgresGetForeignUpperPaths() with the UPPERREL_ORDERED stage: /* * If input rel is not safe to pushdown, then simply return as we cannot * perform any post-join operations on the foreign server. */ if (!input_rel->fdw_private || !((PgFdwRelationInfo *) input_rel->fdw_private)->pushdown_safe) return; Best regards, Etsuro Fujita
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/02/22 22:54), Antonin Houska wrote: > > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: > >> On the other hand, the code bit added by > >> 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch handles the > >> case where a post-scan/join processing step other than grouping/aggregation > >> (ie, the final sort or LIMIT/OFFSET) is performed directly on a base or join > >> relation, as in the LIMIT example. So, the two changes are handling different > >> cases, hence both changes would be required. > > > > Do you mean this part? > > No, I don't. What I meant was this part: > > + /* > + * If this is an UPPERREL_ORDERED step performed on the final > + * scan/join relation, the costs obtained from the cache wouldn't yet > + * contain the eval costs for the final scan/join target, which would > + * have been updated by apply_scanjoin_target_to_paths(); add the eval > + * costs now. > + */ > + if (fpextra && !IS_UPPER_REL(foreignrel)) > + { > + /* The costs should have been obtained from the cache. */ > + Assert(fpinfo->rel_startup_cost >= 0 && > + fpinfo->rel_total_cost >= 0); > + > + startup_cost += foreignrel->reltarget->cost.startup; > + run_cost += foreignrel->reltarget->cost.per_tuple * rows; > + } > Yeah, but I think the code bit cited above is needed. Let me explain using > yet another example with grouping and ordering: > > SELECT a+b, random() FROM foreign_table GROUP BY a+b ORDER BY a+b; > > For this query, the sort_input_target would be {a+b}, (to which the > foreignrel->reltarget would have been set when called from > estimate_path_cost_size() called from postgresGetForeignUpperPaths() with the > UPPERREL_ORDERED stage), but the final target would be {a+b, random()}, so the > sort_input_target isn't the same as the final target in that case; the costs > needs to be adjusted so that the costs include the ones of generating the > final target. That's the reason why I added the code bit you cited. I used gdb to help me understand, however the condition if (fpextra && !IS_UPPER_REL(foreignrel)) never evaluated to true with the query above. fpextra was only !=NULL when estimate_path_cost_size() was called from add_foreign_ordered_paths(), but at the same time foreignrel->reloptkind was RELOPT_UPPER_REL. It's still unclear to me why add_foreign_ordered_paths() passes the input relation (input_rel) to estimate_path_cost_size(). If it passed the output rel (i.e. ordered_rel in this case) like add_foreign_grouping_paths() does, then foreignrel->reltarget should contain the random() function. (What I see now is that create_ordered_paths() leaves ordered_rel->reltarget empty, but that's another problem to be fixed.) Do you still see a reason to call estimate_path_cost_size() this way? > > Note about the !activeWindows test: It occurs me now that no paths should be > > added to UPPERREL_ORDERED relation if the query needs WindowAgg node, because > > postgres_fdw currently cannot handle UPPERREL_WINDOW relation. Or rather in > > general, the FDW should not skip any stage just because it doesn't know how to > > build the paths. > > That's right. In postgres_fdw, by the following code in > postgresGetForeignUpperPaths(), we will give up on doing the UPPERREL_ORDERED > step remotely, if the query has the UPPERREL_WINDOW (and/or UPPERREL_DISTINCT) > steps, because in that case we would have input_rel->fdw_private=NULL for a > call to postgresGetForeignUpperPaths() with the UPPERREL_ORDERED stage: > > /* > * If input rel is not safe to pushdown, then simply return as we cannot > * perform any post-join operations on the foreign server. > */ > if (!input_rel->fdw_private || > !((PgFdwRelationInfo *) input_rel->fdw_private)->pushdown_safe) > return; I understand now: if FDW did not handle the previous stage, then input_rel->fdw_private is NULL. -- Antonin Houska https://www.cybertec-postgresql.com
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/02/23 0:21), Antonin Houska wrote: > > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: > > >>> (2019/02/08 2:04), Antonin Houska wrote: > >>>> * regression tests: I think test(s) should be added for queries that have > >>>> ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET) > >>>> clause. I haven't noticed such tests. > > >> I noticed that such queries would be processed by what we already have for > >> sort pushdown (ie, add_paths_with_pathkeys_for_rel()). I might be missing > >> something, though. > > > > What about an ORDER BY expression that contains multiple Var nodes? For > > example > > > > SELECT * FROM foo ORDER BY x + y; > > > > I think the base relation should not be able to generate paths with the > > corresponding pathkeys, since its target should (besides PlaceHolderVars, > > which should not appear in the plan of this simple query at all) only emit > > individual Vars. > > Actually, add_paths_with_pathkeys_for_rel() generates such pre-sorted paths > for the base relation, as shown in the below example using HEAD without the > patchset proposed in this thread: > > postgres=# explain verbose select a+b from ft1 order by a+b; > QUERY PLAN > -------------------------------------------------------------------------- > Foreign Scan on public.ft1 (cost=100.00..200.32 rows=2560 width=4) > Output: (a + b) > Remote SQL: SELECT a, b FROM public.t1 ORDER BY (a + b) ASC NULLS LAST > (3 rows) > > I think it is OK for that function to generate such paths, as tlists for such > paths would be adjusted in apply_scanjoin_target_to_paths(), by doing > create_projection_path() to them. I've just checked it. The FDW constructs the (a + b) expression for the ORDER BY clause on its own. It only needs to find the input vars in the relation target. > Conversely, it appears that add_foreign_ordered_paths() added by the patchset > would generate such pre-sorted paths *redundantly* when the input_rel is the > final scan/join relation. Will look into that. Currently I have no idea how to check the plan created by FDW at the UPPERREL_ORDERED stage, except for removing the sort from the UPPERREL_GROUP_AGG stage as I proposed here: https://www.postgresql.org/message-id/11807.1549564431%40localhost -- Antonin Houska https://www.cybertec-postgresql.com
(2019/03/01 20:00), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> (2019/02/22 22:54), Antonin Houska wrote: >>> Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >>>> So, the two changes are handling different >>>> cases, hence both changes would be required. >> + /* >> + * If this is an UPPERREL_ORDERED step performed on the final >> + * scan/join relation, the costs obtained from the cache wouldn't yet >> + * contain the eval costs for the final scan/join target, which would >> + * have been updated by apply_scanjoin_target_to_paths(); add the eval >> + * costs now. >> + */ >> + if (fpextra&& !IS_UPPER_REL(foreignrel)) >> + { >> + /* The costs should have been obtained from the cache. */ >> + Assert(fpinfo->rel_startup_cost>= 0&& >> + fpinfo->rel_total_cost>= 0); >> + >> + startup_cost += foreignrel->reltarget->cost.startup; >> + run_cost += foreignrel->reltarget->cost.per_tuple * rows; >> + } > >> Yeah, but I think the code bit cited above is needed. Let me explain using >> yet another example with grouping and ordering: >> >> SELECT a+b, random() FROM foreign_table GROUP BY a+b ORDER BY a+b; >> >> For this query, the sort_input_target would be {a+b}, (to which the >> foreignrel->reltarget would have been set when called from >> estimate_path_cost_size() called from postgresGetForeignUpperPaths() with the >> UPPERREL_ORDERED stage), but the final target would be {a+b, random()}, so the >> sort_input_target isn't the same as the final target in that case; the costs >> needs to be adjusted so that the costs include the ones of generating the >> final target. That's the reason why I added the code bit you cited. > > I used gdb to help me understand, however the condition > > if (fpextra&& !IS_UPPER_REL(foreignrel)) > > never evaluated to true with the query above. Sorry, my explanation was not enough again, but I showed that query ("SELECT a+b, random() FROM foreign_table GROUP BY a+b ORDER BY a+b;") to explain why the following code bit is needed: + /* + * If this includes an UPPERREL_ORDERED step, the given target, which + * would be the final target to be applied to the resulting path, might + * have different expressions from the underlying relation's reltarget + * (see make_sort_input_target()); adjust tlist eval costs. + */ + if (fpextra&& fpextra->target != foreignrel->reltarget) + { + QualCost oldcost = foreignrel->reltarget->cost; + QualCost newcost = fpextra->target->cost; + + startup_cost += newcost.startup - oldcost.startup; + total_cost += newcost.startup - oldcost.startup; + total_cost += (newcost.per_tuple - oldcost.per_tuple) * rows; + } I think that the condition if (fpextra && fpextra->target != foreignrel->reltarget) would be evaluated to true for that query. > It's still unclear to me why add_foreign_ordered_paths() passes the input > relation (input_rel) to estimate_path_cost_size(). If it passed the output rel > (i.e. ordered_rel in this case) like add_foreign_grouping_paths() does, then > foreignrel->reltarget should contain the random() function. (What I see now is > that create_ordered_paths() leaves ordered_rel->reltarget empty, but that's > another problem to be fixed.) > > Do you still see a reason to call estimate_path_cost_size() this way? Yeah, the reason for that is because we can estimate the costs of sorting for the final scan/join relation using the existing code as-is that estimates the costs of sorting for base or join relations, except for tlist eval cost adjustment. As you mentioned, we could pass ordered_rel to estimate_path_cost_size(), but if so, I think we would need to get input_rel (ie, the final scan/join relation) from ordered_rel within estimate_path_cost_size(), to use that code, which would not be great. Best regards, Etsuro Fujita
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/03/01 20:00), Antonin Houska wrote: > > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: > > I used gdb to help me understand, however the condition > > > > if (fpextra&& !IS_UPPER_REL(foreignrel)) > > > > never evaluated to true with the query above. > > Sorry, my explanation was not enough again, but I showed that query ("SELECT > a+b, random() FROM foreign_table GROUP BY a+b ORDER BY a+b;") to explain why > the following code bit is needed: > > + /* > + * If this includes an UPPERREL_ORDERED step, the given target, which > + * would be the final target to be applied to the resulting path, > might > + * have different expressions from the underlying relation's reltarget > + * (see make_sort_input_target()); adjust tlist eval costs. > + */ > + if (fpextra&& fpextra->target != foreignrel->reltarget) > + { > + QualCost oldcost = foreignrel->reltarget->cost; > + QualCost newcost = fpextra->target->cost; > + > + startup_cost += newcost.startup - oldcost.startup; > + total_cost += newcost.startup - oldcost.startup; > + total_cost += (newcost.per_tuple - oldcost.per_tuple) * rows; > + } Maybe I undestand now. Do the expressions (newcost.* - oldcost.*) reflect the fact that, for the query SELECT a+b, random() FROM foreign_table GROUP BY a+b ORDER BY a+b; the UPPERREL_ORDERED stage only needs to evaluate the random() function because (a + b) was already evaluated during the UPPERREL_GROUP_AGG stage? -- Antonin Houska https://www.cybertec-postgresql.com
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/03/01 20:00), Antonin Houska wrote: > > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: > > It's still unclear to me why add_foreign_ordered_paths() passes the input > > relation (input_rel) to estimate_path_cost_size(). If it passed the output rel > > (i.e. ordered_rel in this case) like add_foreign_grouping_paths() does, then > > foreignrel->reltarget should contain the random() function. (What I see now is > > that create_ordered_paths() leaves ordered_rel->reltarget empty, but that's > > another problem to be fixed.) > > > > Do you still see a reason to call estimate_path_cost_size() this way? > > Yeah, the reason for that is because we can estimate the costs of sorting for > the final scan/join relation using the existing code as-is that estimates the > costs of sorting for base or join relations, except for tlist eval cost > adjustment. As you mentioned, we could pass ordered_rel to > estimate_path_cost_size(), but if so, I think we would need to get input_rel > (ie, the final scan/join relation) from ordered_rel within > estimate_path_cost_size(), to use that code, which would not be great. After some more thought I suggest that estimate_path_cost_size() is redesigned before your patch gets applied. The function is quite hard to read if - with your patch applied - the "foreignrel" argument sometimes means the output relation and other times the input one. I takes some effort to "switch context" between these two perspectives when I read the code. Perhaps both input and output relation should be passed to the function now, and maybe also UpperRelationKind of the output relation. And maybe it's even worth moving some code into a subroutine that handles only the upper relation. Originally the function could only handle base relations, then join relation was added, then one kind of upper relation (UPPERREL_GROUP_AGG) was added and eventually the function will need to handle multiple kinds of upper relation. It should be no surprise if such an amount of changes makes the original signature insufficient. -- Antonin Houska https://www.cybertec-postgresql.com
(2019/03/02 1:14), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> (2019/03/01 20:00), Antonin Houska wrote: >> Sorry, my explanation was not enough again, but I showed that query ("SELECT >> a+b, random() FROM foreign_table GROUP BY a+b ORDER BY a+b;") to explain why >> the following code bit is needed: >> >> + /* >> + * If this includes an UPPERREL_ORDERED step, the given target, which >> + * would be the final target to be applied to the resulting path, >> might >> + * have different expressions from the underlying relation's reltarget >> + * (see make_sort_input_target()); adjust tlist eval costs. >> + */ >> + if (fpextra&& fpextra->target != foreignrel->reltarget) >> + { >> + QualCost oldcost = foreignrel->reltarget->cost; >> + QualCost newcost = fpextra->target->cost; >> + >> + startup_cost += newcost.startup - oldcost.startup; >> + total_cost += newcost.startup - oldcost.startup; >> + total_cost += (newcost.per_tuple - oldcost.per_tuple) * rows; >> + } > > Maybe I undestand now. Do the expressions (newcost.* - oldcost.*) reflect the > fact that, for the query > > SELECT a+b, random() FROM foreign_table GROUP BY a+b ORDER BY a+b; > > the UPPERREL_ORDERED stage only needs to evaluate the random() function > because (a + b) was already evaluated during the UPPERREL_GROUP_AGG stage? Yeah, I think so. Best regards, Etsuro Fujita
(2019/03/04 16:46), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> (2019/03/01 20:00), Antonin Houska wrote: >>> It's still unclear to me why add_foreign_ordered_paths() passes the input >>> relation (input_rel) to estimate_path_cost_size(). If it passed the output rel >>> (i.e. ordered_rel in this case) like add_foreign_grouping_paths() does, then >>> foreignrel->reltarget should contain the random() function. (What I see now is >>> that create_ordered_paths() leaves ordered_rel->reltarget empty, but that's >>> another problem to be fixed.) >>> >>> Do you still see a reason to call estimate_path_cost_size() this way? >> >> Yeah, the reason for that is because we can estimate the costs of sorting for >> the final scan/join relation using the existing code as-is that estimates the >> costs of sorting for base or join relations, except for tlist eval cost >> adjustment. As you mentioned, we could pass ordered_rel to >> estimate_path_cost_size(), but if so, I think we would need to get input_rel >> (ie, the final scan/join relation) from ordered_rel within >> estimate_path_cost_size(), to use that code, which would not be great. > > After some more thought I suggest that estimate_path_cost_size() is redesigned > before your patch gets applied. The function is quite hard to read if - with > your patch applied - the "foreignrel" argument sometimes means the output > relation and other times the input one. I takes some effort to "switch > context" between these two perspectives when I read the code. I have to admit that my proposal makes estimate_path_cost_size() complicated to a certain extent, but I don't think it changes the meaning of the foreignrel; even in my proposal, the foreignrel should be considered as an output relation rather than an input relation, because we create a foreign upper path with the foreignrel being the parent RelOptInfo for the foreign upper path in add_foreign_ordered_paths() or add_foreign_final_paths(). > Perhaps both input and output relation should be passed to the function now, > and maybe also UpperRelationKind of the output relation. My concern is: that would make it inconvenient to call that function. > And maybe it's even > worth moving some code into a subroutine that handles only the upper relation. > > Originally the function could only handle base relations, then join relation > was added, then one kind of upper relation (UPPERREL_GROUP_AGG) was added and > eventually the function will need to handle multiple kinds of upper > relation. It should be no surprise if such an amount of changes makes the > original signature insufficient. I 100% agree with you on that point. Best regards, Etsuro Fujita
(2019/03/01 20:16), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> Conversely, it appears that add_foreign_ordered_paths() added by the patchset >> would generate such pre-sorted paths *redundantly* when the input_rel is the >> final scan/join relation. Will look into that. > > Currently I have no idea how to check the plan created by FDW at the > UPPERREL_ORDERED stage, except for removing the sort from the > UPPERREL_GROUP_AGG stage as I proposed here: > > https://www.postgresql.org/message-id/11807.1549564431%40localhost I don't understand your words "how to check the plan created by FDW". Could you elaborate on that a bit more? Best regards, Etsuro Fujita
(2019/02/25 18:40), Etsuro Fujita wrote: > (2019/02/23 0:21), Antonin Houska wrote: >> Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> What about an ORDER BY expression that contains multiple Var nodes? For >> example >> >> SELECT * FROM foo ORDER BY x + y; > Actually, add_paths_with_pathkeys_for_rel() generates such pre-sorted > paths for the base relation, as shown in the below example using HEAD > without the patchset proposed in this thread: > > postgres=# explain verbose select a+b from ft1 order by a+b; > QUERY PLAN > -------------------------------------------------------------------------- > Foreign Scan on public.ft1 (cost=100.00..200.32 rows=2560 width=4) > Output: (a + b) > Remote SQL: SELECT a, b FROM public.t1 ORDER BY (a + b) ASC NULLS LAST > (3 rows) > > I think it is OK for that function to generate such paths, as tlists for > such paths would be adjusted in apply_scanjoin_target_to_paths(), by > doing create_projection_path() to them. > > Conversely, it appears that add_foreign_ordered_paths() added by the > patchset would generate such pre-sorted paths *redundantly* when the > input_rel is the final scan/join relation. Will look into that. As mentioned above, I noticed that we generated a properly-sorted path redundantly in add_foreign_ordered_paths(), for the case where 1) the input_rel is the final scan/join relation and 2) the input_rel already has a properly-sorted path in its pathlist that was created by add_paths_with_pathkeys_for_rel(). So, I modified add_foreign_ordered_paths() to skip creating a path in that case. (2019/02/25 19:31), Etsuro Fujita wrote: > + /* > + * If this is an UPPERREL_ORDERED step performed on the final > + * scan/join relation, the costs obtained from the cache wouldn't yet > + * contain the eval costs for the final scan/join target, which would > + * have been updated by apply_scanjoin_target_to_paths(); add the eval > + * costs now. > + */ > + if (fpextra && !IS_UPPER_REL(foreignrel)) > + { > + /* The costs should have been obtained from the cache. */ > + Assert(fpinfo->rel_startup_cost >= 0 && > + fpinfo->rel_total_cost >= 0); > + > + startup_cost += foreignrel->reltarget->cost.startup; > + run_cost += foreignrel->reltarget->cost.per_tuple * rows; > + } > but as I said in the nearby thread, this part might be completely > redundant. Will re-consider about this. I thought that if it was true that in add_foreign_ordered_paths() we didn't need to consider pushing down the final sort to the remote in the case where the input_rel to that function is the final scan/join relation, the above code could be entirely removed from estimate_path_cost_size(), but I noticed that that is wrong; as we do not always have a properly-sorted path in the input_rel's pathlist already. So, I think we need to keep the above code so that we we can consider the final sort pushdown for the final scan/join relation in add_foreign_ordered_paths(). Sorry for the confusion. I moved the above code to the place we get cached costs, which I hope makes estimate_path_cost_size() a bit more readable. Other changes: * I modified add_foreign_final_paths() to skip creating a path if possible, in a similar way to add_foreign_ordered_paths(). * I fixed the issue pointed out by Jeff [1]. * I added more comments. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/CAMkU%3D1z1Ez7fb_P_0Bc1040npE5fCOnu0M1DFyOzCp%3De%3DrBJCw%40mail.gmail.com
Attachment
(2019/02/22 17:17), Etsuro Fujita wrote: > (2019/02/21 6:19), Jeff Janes wrote: >> With your tweaks, I'm still not seeing the ORDER-less LIMIT get pushed >> down when using use_remote_estimate in a simple test case, either with >> this set of patches, nor in the V4 set. However, without >> use_remote_estimate, the LIMIT is now getting pushed with these patches >> when it does not in HEAD. > > Good catch! I think the root cause of that is: when using > use_remote_estimate, remote estimates obtained through remote EXPLAIN > are rounded off to two decimal places, which I completely overlooked. > Will fix. I think I can post a new version early next week. Done. Please see [1]. Sorry for the delay. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5C7FC458.4020400%40lab.ntt.co.jp
(2019/03/06 22:00), Etsuro Fujita wrote: > (2019/02/25 18:40), Etsuro Fujita wrote: >> (2019/02/23 0:21), Antonin Houska wrote: >>> Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: > >>> What about an ORDER BY expression that contains multiple Var nodes? For >>> example >>> >>> SELECT * FROM foo ORDER BY x + y; > >> Actually, add_paths_with_pathkeys_for_rel() generates such pre-sorted >> paths for the base relation, as shown in the below example using HEAD >> without the patchset proposed in this thread: >> >> postgres=# explain verbose select a+b from ft1 order by a+b; >> QUERY PLAN >> -------------------------------------------------------------------------- >> >> Foreign Scan on public.ft1 (cost=100.00..200.32 rows=2560 width=4) >> Output: (a + b) >> Remote SQL: SELECT a, b FROM public.t1 ORDER BY (a + b) ASC NULLS LAST >> (3 rows) >> >> I think it is OK for that function to generate such paths, as tlists for >> such paths would be adjusted in apply_scanjoin_target_to_paths(), by >> doing create_projection_path() to them. >> >> Conversely, it appears that add_foreign_ordered_paths() added by the >> patchset would generate such pre-sorted paths *redundantly* when the >> input_rel is the final scan/join relation. Will look into that. > > As mentioned above, I noticed that we generated a properly-sorted path > redundantly in add_foreign_ordered_paths(), for the case where 1) the > input_rel is the final scan/join relation and 2) the input_rel already > has a properly-sorted path in its pathlist that was created by > add_paths_with_pathkeys_for_rel(). So, I modified > add_foreign_ordered_paths() to skip creating a path in that case. > > (2019/02/25 19:31), Etsuro Fujita wrote: > > + /* > > + * If this is an UPPERREL_ORDERED step performed on the final > > + * scan/join relation, the costs obtained from the cache wouldn't yet > > + * contain the eval costs for the final scan/join target, which would > > + * have been updated by apply_scanjoin_target_to_paths(); add the eval > > + * costs now. > > + */ > > + if (fpextra && !IS_UPPER_REL(foreignrel)) > > + { > > + /* The costs should have been obtained from the cache. */ > > + Assert(fpinfo->rel_startup_cost >= 0 && > > + fpinfo->rel_total_cost >= 0); > > + > > + startup_cost += foreignrel->reltarget->cost.startup; > > + run_cost += foreignrel->reltarget->cost.per_tuple * rows; > > + } > > > but as I said in the nearby thread, this part might be completely > > redundant. Will re-consider about this. > > I thought that if it was true that in add_foreign_ordered_paths() we > didn't need to consider pushing down the final sort to the remote in the > case where the input_rel to that function is the final scan/join > relation, the above code could be entirely removed from > estimate_path_cost_size(), but I noticed that that is wrong; as we do > not always have a properly-sorted path in the input_rel's pathlist > already. So, I think we need to keep the above code so that we we can > consider the final sort pushdown for the final scan/join relation in > add_foreign_ordered_paths(). Sorry for the confusion. I moved the above > code to the place we get cached costs, which I hope makes > estimate_path_cost_size() a bit more readable. > > Other changes: > > * I modified add_foreign_final_paths() to skip creating a path if > possible, in a similar way to add_foreign_ordered_paths(). > * I fixed the issue pointed out by Jeff [1]. > * I added more comments. One thing I forgot to mention is a bug fix: when costing an *ordered* foreign-grouping path using local statistics in estimate_path_cost_size(), we get the rows estimate from the foreignrel->rows, but in the previous version, the foreignrel->rows for the grouping relation was not updated accordingly, so the rows estimate was set to 0. I fixed this. And I noticed that I sent in a WIP version of the patch set :(. Sorry for that. That version wasn't modified well enough, especially to add the fast-path mentioned above. Please find attached an updated version (v6) of the patch set. Best regards, Etsuro Fujita
Attachment
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2019/03/01 20:16), Antonin Houska wrote: > > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: > > >> Conversely, it appears that add_foreign_ordered_paths() added by the patchset > >> would generate such pre-sorted paths *redundantly* when the input_rel is the > >> final scan/join relation. Will look into that. > > > > Currently I have no idea how to check the plan created by FDW at the > > UPPERREL_ORDERED stage, except for removing the sort from the > > UPPERREL_GROUP_AGG stage as I proposed here: > > > > https://www.postgresql.org/message-id/11807.1549564431%40localhost > > I don't understand your words "how to check the plan created by FDW". Could > you elaborate on that a bit more? I meant that I don't know how to verify that the plan that sends the ORDER BY clause to the remote server was created at the UPPERREL_ORDERED and not at UPPERREL_GROUP_AGG. However if the ORDER BY clause really should not be added at the UPPERREL_GROUP_AGG stage and if we ensure that it no longer happens, then mere presence of ORDER BY in the (remote) plan means that the UPPERREL_ORDERED stage works fine. -- Antonin Houska https://www.cybertec-postgresql.com
(2019/03/09 1:25), Antonin Houska wrote: > Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> (2019/03/01 20:16), Antonin Houska wrote: >>> Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >>>> Conversely, it appears that add_foreign_ordered_paths() added by the patchset >>>> would generate such pre-sorted paths *redundantly* when the input_rel is the >>>> final scan/join relation. Will look into that. >>> >>> Currently I have no idea how to check the plan created by FDW at the >>> UPPERREL_ORDERED stage, except for removing the sort from the >>> UPPERREL_GROUP_AGG stage as I proposed here: >>> >>> https://www.postgresql.org/message-id/11807.1549564431%40localhost >> >> I don't understand your words "how to check the plan created by FDW". Could >> you elaborate on that a bit more? > > I meant that I don't know how to verify that the plan that sends the ORDER BY > clause to the remote server was created at the UPPERREL_ORDERED and not at > UPPERREL_GROUP_AGG. However if the ORDER BY clause really should not be added > at the UPPERREL_GROUP_AGG stage and if we ensure that it no longer happens, > then mere presence of ORDER BY in the (remote) plan means that the > UPPERREL_ORDERED stage works fine. I don't think we need to consider pushing down the query's ORDER BY to the remote in GetForeignUpperPaths() for each of upper relations below UPPERREL_ORDERED (ie, UPPERREL_GROUP_AGG, UPPERREL_WINDOW, and UPPERREL_DISTINCT); I think that's a job for GetForeignUpperPaths() for UPPERREL_ORDERED, though the division of labor would be arbitrary. However, I think it's a good thing that there is a room for considering remote sorts even in GetForeignUpperPaths() for UPPERREL_GROUP_AGG, because some remote sorts might be useful to process its upper relation. Consider this using postgres_fdw: postgres=# explain verbose select distinct count(a) from ft1 group by b; QUERY PLAN ------------------------------------------------------------------------ Unique (cost=121.47..121.52 rows=10 width=12) Output: (count(a)), b -> Sort (cost=121.47..121.49 rows=10 width=12) Output: (count(a)), b Sort Key: (count(ft1.a)) -> Foreign Scan (cost=105.00..121.30 rows=10 width=12) Output: (count(a)), b Relations: Aggregate on (public.ft1) Remote SQL: SELECT count(a), b FROM public.t1 GROUP BY 2 (9 rows) For this query it might be useful to push down the sort on top of the foreign scan. To allow that, we should allow GetForeignUpperPaths() for UPPERREL_GROUP_AGG to consider that sort pushdown. (We would soon implement the SELECT DISTINCT pushdown for postgres_fdw, and if so, we would no longer need to consider that sort pushdown in postgresGetForeignUpperPaths() for UPPERREL_GROUP_AGG, but I don't think all FDWs can have the ability to push down SELECT DISTINCT to the remote...) Best regards, Etsuro Fujita
(2019/03/07 19:27), Etsuro Fujita wrote: > (2019/03/06 22:00), Etsuro Fujita wrote: >> (2019/02/25 18:40), Etsuro Fujita wrote: >>> (2019/02/23 0:21), Antonin Houska wrote: >>>> Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote: >> >>>> What about an ORDER BY expression that contains multiple Var nodes? For >>>> example >>>> >>>> SELECT * FROM foo ORDER BY x + y; >> >>> Actually, add_paths_with_pathkeys_for_rel() generates such pre-sorted >>> paths for the base relation, as shown in the below example using HEAD >>> without the patchset proposed in this thread: >>> >>> postgres=# explain verbose select a+b from ft1 order by a+b; >>> QUERY PLAN >>> -------------------------------------------------------------------------- >>> >>> >>> Foreign Scan on public.ft1 (cost=100.00..200.32 rows=2560 width=4) >>> Output: (a + b) >>> Remote SQL: SELECT a, b FROM public.t1 ORDER BY (a + b) ASC NULLS LAST >>> (3 rows) >>> >>> I think it is OK for that function to generate such paths, as tlists for >>> such paths would be adjusted in apply_scanjoin_target_to_paths(), by >>> doing create_projection_path() to them. >>> >>> Conversely, it appears that add_foreign_ordered_paths() added by the >>> patchset would generate such pre-sorted paths *redundantly* when the >>> input_rel is the final scan/join relation. Will look into that. >> >> As mentioned above, I noticed that we generated a properly-sorted path >> redundantly in add_foreign_ordered_paths(), for the case where 1) the >> input_rel is the final scan/join relation and 2) the input_rel already >> has a properly-sorted path in its pathlist that was created by >> add_paths_with_pathkeys_for_rel(). So, I modified >> add_foreign_ordered_paths() to skip creating a path in that case. I had a rethink about this and noticed that the previous version was not enough; since query_pathkeys is set to root->sort_pathkeys in that case (ie, the case where the input_rel is a base or join relation), we would already have considered pushing down the final sort when creating pre-sorted foreign paths for the input_rel in postgresGetForeignPaths() or postgresGetForeignJoinPaths(), so *no work* in that case. I modified the patch as such. >> (2019/02/25 19:31), Etsuro Fujita wrote: >> > + /* >> > + * If this is an UPPERREL_ORDERED step performed on the final >> > + * scan/join relation, the costs obtained from the cache wouldn't yet >> > + * contain the eval costs for the final scan/join target, which would >> > + * have been updated by apply_scanjoin_target_to_paths(); add the eval >> > + * costs now. >> > + */ >> > + if (fpextra && !IS_UPPER_REL(foreignrel)) >> > + { >> > + /* The costs should have been obtained from the cache. */ >> > + Assert(fpinfo->rel_startup_cost >= 0 && >> > + fpinfo->rel_total_cost >= 0); >> > + >> > + startup_cost += foreignrel->reltarget->cost.startup; >> > + run_cost += foreignrel->reltarget->cost.per_tuple * rows; >> > + } >> >> > but as I said in the nearby thread, this part might be completely >> > redundant. Will re-consider about this. >> >> I thought that if it was true that in add_foreign_ordered_paths() we >> didn't need to consider pushing down the final sort to the remote in the >> case where the input_rel to that function is the final scan/join >> relation, the above code could be entirely removed from >> estimate_path_cost_size(), but I noticed that that is wrong; I was wrong here; we don't need to consider pushing down the final sort in that case, as mentioned above, so we could remove that code. Sorry for the confusion. >> I moved the above >> code to the place we get cached costs, which I hope makes >> estimate_path_cost_size() a bit more readable. I moved that code from the UPPERREL_ORDERED patch to the UPPERREL_FINAL patch, because we still need that code for estimating the costs of a foreign path created in add_foreign_final_paths() defined in the latter patch. I polished the patches; revised code, added some more regression tests, and tweaked comments further. Attached is an updated version of the patch set. Best regards, Etsuro Fujita
Attachment
(2019/03/20 20:47), Etsuro Fujita wrote: > Attached is an updated version of the patch set. One thing I noticed while self-reviewing the patch for UPPERREL_FINAL is: the previous versions of the patch don't show EPQ plans in EXPLAIN, as shown in the below example, so we can't check if those plans are constructed correctly, which I'll explain below: * HEAD EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* -> LockRows Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* -> Foreign Scan Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2) Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 --> -> Result --> Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* -> Sort Output: t1.c1, t1.c3, t1.*, t2.c1, t2.* Sort Key: t1.c3, t1.c1 -> Merge Join Output: t1.c1, t1.c3, t1.*, t2.c1, t2.* Merge Cond: (t1.c1 = t2.c1) -> Sort Output: t1.c1, t1.c3, t1.* Sort Key: t1.c1 -> Foreign Scan on public.ft1 t1 Output: t1.c1, t1.c3, t1.* Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE -> Sort Output: t2.c1, t2.* Sort Key: t2.c1 -> Foreign Scan on public.ft2 t2 Output: t2.c1, t2.* Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" (28 rows) * Patched EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Foreign Scan Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2) Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint FOR UPDATE OF r1 (4 rows) In HEAD, this test case checks that a Result node is inserted atop of an EPQ plan for a foreign join (if necessary) when the foreign join is at the topmost join level (see discussion [1]), but the patched version doesn't show EPQ plans in EXPLAIN, so we can't check that as-is. Should we show EPQ plans in EXPLAIN? My inclination would be to not show them, because that information would be completely useless for the user. Another thing I noticed is: the previous versions make pointless another test case added by commit 4bbf6edfb (and adjusted by commit 99f6a17dd) that checks an EPQ plan constructed from multiple merge joins, because they changed a query plan for that test case like the above. (Actually, though, that test case doesn't need that EPQ plan already since it doesn't involve regular tables and it never fires EPQ rechecks.) To avoid that, I modified that test case accordingly. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/8946.1544644803%40sss.pgh.pa.us
(2019/03/27 17:15), Etsuro Fujita wrote: > I'll > check the patchset once more and add the commit messages in the next > version. I couldn't find any issue. I added the commit messages. Attached is a new version. If there are no objections, I'll commit this version. Best regards, Etsuro Fujita
Attachment
(2019/03/29 22:18), Etsuro Fujita wrote: > Attached is a > new version. If there are no objections, I'll commit this version. Pushed after polishing the patchset a bit further: add an assertion, tweak comments, and do cleanups. Thanks for reviewing, Antonin and Jeff! Best regards, Etsuro Fujita