Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Add proper planner support for ORDER BY / DISTINCT aggregates |
Date | |
Msg-id | CAApHDvrXnUFNiYQiHv4Eu_rtbMPsKeVT-KG93_dD-ZYs9G5OnA@mail.gmail.com Whole thread Raw |
In response to | Re: Add proper planner support for ORDER BY / DISTINCT aggregates (Ronan Dunklau <ronan.dunklau@aiven.io>) |
Responses |
Re: Add proper planner support for ORDER BY / DISTINCT aggregates
|
List | pgsql-hackers |
On Thu, 22 Jul 2021 at 02:01, Ronan Dunklau <ronan.dunklau@aiven.io> wrote: > I tested the 0001 patch against both HEAD and my proposed bugfix for > postgres_fdw. > > There is a problem that the ordered aggregate is not pushed down anymore. The > underlying Sort node is correctly pushed down though. > > This comes from the fact that postgres_fdw grouping path never contains any > pathkey. Since the cost is fuzzily the same between the pushed-down aggregate > and the locally performed one, the tie is broken against the pathkeys. I think this might be more down to a lack of any penalty cost for fetching foreign tuples. Looking at create_foreignscan_path(), I don't see anything that adds anything to account for fetching the tuples from the foreign server. If there was something like that then there'd be more of a preference to perform the remote aggregation so that fewer rows must arrive from the remote server. I tested by adding: total_cost += cpu_tuple_cost * rows * 100; in create_foreignscan_path() and I got the plan with the remote aggregation. That's a fairly large penalty of 1.0 per row. Much bigger than parallel_tuple_cost's default value. I'm a bit undecided on how much this patch needs to get involved in adjusting foreign scan costs. The problem is that we've given the executor a new path to consider and nobody has done any proper costings for the foreign scan so that it properly prefers paths that have to pull fewer foreign tuples. This is a pretty similar problem to what parallel_tuple_cost aims to fix. Also similar to how we had to add APPEND_CPU_COST_MULTIPLIER to have partition-wise aggregates prefer grouping at the partition level rather than at the partitioned table level. > Ideally we would add the group pathkeys to the grouping path, but this would > add an additional ORDER BY expression matching the GROUP BY. Moreover, some > triaging of the pathkeys would be necessary since we now mix the sort-in- > aggref pathkeys with the group_pathkeys. I think you're talking about passing pathkeys into create_foreign_upper_path in add_foreign_grouping_paths. If so, I don't really see how it would be safe to add pathkeys to the foreign grouping path. What if the foreign server did a Hash Aggregate? The rows might come back in any random order. I kinda think that to fix this properly would need a new foreign server option such as foreign_tuple_cost. I'd feel better about something like that with some of the people with a vested interest in the FDW code were watching more closely. So far we've not managed to entice any of them with the bug report yet, but it's maybe early days yet. David
pgsql-hackers by date: