Home > mailing lists

Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

From	Ronan Dunklau
Subject	Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date	November 4, 2021 07:59:00
Msg-id	3163474.aeNJFYEL58@aivenronan Whole thread
In response to	Re: Add proper planner support for ORDER BY / DISTINCT aggregates (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses	Re: Add proper planner support for ORDER BY / DISTINCT aggregates Re: Add proper planner support for ORDER BY / DISTINCT aggregates Re: Add proper planner support for ORDER BY / DISTINCT aggregates
List	pgsql-hackers

Tree view

Le jeudi 22 juillet 2021, 10:42:49 CET Ronan Dunklau a écrit :
> Le jeudi 22 juillet 2021, 09:38:50 CEST David Rowley a écrit :
> > On Thu, 22 Jul 2021 at 02:01, Ronan Dunklau <ronan.dunklau@aiven.io>
wrote:
> > > I tested the 0001 patch against both HEAD and my proposed bugfix for
> > > postgres_fdw.
> > >
> > > There is a problem that the ordered aggregate is not pushed down
> > > anymore.
> > > The underlying Sort node is correctly pushed down though.
> > >
> > > This comes from the fact that postgres_fdw grouping path never contains
> > > any
> > > pathkey. Since the cost is fuzzily the same between the pushed-down
> > > aggregate and the locally performed one, the tie is broken against the
> > > pathkeys.
> >
> > I think this might be more down to a lack of any penalty cost for
> > fetching foreign tuples. Looking at create_foreignscan_path(), I don't
> > see anything that adds anything to account for fetching the tuples
> > from the foreign server. If there was something like that then there'd
> > be more of a preference to perform the remote aggregation so that
> > fewer rows must arrive from the remote server.
> >
> > I tested by adding: total_cost += cpu_tuple_cost * rows * 100; in
> > create_foreignscan_path() and I got the plan with the remote
> > aggregation. That's a fairly large penalty of 1.0 per row. Much bigger
> > than parallel_tuple_cost's default value.
> >
> > I'm a bit undecided on how much this patch needs to get involved in
> > adjusting foreign scan costs.  The problem is that we've given the
> > executor a new path to consider and nobody has done any proper
> > costings for the foreign scan so that it properly prefers paths that
> > have to pull fewer foreign tuples.  This is a pretty similar problem
> > to what parallel_tuple_cost aims to fix. Also similar to how we had to
> > add APPEND_CPU_COST_MULTIPLIER to have partition-wise aggregates
> > prefer grouping at the partition level rather than at the partitioned
> > table level.
> >
> > > Ideally we would add the group pathkeys to the grouping path, but this
> > > would add an additional ORDER BY expression matching the GROUP BY.
> > > Moreover, some triaging of the pathkeys would be necessary since we now
> > > mix the sort-in- aggref pathkeys with the group_pathkeys.
> >
> > I think you're talking about passing pathkeys into
> > create_foreign_upper_path in add_foreign_grouping_paths. If so, I
> > don't really see how it would be safe to add pathkeys to the foreign
> > grouping path. What if the foreign server did a Hash Aggregate?  The
> > rows might come back in any random order.
>
> Yes, I was suggesting to add a new path with the pathkeys factored in, which
> if chosen over the non-ordered path would result in an additional ORDER BY
> clause to prevent a HashAggregate. But that doesn't seem a good idea after
> all.
>
> > I kinda think that to fix this properly would need a new foreign
> > server option such as foreign_tuple_cost. I'd feel better about
> > something like that with some of the people with a vested interest in
> > the FDW code were watching more closely. So far we've not managed to
> > entice any of them with the bug report yet, but it's maybe early days
> > yet.
>
> We already have that in the form of fdw_tuple_cost as a server option if I'm
> not mistaken ? It works as expected when the number of tuples is notably
> reduced by the foreign group by.
>
> The problem arise when the cardinality of the groups is equal to the input's
> cardinality. I think even in that case we should try to use a remote
> aggregate since it's a computation that will not happen on the local
> server. I also think we're more likely to have up to date statistics
> remotely than the ones collected locally on the foreign tables, and the
> estimated number of groups would be more accurate on the remote side than
> the local one.

I took some time to toy with this again.

At first I thought that charging a discount in foreign grouping paths for
Aggref targets (since they are computed remotely) would be a good idea,
similar to what is done for the grouping keys.

But in the end, it's probably not something we would like to do. Yes, the
group planning will be more accurate on the remote side generally (better
statistics than locally for estimating the number of groups) but executing the
grouping locally when the number of groups is close to the input's cardinality
(ex: group by unique_key)  gives us a form of parallelism which can actually
perform better.

For the other cases where there is fewer output than input tuples, that is,
when an actual grouping takes place, adjusting fdw_tuple_cost might be enough
to tune the behavior to what is desirable.


--
Ronan Dunklau

pgsql-hackers by date:

From: Michael Paquier
Date: 04 November 2021, 07:41:04
Subject: Re: should we enable log_checkpoints out of the box?

From: Amit Kapila
Date: 04 November 2021, 08:10:48
Subject: Re: Data is copied twice when specifying both child and parent table in publication

Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

Previous

Next