Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date
Msg-id CAApHDvpXiXLxg4TsA8P_4etnuGQqAAbHWEOM4hGe=DCaXmi_jA@mail.gmail.com
Whole thread Raw
In response to Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (Ronan Dunklau <ronan.dunklau@aiven.io>)
List pgsql-hackers
On Thu, 4 Nov 2021 at 20:59, Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> I took some time to toy with this again.
>
> At first I thought that charging a discount in foreign grouping paths for
> Aggref targets (since they are computed remotely) would be a good idea,
> similar to what is done for the grouping keys.
>
> But in the end, it's probably not something we would like to do. Yes, the
> group planning will be more accurate on the remote side generally (better
> statistics than locally for estimating the number of groups) but executing the
> grouping locally when the number of groups is close to the input's cardinality
> (ex: group by unique_key)  gives us a form of parallelism which can actually
> perform better.
>
> For the other cases where there is fewer output than input tuples, that is,
> when an actual grouping takes place, adjusting fdw_tuple_cost might be enough
> to tune the behavior to what is desirable.

I've now looked into this issue. With the patched code, the remote
aggregate path loses out in add_path() due to the fact that the local
aggregate path compares fuzzily the same as the remote aggregate path.
Since the local aggregate path is now fetching the rows from the
foreign server with a SQL query containing an ORDER BY clause (per my
change to query_pathkeys being picked up in
get_useful_pathkeys_for_relation()), add_path now prefers that path
due to it having pathkeys and the remote aggregate query not having
any (PATHKEYS_BETTER2).

It seems what's going on is that quite simply the default
fdw_tuple_cost is unrealistically low. Let's look.

#define DEFAULT_FDW_TUPLE_COST 0.01

Which is even lower than DEFAULT_PARALLEL_TUPLE_COST (0.1) and the
same as cpu_tuple_cost!

After some debugging, I see add_path() switches to the, seemingly
better, remote aggregate plan again if I multiple fdw_tuple_cost by
28. Anything below that sticks to the (inferior) local aggregate plan.

There's also another problem going on that would make that situation
better. The query planner expects the following query to produce 6
rows:

SELECT array_agg("C 1" ORDER BY "C 1" USING OPERATOR(public.<^) NULLS
LAST), c2 FROM "S 1"."T 1" WHERE (("C 1" < 100)) AND ((c2 = 6)) GROUP
BY c2;

You might expect the planner to think there'd just be 1 row due to the
"c2 = 6" and "GROUP BY c2", but it thinks there will be more than
that. If estimate_num_groups() knew about EquivalenceClasses and
checked ec_has_const, then it might be able to do better, but it
doesn't, so:

GroupAggregate  (cost=11.67..11.82 rows=6 width=36)

If I force that estimate to be 1 row instead of 6, then I only need a
fdw_tuple_cost to be 12 times the default to get it to switch to the
remote aggregate plan.

I think we should likely just patch master and change
DEFAULT_FDW_TUPLE_COST to at the very least 0.2, which is 20x higher
than today's setting. I'd be open to a much higher setting such as 0.5
(50x).

David



pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Hash index build performance tweak from sorting
Next
From: David Rowley
Date:
Subject: Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size