Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: POC: GROUP BY optimization
Date
Msg-id 50c1c1dc-057c-421b-a1f4-62622c7afe61@postgrespro.ru
Whole thread Raw
In response to Re: POC: GROUP BY optimization  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 4/12/24 06:44, Tom Lane wrote:
> * It seems like root->processed_groupClause no longer has much to do
> with the grouping behavior, which is scary given how much code still
> believes that it does.  I suspect there are bugs lurking there, and
1. Analysing the code, processed_groupClause is used in:
- adjust_foreign_grouping_path_cost - to decide, do we need to add cost 
of sort to the foreign grouping.
- just for replacing relids during SJE process.
- create_groupingsets_plan(), preprocess_grouping_sets, 
remove_useless_groupby_columns - we don't apply this feature in the case 
of grouping sets.
- get_number_of_groups - estimation shouldn't depend on the order of 
clauses.
- create_grouping_paths - to analyse hashability of grouping clause
- make_group_input_target, make_partial_grouping_target - doesn't depend 
on the order of clauses
planner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - in 
the case of hash grouping.

So, the only case we can impact current behavior lies in the 
postgres_fdw. But here we don't really know which plan will be chosen 
during planning on foreign node and stay the same behavior is legal for me.
> am not comforted by the fact that the patch changed exactly nothing
> in the pathnodes.h documentation of that field.  This comment looks
> pretty silly now too:
> 
>              /* Preprocess regular GROUP BY clause, if any */
>              root->processed_groupClause = list_copy(parse->groupClause);
> 
> What "preprocessing" is going on there?  This comment was adequate
> before, when the code line invoked preprocess_groupclause which had
> a bunch of commentary; but now it has to stand on its own and it's
> not doing a great job of that.
Thanks for picking this place. I fixed it.
More interesting here is that we move decisions on the order of group-by 
clauses to the stage, where we already have underlying subtree and 
incoming path keys. In principle, we could return the preprocessing 
routine and arrange GROUP-BY with the ORDER-BY clause as it was before. 
But looking into the code, I found only one reason to do this: 
adjust_group_pathkeys_for_groupagg. Curious about how much profit we get 
here, I think we can discover it later with no hurry. A good outcome 
here will be a test case that can show the importance of arranging 
GROUP-BY and ORDER-BY at an early stage.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Attachment

pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: improve performance of pg_dump --binary-upgrade
Next
From: "Imseih (AWS), Sami"
Date:
Subject: Re: allow changing autovacuum_max_workers without restarting