Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Re: POC: GROUP BY optimization |
Date | |
Msg-id | CAMbWs4-wAwJgxAnUNjCz8Qgzgjwn9t_NN9=i46hVPdaOT6Jmmg@mail.gmail.com Whole thread Raw |
In response to | Re: POC: GROUP BY optimization (Andrei Lepikhov <a.lepikhov@postgrespro.ru>) |
Responses |
Re: POC: GROUP BY optimization
|
List | pgsql-hackers |
On Thu, Feb 22, 2024 at 12:18 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 22/2/2024 09:09, Richard Guo wrote:
> I looked through the v2 patch and have two comments.
>
> * The test case under "Check we don't pick aggregate path key instead of
> grouping path key" does not have EXPLAIN to show the plan. So how can
> we ensure we do not mistakenly select the aggregate pathkeys instead of
> the grouping pathkeys?
I confirm it works correctly. I am not sure about the stability of the
zeros number in the output on different platforms here:
avg
--------------------
4.0000000000000000
5.0000000000000000
It was why I'd used the format() function before. So, may we elaborate
on the test and restrict the output?
The avg() function on integer argument is commonly used in
aggregates.sql. I don't think this is an issue. See the first test
query in aggregates.sql.
SELECT avg(four) AS avg_1 FROM onek;
avg_1
--------------------
1.5000000000000000
(1 row)
aggregates.sql. I don't think this is an issue. See the first test
query in aggregates.sql.
SELECT avg(four) AS avg_1 FROM onek;
avg_1
--------------------
1.5000000000000000
(1 row)
> * I don't think the test case introduced by e1b7fde418 is still needed,
> because we already have one under "Utilize the ordering of merge join to
> avoid a full Sort operation". This kind of test case is just to ensure
> that we are able to utilize the ordering of the subplans underneath. So
> it should be parallel to the test cases for utilize the ordering of
> index scan and subquery scan.
Also, I'm unsure about removing the disabling of the
max_parallel_workers_per_gather parameter. Have you discovered the
domination of the current plan over the partial one? Do the cost
fluctuations across platforms not trigger a parallel plan?
The table used for testing contains only 100 tuples, which is the size
of only one page. I don't believe it would trigger any parallel plans,
unless we manually change min_parallel_table_scan_size.
of only one page. I don't believe it would trigger any parallel plans,
unless we manually change min_parallel_table_scan_size.
What's more, I suggest to address here the complaint from [1]. As I see,
cost difference between Sort and IncrementalSort strategies in that case
is around 0.5. To make the test more stable I propose to change it a bit
and add a limit:
SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
It makes efficacy of IncrementalSort more obvious difference around 10
cost points.
I don't think that's necessary. With Incremental Sort the final cost
is:
GroupAggregate (cost=1.66..19.00 rows=100 width=25)
while with full Sort it is:
GroupAggregate (cost=16.96..19.46 rows=100 width=25)
With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
is cheaper on total cost. Not to say that even if somehow we decide the
two paths are fuzzily the same on total cost, the first path still
dominates because its startup cost is much cheaper.
Thanks
Richard
is:
GroupAggregate (cost=1.66..19.00 rows=100 width=25)
while with full Sort it is:
GroupAggregate (cost=16.96..19.46 rows=100 width=25)
With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
is cheaper on total cost. Not to say that even if somehow we decide the
two paths are fuzzily the same on total cost, the first path still
dominates because its startup cost is much cheaper.
Thanks
Richard
pgsql-hackers by date: