Re: Some revises in adding sorting path - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Re: Some revises in adding sorting path |
Date | |
Msg-id | CAMbWs4_YyQ4uPX79F7GL7zHA4siyHAc4OLXm8en8Le2kRtUq+A@mail.gmail.com Whole thread Raw |
In response to | Re: Some revises in adding sorting path (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Some revises in adding sorting path
|
List | pgsql-hackers |
On Tue, Jan 30, 2024 at 7:00 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 29 Jan 2024 at 22:39, Richard Guo <guofenglinux@gmail.com> wrote:
> So in the v3 patch I've brought back the logic that considers
> incremental sort on partial paths in gather_grouping_paths(). However,
> I failed to compose a test case for this scenario without having to
> generate a huge table. So in the v3 patch I did not include a test case
> for this aspect.
Can you share the test with the huge table?
The test had been shown in upthread [1]. Pasting it here:
create table t (a int, b int, c int, d int);
insert into t select i%10, i%10, i%10, i%10 from generate_series(1,1000000)i;
create index on t (a, b);
analyze t;
set enable_hashagg to off;
set enable_seqscan to off;
explain (costs off)
select count(*) from t group by a, c, b, parallel_safe_volatile(d);
QUERY PLAN
--------------------------------------------------------------------------
Finalize GroupAggregate
Group Key: a, c, b, (parallel_safe_volatile(d))
-> Gather Merge
Workers Planned: 2
-> Incremental Sort
Sort Key: a, c, b, (parallel_safe_volatile(d))
Presorted Key: a
-> Partial GroupAggregate
Group Key: a, b, c, (parallel_safe_volatile(d))
-> Incremental Sort
Sort Key: a, b, c, (parallel_safe_volatile(d))
Presorted Key: a, b
-> Parallel Index Scan using t_a_b_idx on t
(13 rows)
create table t (a int, b int, c int, d int);
insert into t select i%10, i%10, i%10, i%10 from generate_series(1,1000000)i;
create index on t (a, b);
analyze t;
set enable_hashagg to off;
set enable_seqscan to off;
explain (costs off)
select count(*) from t group by a, c, b, parallel_safe_volatile(d);
QUERY PLAN
--------------------------------------------------------------------------
Finalize GroupAggregate
Group Key: a, c, b, (parallel_safe_volatile(d))
-> Gather Merge
Workers Planned: 2
-> Incremental Sort
Sort Key: a, c, b, (parallel_safe_volatile(d))
Presorted Key: a
-> Partial GroupAggregate
Group Key: a, b, c, (parallel_safe_volatile(d))
-> Incremental Sort
Sort Key: a, b, c, (parallel_safe_volatile(d))
Presorted Key: a, b
-> Parallel Index Scan using t_a_b_idx on t
(13 rows)
I tried and failed as, if I'm not mistaken, you're talking about a
parallel aggregate query with an incremental sort between the Partial
Aggregate node and the Finalize Group Aggregate node. If the partial
aggregate was a Group Aggregate, then it would already be correctly
sorted. We don't need a more strict sort ordering to perform the
Finalize Group Aggregate, the results must already be sorted by at
least the GROUP BY clause. If the partial aggregate had opted to Hash
Aggregate, then there'd be no presorted keys, so we could only get a
full sort. I can't see any way to get an incremental sort between the
2 aggregate phases.
What am I missing?
This was true before 0452b461bc, and I reached the same conclusion in
[2]. Quote it here:
"
But I did not find a query that makes an incremental sort in this case.
After trying for a while it seems to me that we do not need to consider
incremental sort in this case, because for a partial path of a grouped
or partially grouped relation, it is either unordered (HashAggregate or
Append), or it has been ordered by the group_pathkeys (GroupAggregate).
It seems there is no case that we'd have a partial path that is
partially sorted.
"
But if we've reordered the group by keys to match the input path's
pathkeys, we might have a partial GroupAggregate that is partially
sorted. See the plan above.
[2]. Quote it here:
"
But I did not find a query that makes an incremental sort in this case.
After trying for a while it seems to me that we do not need to consider
incremental sort in this case, because for a partial path of a grouped
or partially grouped relation, it is either unordered (HashAggregate or
Append), or it has been ordered by the group_pathkeys (GroupAggregate).
It seems there is no case that we'd have a partial path that is
partially sorted.
"
But if we've reordered the group by keys to match the input path's
pathkeys, we might have a partial GroupAggregate that is partially
sorted. See the plan above.
I also tried reverting your changes to planner.c to see if your new
tests would fail. They all passed. So it looks like none of these
tests are testing anything new.
This patchset does not aim to introduce anything new; it simply
refactors the existing code. The newly added tests are used to show
that the code that is touched here is not redundant, but rather
essential for generating certain paths. I remember the tests were added
per your comment in [3].
[1] https://www.postgresql.org/message-id/CAMbWs4_iDwMAf5mp%2BG-tXq-gYzvR6koSHvNUqBPK4pt7%2B11tJw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAMbWs497h5jVCVwNDb%2BBX31Z_K8iBaPQKOcsTvpFQ7kF18a2%2Bg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com
Thanks
Richard
refactors the existing code. The newly added tests are used to show
that the code that is touched here is not redundant, but rather
essential for generating certain paths. I remember the tests were added
per your comment in [3].
[1] https://www.postgresql.org/message-id/CAMbWs4_iDwMAf5mp%2BG-tXq-gYzvR6koSHvNUqBPK4pt7%2B11tJw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAMbWs497h5jVCVwNDb%2BBX31Z_K8iBaPQKOcsTvpFQ7kF18a2%2Bg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com
Thanks
Richard
pgsql-hackers by date: