Re: Some revises in adding sorting path - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Re: Some revises in adding sorting path |
Date | |
Msg-id | CAMbWs4-X42MCvgZUWjC3VrTXm5eQzbKxQBLapwKgjJehQZ0JYQ@mail.gmail.com Whole thread Raw |
In response to | Re: Some revises in adding sorting path (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Some revises in adding sorting path
|
List | pgsql-hackers |
On Tue, Feb 14, 2023 at 10:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
I looked at the three patches and have some thoughts:
Thanks for reviewing!
0001:
Does the newly added test have to be this complex? I think it might
be better to just come up with the most simple test you can that uses
an incremental sort. I really can't think why the test requires a FOR
UPDATE, to test incremental sort, for example. The danger with making
a test more complex than it needs to be is that it frequently gets
broken by unrelated changes. The complexity makes it harder for
people to understand the test's intentions and that increases the risk
that the test eventually does not test what it was originally meant to
test as the underlying code changes and the expected output is
updated.
That makes sense. I agree that we should always use the minimal query
for test. For this patch, as pointed by Etsuro, it may not be a good
idea for the change. So I'll remove 0001.
for test. For this patch, as pointed by Etsuro, it may not be a good
idea for the change. So I'll remove 0001.
0002:
I think the following existing comment does not seem to be true any longer:
> However, there's one more
> * possibility: it may make sense to sort the cheapest partial path
> * according to the required output order and then use Gather Merge.
You've removed the comment that talks about trying Incremental Sort ->
Gather Merge paths yet the code is still doing that, the two are just
more consolidated now, so perhaps you need to come up with a new
comment to explain what we're trying to achieve.
Yes, you are right. How about the comment below?
- * possibility: it may make sense to sort the cheapest partial path
- * according to the required output order and then use Gather Merge.
+ * possibility: it may make sense to sort the cheapest partial path or
+ * incrementally sort any partial path that is partially sorted according
+ * to the required output order and then use Gather Merge.
Looking at the codes now I have some concern that what we do in
create_ordered_paths for partial paths may have already been done in
generate_useful_gather_paths, especially when query_pathkeys is equal to
sort_pathkeys. Not sure if this is a problem. And the comment there
mentions generate_gather_paths(), but ISTM we should mention what
generate_useful_gather_paths has done.
- * possibility: it may make sense to sort the cheapest partial path
- * according to the required output order and then use Gather Merge.
+ * possibility: it may make sense to sort the cheapest partial path or
+ * incrementally sort any partial path that is partially sorted according
+ * to the required output order and then use Gather Merge.
Looking at the codes now I have some concern that what we do in
create_ordered_paths for partial paths may have already been done in
generate_useful_gather_paths, especially when query_pathkeys is equal to
sort_pathkeys. Not sure if this is a problem. And the comment there
mentions generate_gather_paths(), but ISTM we should mention what
generate_useful_gather_paths has done.
0003:
Looking at gather_grouping_paths(), I see it calls
generate_useful_gather_paths() which generates a bunch of Gather Merge
paths after sorting the cheapest path and incrementally sorting any
partially sorted paths. Then back in gather_grouping_paths(), we go
and create Gather Merge paths again, but this time according to the
group_pathkeys instead of the query_pathkeys. I know you're not really
changing anything here, but as I'm struggling to understand why
exactly we're creating two sets of Gather Merge paths, it makes it a
bit scary to consider changing anything in this area. I've not really
found any comments that can explain to me sufficiently well enough so
that I understand why this needs to be done. Do you know?
I'm also not sure why gather_grouping_paths creates Gather Merge paths
according to group_pathkeys after what generate_useful_gather_paths has
done. There is comment that mentions this but seems more explanation is
needed.
* generate_useful_gather_paths does most of the work, but we also consider a
* special case: we could try sorting the data by the group_pathkeys and then
* applying Gather Merge.
It seems that if there is available group_pathkeys, we will set
query_pathkeys to group_pathkeys because we want the result sorted for
grouping. In this case gather_grouping_paths may just generate
duplicate Gather Merge paths because generate_useful_gather_paths has
generated Gather Merge paths according to query_pathkeys. I tried to
reduce the code of gather_grouping_paths to just call
generate_useful_gather_paths and found no diffs in regression tests.
Thanks
Richard
according to group_pathkeys after what generate_useful_gather_paths has
done. There is comment that mentions this but seems more explanation is
needed.
* generate_useful_gather_paths does most of the work, but we also consider a
* special case: we could try sorting the data by the group_pathkeys and then
* applying Gather Merge.
It seems that if there is available group_pathkeys, we will set
query_pathkeys to group_pathkeys because we want the result sorted for
grouping. In this case gather_grouping_paths may just generate
duplicate Gather Merge paths because generate_useful_gather_paths has
generated Gather Merge paths according to query_pathkeys. I tried to
reduce the code of gather_grouping_paths to just call
generate_useful_gather_paths and found no diffs in regression tests.
Thanks
Richard
pgsql-hackers by date: