Thread: Some revises in adding sorting path

Some revises in adding sorting path

From
Richard Guo
Date:
While reviewing [1], I visited other places where sorting is needed, and
have some findings.

In add_paths_with_pathkeys_for_rel, we do not try incremental sort atop
of the epq_path, which I think we can do.  I'm not sure how useful this
is in real world since the epq_path is used only for EPQ checks, but it
seems doing that doesn't cost too much.

In create_ordered_paths, we are trying to sort the cheapest partial path
and incremental sort on any partial paths with presorted keys, and then
use Gather Merge.  If the cheapest partial path is not completely sorted
but happens to have presorted keys, we would create a full sort path and
an incremental sort path on it.  I think this is not what we want.  We
are supposed to only create an incremental sort path if there are
presorted keys.

In gather_grouping_paths, we have the same issue.  In addition, for the
incremental sort paths created atop partial paths, we neglect to
calculate 'total_groups' before we use it in create_gather_merge_path.

[1] https://www.postgresql.org/message-id/flat/CAApHDvo8Lz2H%3D42urBbfP65LTcEUOh288MT7DsG2_EWtW1AXHQ%40mail.gmail.com

Thanks
Richard
Attachment

Re: Some revises in adding sorting path

From
David Rowley
Date:
I looked at the three patches and have some thoughts:

0001:

Does the newly added test have to be this complex?  I think it might
be better to just come up with the most simple test you can that uses
an incremental sort. I really can't think why the test requires a FOR
UPDATE, to test incremental sort, for example. The danger with making
a test more complex than it needs to be is that it frequently gets
broken by unrelated changes.  The complexity makes it harder for
people to understand the test's intentions and that increases the risk
that the test eventually does not test what it was originally meant to
test as the underlying code changes and the expected output is
updated.

0002:

I think the following existing comment does not seem to be true any longer:

> However, there's one more
> * possibility: it may make sense to sort the cheapest partial path
> * according to the required output order and then use Gather Merge.

You've removed the comment that talks about trying Incremental Sort ->
Gather Merge paths yet the code is still doing that, the two are just
more consolidated now, so perhaps you need to come up with a new
comment to explain what we're trying to achieve.

> * already (no need to deal with paths which have presorted
> * keys when incremental sort is disabled unless it's the
> * cheapest input path).

I think "input path" should be "partial path". (I maybe didn't get
that right in all places in 4a29eabd1).

0003:

Looking at gather_grouping_paths(), I see it calls
generate_useful_gather_paths() which generates a bunch of Gather Merge
paths after sorting the cheapest path and incrementally sorting any
partially sorted paths.  Then back in gather_grouping_paths(), we go
and create Gather Merge paths again, but this time according to the
group_pathkeys instead of the query_pathkeys. I know you're not really
changing anything here, but as I'm struggling to understand why
exactly we're creating two sets of Gather Merge paths, it makes it a
bit scary to consider changing anything in this area. I've not really
found any comments that can explain to me sufficiently well enough so
that I understand why this needs to be done.  Do you know?

David



Re: Some revises in adding sorting path

From
Etsuro Fujita
Date:
Hi Richard,

On Tue, Jan 10, 2023 at 8:06 PM Richard Guo <guofenglinux@gmail.com> wrote:
> In add_paths_with_pathkeys_for_rel, we do not try incremental sort atop
> of the epq_path, which I think we can do.  I'm not sure how useful this
> is in real world since the epq_path is used only for EPQ checks, but it
> seems doing that doesn't cost too much.

I'm not sure this is a good idea, because the epq_path will return at
most one tuple in an EPQ recheck.

The reason why an extra Sort node is injected into the epq_path is
only label it with the correct sort order to use it as an input for
the EPQ merge-join path of a higher-level foreign join, so shouldn't
we keep this step as much as simple and save cycles even a little?

Sorry for being late to the party.

Best regards,
Etsuro Fujita



Re: Some revises in adding sorting path

From
Richard Guo
Date:

On Thu, Feb 16, 2023 at 7:50 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
I'm not sure this is a good idea, because the epq_path will return at
most one tuple in an EPQ recheck.

The reason why an extra Sort node is injected into the epq_path is
only label it with the correct sort order to use it as an input for
the EPQ merge-join path of a higher-level foreign join, so shouldn't
we keep this step as much as simple and save cycles even a little?
 
Agreed.  Thanks for the explanation.  I also wondered whether it's
worthwhile to do the change here.  I'll remove the 0001 patch.

Thanks
Richard

Re: Some revises in adding sorting path

From
Richard Guo
Date:

On Tue, Feb 14, 2023 at 10:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
I looked at the three patches and have some thoughts:
 
Thanks for reviewing!
 
0001:

Does the newly added test have to be this complex?  I think it might
be better to just come up with the most simple test you can that uses
an incremental sort. I really can't think why the test requires a FOR
UPDATE, to test incremental sort, for example. The danger with making
a test more complex than it needs to be is that it frequently gets
broken by unrelated changes.  The complexity makes it harder for
people to understand the test's intentions and that increases the risk
that the test eventually does not test what it was originally meant to
test as the underlying code changes and the expected output is
updated.
 
That makes sense.  I agree that we should always use the minimal query
for test.  For this patch, as pointed by Etsuro, it may not be a good
idea for the change.  So I'll remove 0001.
 
0002:

I think the following existing comment does not seem to be true any longer:

> However, there's one more
> * possibility: it may make sense to sort the cheapest partial path
> * according to the required output order and then use Gather Merge.

You've removed the comment that talks about trying Incremental Sort ->
Gather Merge paths yet the code is still doing that, the two are just
more consolidated now, so perhaps you need to come up with a new
comment to explain what we're trying to achieve.
 
Yes, you are right.  How about the comment below?

- * possibility: it may make sense to sort the cheapest partial path
- * according to the required output order and then use Gather Merge.
+ * possibility: it may make sense to sort the cheapest partial path or
+ * incrementally sort any partial path that is partially sorted according
+ * to the required output order and then use Gather Merge.

Looking at the codes now I have some concern that what we do in
create_ordered_paths for partial paths may have already been done in
generate_useful_gather_paths, especially when query_pathkeys is equal to
sort_pathkeys.  Not sure if this is a problem.  And the comment there
mentions generate_gather_paths(), but ISTM we should mention what
generate_useful_gather_paths has done.
 
0003:

Looking at gather_grouping_paths(), I see it calls
generate_useful_gather_paths() which generates a bunch of Gather Merge
paths after sorting the cheapest path and incrementally sorting any
partially sorted paths.  Then back in gather_grouping_paths(), we go
and create Gather Merge paths again, but this time according to the
group_pathkeys instead of the query_pathkeys. I know you're not really
changing anything here, but as I'm struggling to understand why
exactly we're creating two sets of Gather Merge paths, it makes it a
bit scary to consider changing anything in this area. I've not really
found any comments that can explain to me sufficiently well enough so
that I understand why this needs to be done.  Do you know?
 
I'm also not sure why gather_grouping_paths creates Gather Merge paths
according to group_pathkeys after what generate_useful_gather_paths has
done.  There is comment that mentions this but seems more explanation is
needed.

* generate_useful_gather_paths does most of the work, but we also consider a
* special case: we could try sorting the data by the group_pathkeys and then
* applying Gather Merge.

It seems that if there is available group_pathkeys, we will set
query_pathkeys to group_pathkeys because we want the result sorted for
grouping.  In this case gather_grouping_paths may just generate
duplicate Gather Merge paths because generate_useful_gather_paths has
generated Gather Merge paths according to query_pathkeys.  I tried to
reduce the code of gather_grouping_paths to just call
generate_useful_gather_paths and found no diffs in regression tests.

Thanks
Richard

Re: Some revises in adding sorting path

From
David Rowley
Date:
On Tue, 21 Feb 2023 at 22:02, Richard Guo <guofenglinux@gmail.com> wrote:
> Looking at the codes now I have some concern that what we do in
> create_ordered_paths for partial paths may have already been done in
> generate_useful_gather_paths, especially when query_pathkeys is equal to
> sort_pathkeys.  Not sure if this is a problem.  And the comment there
> mentions generate_gather_paths(), but ISTM we should mention what
> generate_useful_gather_paths has done.

I think you need to write some tests for this. I've managed to come up
with something to get that code to perform a Sort, but I've not
managed to get it to perform an incremental sort.

create or replace function parallel_safe_volatile(a int) returns int
as $$ begin return a; end; $$ parallel safe volatile language plpgsql;
create table abc(a int, b int, c int);
insert into abc select x,y,z from generate_Series(1,100)x,
generate_Series(1,100)y, generate_Series(1,100)z;
set parallel_tuple_cost=0;

Without making those parallel paths, we get:

postgres=# explain select * from abc where a=1 order by
a,b,parallel_safe_volatile(c);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=13391.49..13417.49 rows=10400 width=16)
   Sort Key: b, (parallel_safe_volatile(c))
   ->  Gather  (cost=1000.00..12697.58 rows=10400 width=16)
         Workers Planned: 2
         ->  Parallel Seq Scan on abc  (cost=0.00..11697.58 rows=4333 width=16)
               Filter: (a = 1)
(6 rows)

but with, the plan is:

postgres=# explain select * from abc where a=1 order by
a,b,parallel_safe_volatile(c);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Gather Merge  (cost=12959.35..13060.52 rows=8666 width=16)
   Workers Planned: 2
   ->  Sort  (cost=11959.32..11970.15 rows=4333 width=16)
         Sort Key: b, (parallel_safe_volatile(c))
         ->  Parallel Seq Scan on abc  (cost=0.00..11697.58 rows=4333 width=16)
               Filter: (a = 1)
(6 rows)

I added the parallel safe and volatile function so that
get_useful_pathkeys_for_relation() wouldn't include all of the
query_pathkeys.

If you write some tests for this code, it will be useful to prove that
it actually does something, and also that it does not break again in
the future. I don't really want to just blindly copy the pattern used
in 3c6fc5820 for creating incremental sort paths if it's not useful
here. It would be good to see tests that make an Incremental Sort path
using the code you're changing.

Same for the 0003 patch.

I'll mark this as waiting on author while you work on that.

David



Re: Some revises in adding sorting path

From
Daniel Gustafsson
Date:
> On 28 Mar 2023, at 21:59, David Rowley <dgrowleyml@gmail.com> wrote:

> I'll mark this as waiting on author while you work on that.

Richard: have you had a chance to incorporate the tests proposed by David in
this thread into your patch?

--
Daniel Gustafsson




Re: Some revises in adding sorting path

From
Richard Guo
Date:

On Wed, Mar 29, 2023 at 4:00 AM David Rowley <dgrowleyml@gmail.com> wrote:
If you write some tests for this code, it will be useful to prove that
it actually does something, and also that it does not break again in
the future. I don't really want to just blindly copy the pattern used
in 3c6fc5820 for creating incremental sort paths if it's not useful
here. It would be good to see tests that make an Incremental Sort path
using the code you're changing.

Thanks for the suggestion.  I've managed to come up with a query that
gets the codes being changed here to perform an incremental sort.

set min_parallel_index_scan_size to 0;
set enable_seqscan to off;

Without making those parallel paths:

explain (costs off)
select * from tenk1 where four = 2 order by four, hundred, parallel_safe_volatile(thousand);
                          QUERY PLAN
--------------------------------------------------------------
 Incremental Sort
   Sort Key: hundred, (parallel_safe_volatile(thousand))
   Presorted Key: hundred
   ->  Gather Merge
         Workers Planned: 3
         ->  Parallel Index Scan using tenk1_hundred on tenk1
               Filter: (four = 2)
(7 rows)

and with those parallel paths:

explain (costs off)
select * from tenk1 where four = 2 order by four, hundred, parallel_safe_volatile(thousand);
                          QUERY PLAN
---------------------------------------------------------------
 Gather Merge
   Workers Planned: 3
   ->  Incremental Sort
         Sort Key: hundred, (parallel_safe_volatile(thousand))
         Presorted Key: hundred
         ->  Parallel Index Scan using tenk1_hundred on tenk1
               Filter: (four = 2)
(7 rows)

I've added two tests for the code changes in create_ordered_paths in the
new patch.
 
Same for the 0003 patch.

For the code changes in gather_grouping_paths, I've managed to come up
with a query that makes an explicit Sort atop cheapest partial path.

explain (costs off)
select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
                             QUERY PLAN
--------------------------------------------------------------------
 Finalize GroupAggregate
   Group Key: twenty, (parallel_safe_volatile(two))
   ->  Gather Merge
         Workers Planned: 4
         ->  Sort
               Sort Key: twenty, (parallel_safe_volatile(two))
               ->  Partial HashAggregate
                     Group Key: twenty, parallel_safe_volatile(two)
                     ->  Parallel Seq Scan on tenk1
(9 rows)

Without this logic the plan would look like:

explain (costs off)
select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
                             QUERY PLAN
--------------------------------------------------------------------
 Finalize GroupAggregate
   Group Key: twenty, (parallel_safe_volatile(two))
   ->  Sort
         Sort Key: twenty, (parallel_safe_volatile(two))
         ->  Gather
               Workers Planned: 4
               ->  Partial HashAggregate
                     Group Key: twenty, parallel_safe_volatile(two)
                     ->  Parallel Seq Scan on tenk1
(9 rows)

This test is also added in the new patch.

But I did not find a query that makes an incremental sort in this case.
After trying for a while it seems to me that we do not need to consider
incremental sort in this case, because for a partial path of a grouped
or partially grouped relation, it is either unordered (HashAggregate or
Append), or it has been ordered by the group_pathkeys (GroupAggregate).
It seems there is no case that we'd have a partial path that is
partially sorted.

So update the patches to v2.

Thanks
Richard
Attachment

Re: Some revises in adding sorting path

From
Richard Guo
Date:

On Mon, Jul 10, 2023 at 5:37 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 28 Mar 2023, at 21:59, David Rowley <dgrowleyml@gmail.com> wrote:
> I'll mark this as waiting on author while you work on that.

Richard: have you had a chance to incorporate the tests proposed by David in
this thread into your patch?

I just updated the patches according to David's reviews.  So I'll change
it back to needs review.

Thanks
Richard

Re: Some revises in adding sorting path

From
Shubham Khanna
Date:
On Thu, Dec 28, 2023 at 4:00 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> On Wed, Mar 29, 2023 at 4:00 AM David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> If you write some tests for this code, it will be useful to prove that
>> it actually does something, and also that it does not break again in
>> the future. I don't really want to just blindly copy the pattern used
>> in 3c6fc5820 for creating incremental sort paths if it's not useful
>> here. It would be good to see tests that make an Incremental Sort path
>> using the code you're changing.
>
>
> Thanks for the suggestion.  I've managed to come up with a query that
> gets the codes being changed here to perform an incremental sort.
>
> set min_parallel_index_scan_size to 0;
> set enable_seqscan to off;
>
> Without making those parallel paths:
>
> explain (costs off)
> select * from tenk1 where four = 2 order by four, hundred, parallel_safe_volatile(thousand);
>                           QUERY PLAN
> --------------------------------------------------------------
>  Incremental Sort
>    Sort Key: hundred, (parallel_safe_volatile(thousand))
>    Presorted Key: hundred
>    ->  Gather Merge
>          Workers Planned: 3
>          ->  Parallel Index Scan using tenk1_hundred on tenk1
>                Filter: (four = 2)
> (7 rows)
>
> and with those parallel paths:
>
> explain (costs off)
> select * from tenk1 where four = 2 order by four, hundred, parallel_safe_volatile(thousand);
>                           QUERY PLAN
> ---------------------------------------------------------------
>  Gather Merge
>    Workers Planned: 3
>    ->  Incremental Sort
>          Sort Key: hundred, (parallel_safe_volatile(thousand))
>          Presorted Key: hundred
>          ->  Parallel Index Scan using tenk1_hundred on tenk1
>                Filter: (four = 2)
> (7 rows)
>
> I've added two tests for the code changes in create_ordered_paths in the
> new patch.
>
>>
>> Same for the 0003 patch.
>
>
> For the code changes in gather_grouping_paths, I've managed to come up
> with a query that makes an explicit Sort atop cheapest partial path.
>
> explain (costs off)
> select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Finalize GroupAggregate
>    Group Key: twenty, (parallel_safe_volatile(two))
>    ->  Gather Merge
>          Workers Planned: 4
>          ->  Sort
>                Sort Key: twenty, (parallel_safe_volatile(two))
>                ->  Partial HashAggregate
>                      Group Key: twenty, parallel_safe_volatile(two)
>                      ->  Parallel Seq Scan on tenk1
> (9 rows)
>
> Without this logic the plan would look like:
>
> explain (costs off)
> select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Finalize GroupAggregate
>    Group Key: twenty, (parallel_safe_volatile(two))
>    ->  Sort
>          Sort Key: twenty, (parallel_safe_volatile(two))
>          ->  Gather
>                Workers Planned: 4
>                ->  Partial HashAggregate
>                      Group Key: twenty, parallel_safe_volatile(two)
>                      ->  Parallel Seq Scan on tenk1
> (9 rows)
>
> This test is also added in the new patch.
>
> But I did not find a query that makes an incremental sort in this case.
> After trying for a while it seems to me that we do not need to consider
> incremental sort in this case, because for a partial path of a grouped
> or partially grouped relation, it is either unordered (HashAggregate or
> Append), or it has been ordered by the group_pathkeys (GroupAggregate).
> It seems there is no case that we'd have a partial path that is
> partially sorted.
>
I reviewed the Patch and it looks fine to me.

Thanks and Regards,
Shubham Khanna.



Re: Some revises in adding sorting path

From
Shubham Khanna
Date:
On Thu, Dec 28, 2023 at 4:01 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Thu, Dec 28, 2023 at 4:00 PM Richard Guo <guofenglinux@gmail.com> wrote:
> >
> >
> > On Wed, Mar 29, 2023 at 4:00 AM David Rowley <dgrowleyml@gmail.com> wrote:
> >>
> >> If you write some tests for this code, it will be useful to prove that
> >> it actually does something, and also that it does not break again in
> >> the future. I don't really want to just blindly copy the pattern used
> >> in 3c6fc5820 for creating incremental sort paths if it's not useful
> >> here. It would be good to see tests that make an Incremental Sort path
> >> using the code you're changing.
> >
> >
> > Thanks for the suggestion.  I've managed to come up with a query that
> > gets the codes being changed here to perform an incremental sort.
> >
> > set min_parallel_index_scan_size to 0;
> > set enable_seqscan to off;
> >
> > Without making those parallel paths:
> >
> > explain (costs off)
> > select * from tenk1 where four = 2 order by four, hundred, parallel_safe_volatile(thousand);
> >                           QUERY PLAN
> > --------------------------------------------------------------
> >  Incremental Sort
> >    Sort Key: hundred, (parallel_safe_volatile(thousand))
> >    Presorted Key: hundred
> >    ->  Gather Merge
> >          Workers Planned: 3
> >          ->  Parallel Index Scan using tenk1_hundred on tenk1
> >                Filter: (four = 2)
> > (7 rows)
> >
> > and with those parallel paths:
> >
> > explain (costs off)
> > select * from tenk1 where four = 2 order by four, hundred, parallel_safe_volatile(thousand);
> >                           QUERY PLAN
> > ---------------------------------------------------------------
> >  Gather Merge
> >    Workers Planned: 3
> >    ->  Incremental Sort
> >          Sort Key: hundred, (parallel_safe_volatile(thousand))
> >          Presorted Key: hundred
> >          ->  Parallel Index Scan using tenk1_hundred on tenk1
> >                Filter: (four = 2)
> > (7 rows)
> >
> > I've added two tests for the code changes in create_ordered_paths in the
> > new patch.
> >
> >>
> >> Same for the 0003 patch.
> >
> >
> > For the code changes in gather_grouping_paths, I've managed to come up
> > with a query that makes an explicit Sort atop cheapest partial path.
> >
> > explain (costs off)
> > select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
> >                              QUERY PLAN
> > --------------------------------------------------------------------
> >  Finalize GroupAggregate
> >    Group Key: twenty, (parallel_safe_volatile(two))
> >    ->  Gather Merge
> >          Workers Planned: 4
> >          ->  Sort
> >                Sort Key: twenty, (parallel_safe_volatile(two))
> >                ->  Partial HashAggregate
> >                      Group Key: twenty, parallel_safe_volatile(two)
> >                      ->  Parallel Seq Scan on tenk1
> > (9 rows)
> >
> > Without this logic the plan would look like:
> >
> > explain (costs off)
> > select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
> >                              QUERY PLAN
> > --------------------------------------------------------------------
> >  Finalize GroupAggregate
> >    Group Key: twenty, (parallel_safe_volatile(two))
> >    ->  Sort
> >          Sort Key: twenty, (parallel_safe_volatile(two))
> >          ->  Gather
> >                Workers Planned: 4
> >                ->  Partial HashAggregate
> >                      Group Key: twenty, parallel_safe_volatile(two)
> >                      ->  Parallel Seq Scan on tenk1
> > (9 rows)
> >
> > This test is also added in the new patch.
> >
> > But I did not find a query that makes an incremental sort in this case.
> > After trying for a while it seems to me that we do not need to consider
> > incremental sort in this case, because for a partial path of a grouped
> > or partially grouped relation, it is either unordered (HashAggregate or
> > Append), or it has been ordered by the group_pathkeys (GroupAggregate).
> > It seems there is no case that we'd have a partial path that is
> > partially sorted.
> >
Just for clarity; I am not familiar with the code. And for the review,
I ran 'make check' and 'make check-world' and all the test cases
passed successfully.

Thanks and Regards,
Shubham Khanna.



Re: Some revises in adding sorting path

From
vignesh C
Date:
On Mon, 17 Jul 2023 at 14:25, Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> On Wed, Mar 29, 2023 at 4:00 AM David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> If you write some tests for this code, it will be useful to prove that
>> it actually does something, and also that it does not break again in
>> the future. I don't really want to just blindly copy the pattern used
>> in 3c6fc5820 for creating incremental sort paths if it's not useful
>> here. It would be good to see tests that make an Incremental Sort path
>> using the code you're changing.
>
>
> Thanks for the suggestion.  I've managed to come up with a query that
> gets the codes being changed here to perform an incremental sort.
>
> set min_parallel_index_scan_size to 0;
> set enable_seqscan to off;
>
> Without making those parallel paths:
>
> explain (costs off)
> select * from tenk1 where four = 2 order by four, hundred, parallel_safe_volatile(thousand);
>                           QUERY PLAN
> --------------------------------------------------------------
>  Incremental Sort
>    Sort Key: hundred, (parallel_safe_volatile(thousand))
>    Presorted Key: hundred
>    ->  Gather Merge
>          Workers Planned: 3
>          ->  Parallel Index Scan using tenk1_hundred on tenk1
>                Filter: (four = 2)
> (7 rows)
>
> and with those parallel paths:
>
> explain (costs off)
> select * from tenk1 where four = 2 order by four, hundred, parallel_safe_volatile(thousand);
>                           QUERY PLAN
> ---------------------------------------------------------------
>  Gather Merge
>    Workers Planned: 3
>    ->  Incremental Sort
>          Sort Key: hundred, (parallel_safe_volatile(thousand))
>          Presorted Key: hundred
>          ->  Parallel Index Scan using tenk1_hundred on tenk1
>                Filter: (four = 2)
> (7 rows)
>
> I've added two tests for the code changes in create_ordered_paths in the
> new patch.
>
>>
>> Same for the 0003 patch.
>
>
> For the code changes in gather_grouping_paths, I've managed to come up
> with a query that makes an explicit Sort atop cheapest partial path.
>
> explain (costs off)
> select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Finalize GroupAggregate
>    Group Key: twenty, (parallel_safe_volatile(two))
>    ->  Gather Merge
>          Workers Planned: 4
>          ->  Sort
>                Sort Key: twenty, (parallel_safe_volatile(two))
>                ->  Partial HashAggregate
>                      Group Key: twenty, parallel_safe_volatile(two)
>                      ->  Parallel Seq Scan on tenk1
> (9 rows)
>
> Without this logic the plan would look like:
>
> explain (costs off)
> select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Finalize GroupAggregate
>    Group Key: twenty, (parallel_safe_volatile(two))
>    ->  Sort
>          Sort Key: twenty, (parallel_safe_volatile(two))
>          ->  Gather
>                Workers Planned: 4
>                ->  Partial HashAggregate
>                      Group Key: twenty, parallel_safe_volatile(two)
>                      ->  Parallel Seq Scan on tenk1
> (9 rows)
>
> This test is also added in the new patch.
>
> But I did not find a query that makes an incremental sort in this case.
> After trying for a while it seems to me that we do not need to consider
> incremental sort in this case, because for a partial path of a grouped
> or partially grouped relation, it is either unordered (HashAggregate or
> Append), or it has been ordered by the group_pathkeys (GroupAggregate).
> It seems there is no case that we'd have a partial path that is
> partially sorted.
>
> So update the patches to v2.

CFBot shows that the patch does not apply anymore as in [1]:
=== Applying patches on top of PostgreSQL commit ID
f2bf8fb04886e3ea82e7f7f86696ac78e06b7e60 ===
...
=== applying patch
./v2-0002-Revise-how-we-sort-partial-paths-in-gather_grouping_paths.patch
patching file src/backend/optimizer/plan/planner.c
Hunk #1 succeeded at 7289 (offset -91 lines).
Hunk #2 FAILED at 7411.
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/optimizer/plan/planner.c.rej

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_4119.log

Regards,
Vignesh



Re: Some revises in adding sorting path

From
Richard Guo
Date:

On Mon, Jul 17, 2023 at 4:55 PM Richard Guo <guofenglinux@gmail.com> wrote:
But I did not find a query that makes an incremental sort in this case.
After trying for a while it seems to me that we do not need to consider
incremental sort in this case, because for a partial path of a grouped
or partially grouped relation, it is either unordered (HashAggregate or
Append), or it has been ordered by the group_pathkeys (GroupAggregate).
It seems there is no case that we'd have a partial path that is
partially sorted.

Since now we'd try to reorder the group by keys (see 0452b461bc), it is
possible that we have a partial path that is partially sorted.  So this
conclusion is not true any more.  For instance,

create table t (a int, b int, c int, d int);
insert into t select i%10, i%10, i%10, i%10 from generate_series(1,1000000)i;
create index on t (a, b);
analyze t;

set enable_hashagg to off;
set enable_seqscan to off;

explain (costs off)
select count(*) from t group by a, c, b, parallel_safe_volatile(d);
                                QUERY PLAN
--------------------------------------------------------------------------
 Finalize GroupAggregate
   Group Key: a, c, b, (parallel_safe_volatile(d))
   ->  Gather Merge
         Workers Planned: 2
         ->  Incremental Sort
               Sort Key: a, c, b, (parallel_safe_volatile(d))
               Presorted Key: a
               ->  Partial GroupAggregate
                     Group Key: a, b, c, (parallel_safe_volatile(d))
                     ->  Incremental Sort
                           Sort Key: a, b, c, (parallel_safe_volatile(d))
                           Presorted Key: a, b
                           ->  Parallel Index Scan using t_a_b_idx on t
(13 rows)

If we do not consider incremental sort on partial paths in
gather_grouping_paths(), we'd get a plan that looks like:

explain (costs off)
select count(*) from t group by a, c, b, parallel_safe_volatile(d);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Finalize GroupAggregate
   Group Key: a, c, b, (parallel_safe_volatile(d))
   ->  Incremental Sort
         Sort Key: a, c, b, (parallel_safe_volatile(d))
         Presorted Key: a, c, b
         ->  Gather Merge
               Workers Planned: 2
               ->  Incremental Sort
                     Sort Key: a, c, b
                     Presorted Key: a
                     ->  Partial GroupAggregate
                           Group Key: a, b, c, (parallel_safe_volatile(d))
                           ->  Incremental Sort
                                 Sort Key: a, b, c, (parallel_safe_volatile(d))
                                 Presorted Key: a, b
                                 ->  Parallel Index Scan using t_a_b_idx on t
(16 rows)

So in the v3 patch I've brought back the logic that considers
incremental sort on partial paths in gather_grouping_paths().  However,
I failed to compose a test case for this scenario without having to
generate a huge table.  So in the v3 patch I did not include a test case
for this aspect.

Thanks
Richard
Attachment

Re: Some revises in adding sorting path

From
David Rowley
Date:
On Mon, 29 Jan 2024 at 22:39, Richard Guo <guofenglinux@gmail.com> wrote:
> So in the v3 patch I've brought back the logic that considers
> incremental sort on partial paths in gather_grouping_paths().  However,
> I failed to compose a test case for this scenario without having to
> generate a huge table.  So in the v3 patch I did not include a test case
> for this aspect.

Can you share the test with the huge table?

I tried and failed as, if I'm not mistaken, you're talking about a
parallel aggregate query with an incremental sort between the Partial
Aggregate node and the Finalize Group Aggregate node.  If the partial
aggregate was a Group Aggregate, then it would already be correctly
sorted.  We don't need a more strict sort ordering to perform the
Finalize Group Aggregate, the results must already be sorted by at
least the GROUP BY clause.  If the partial aggregate had opted to Hash
Aggregate, then there'd be no presorted keys, so we could only get a
full sort.  I can't see any way to get an incremental sort between the
2 aggregate phases.

What am I missing?

I also tried reverting your changes to planner.c to see if your new
tests would fail. They all passed. So it looks like none of these
tests are testing anything new.

David



Re: Some revises in adding sorting path

From
Richard Guo
Date:

On Tue, Jan 30, 2024 at 7:00 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 29 Jan 2024 at 22:39, Richard Guo <guofenglinux@gmail.com> wrote:
> So in the v3 patch I've brought back the logic that considers
> incremental sort on partial paths in gather_grouping_paths().  However,
> I failed to compose a test case for this scenario without having to
> generate a huge table.  So in the v3 patch I did not include a test case
> for this aspect.

Can you share the test with the huge table?

The test had been shown in upthread [1].  Pasting it here:

create table t (a int, b int, c int, d int);
insert into t select i%10, i%10, i%10, i%10 from generate_series(1,1000000)i;
create index on t (a, b);
analyze t;

set enable_hashagg to off;
set enable_seqscan to off;

explain (costs off)
select count(*) from t group by a, c, b, parallel_safe_volatile(d);
                                QUERY PLAN
--------------------------------------------------------------------------
 Finalize GroupAggregate
   Group Key: a, c, b, (parallel_safe_volatile(d))
   ->  Gather Merge
         Workers Planned: 2
         ->  Incremental Sort
               Sort Key: a, c, b, (parallel_safe_volatile(d))
               Presorted Key: a
               ->  Partial GroupAggregate
                     Group Key: a, b, c, (parallel_safe_volatile(d))
                     ->  Incremental Sort
                           Sort Key: a, b, c, (parallel_safe_volatile(d))
                           Presorted Key: a, b
                           ->  Parallel Index Scan using t_a_b_idx on t
(13 rows)
 
I tried and failed as, if I'm not mistaken, you're talking about a
parallel aggregate query with an incremental sort between the Partial
Aggregate node and the Finalize Group Aggregate node.  If the partial
aggregate was a Group Aggregate, then it would already be correctly
sorted.  We don't need a more strict sort ordering to perform the
Finalize Group Aggregate, the results must already be sorted by at
least the GROUP BY clause.  If the partial aggregate had opted to Hash
Aggregate, then there'd be no presorted keys, so we could only get a
full sort.  I can't see any way to get an incremental sort between the
2 aggregate phases.

What am I missing?

This was true before 0452b461bc, and I reached the same conclusion in
[2].  Quote it here:

"
But I did not find a query that makes an incremental sort in this case.
After trying for a while it seems to me that we do not need to consider
incremental sort in this case, because for a partial path of a grouped
or partially grouped relation, it is either unordered (HashAggregate or
Append), or it has been ordered by the group_pathkeys (GroupAggregate).
It seems there is no case that we'd have a partial path that is
partially sorted.
"

But if we've reordered the group by keys to match the input path's
pathkeys, we might have a partial GroupAggregate that is partially
sorted.  See the plan above.
 
I also tried reverting your changes to planner.c to see if your new
tests would fail. They all passed. So it looks like none of these
tests are testing anything new.

This patchset does not aim to introduce anything new; it simply
refactors the existing code.  The newly added tests are used to show
that the code that is touched here is not redundant, but rather
essential for generating certain paths.  I remember the tests were added
per your comment in [3].

[1] https://www.postgresql.org/message-id/CAMbWs4_iDwMAf5mp%2BG-tXq-gYzvR6koSHvNUqBPK4pt7%2B11tJw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAMbWs497h5jVCVwNDb%2BBX31Z_K8iBaPQKOcsTvpFQ7kF18a2%2Bg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com

Thanks
Richard

Re: Some revises in adding sorting path

From
David Rowley
Date:
On Wed, 31 Jan 2024 at 00:44, Richard Guo <guofenglinux@gmail.com> wrote:
> This patchset does not aim to introduce anything new; it simply
> refactors the existing code.  The newly added tests are used to show
> that the code that is touched here is not redundant, but rather
> essential for generating certain paths.  I remember the tests were added
> per your comment in [3].
>
> [3] https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com

OK.  I've pushed the patched based on it being a simplification of the
partial path generation.

David



Re: Some revises in adding sorting path

From
Richard Guo
Date:

On Wed, Jan 31, 2024 at 5:13 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 31 Jan 2024 at 00:44, Richard Guo <guofenglinux@gmail.com> wrote:
> This patchset does not aim to introduce anything new; it simply
> refactors the existing code.  The newly added tests are used to show
> that the code that is touched here is not redundant, but rather
> essential for generating certain paths.  I remember the tests were added
> per your comment in [3].
>
> [3] https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com

OK.  I've pushed the patched based on it being a simplification of the
partial path generation.

Thanks for pushing it!

Thanks
Richard