Home > mailing lists

Re: Consider explicit incremental sort for Append and MergeAppend - Mailing list pgsql-hackers

From	Richard Guo
Subject	Re: Consider explicit incremental sort for Append and MergeAppend
Date	May 22 12:07:22
Msg-id	CAMbWs49wSNPPD=FOQqzjPNZ_N9EGDv=7-ou0dFgd0HSwP3fTAg@mail.gmail.com Whole thread Raw
List	pgsql-hackers

Tree view

On Mon, May 19, 2025 at 10:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 15, 2025 at 9:03 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > 2. IncrementalSort is not always more effective - it depends on the
> > column's number of groups. In my experience, a non-cost-based decision
> > one day meets the problematic case, and the people who stick with it are
> > much more confused than in the case when planner decision connected to
> > the costings - they trust the cost model or the cost model tuned by GUCs.

> I wonder if this could be fixed in nodeIncrementalSort.c. I think it's
> a problem to rely on planner estimates because planner estimates of
> the # of groups are quite unreliable. But at runtime, we can notice
> whether the groups are small or large and adjust the algorithm
> accordingly. In fact, it looks like we already have some logic for
> that:
>
> /*
>  * Sorting many small groups with tuplesort is inefficient. In order to
>  * cope with this problem we don't start a new group until the current one
>  * contains at least DEFAULT_MIN_GROUP_SIZE tuples (unfortunately this also
>  * means we can't assume small groups of tuples all have the same prefix keys.)
>  * When we have a bound that's less than DEFAULT_MIN_GROUP_SIZE we start looking
>  * for the new group as soon as we've met our bound to avoid fetching more
>  * tuples than we absolutely have to fetch.
>  */
> #define DEFAULT_MIN_GROUP_SIZE 32
>
> But maybe this logic needs to be further refined somehow. I can't
> shake the feeling that it's going to be really hard to have every
> place that uses incremental sort decide whether to use an incremental
> sort or a regular sort -- we should try to get to a place where it's
> safe to use an incremental sort when it's possible without worrying
> that it might actually be slower.

Agreed.  In some cases, we currently don't have the infrastructure to
consider both incremental sort and full sort and compare their costs —
for example, when inserting explicit sorts for mergejoins, or, as in
this case, for Append/MergeAppend.

Besides, I think the planner currently assumes that incremental sort
is always faster than full sort when there are presorted keys, and
this premise has been applied in various parts of the code.  I checked
all the callers of create_incremental_sort_path, and they all follow
the logic that if there are presorted keys, only incremental sort is
considered.

Also, to understand how incremental sort performs in the worst case, I
ran the following test.

create table ab (a int not null, b int not null);
insert into ab select 0,x from generate_Series(1,999000)x union all
select x%1000+1,0 from generate_Series(999001,1000000)x;

create index on ab (a);

vacuum analyze ab;

In this table, group 0 has 999000 rows, and the remaining groups
1-1000 have just 1 row each.

-- incremental sort
explain (analyze, costs off) select * from ab order by a,b;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Incremental Sort (actual time=585.093..714.589 rows=1000000.00 loops=1)
   Sort Key: a, b
   Presorted Key: a
   Full-sort Groups: 33  Sort Method: quicksort  Average Memory: 26kB
Peak Memory: 26kB
   Pre-sorted Groups: 1  Sort Method: external merge  Average Disk:
17680kB  Peak Disk: 17680kB
   Buffers: shared hit=5273, temp read=2210 written=2222
   ->  Index Scan using ab_a_idx on ab (actual time=0.192..330.289
rows=1000000.00 loops=1)
         Index Searches: 1
         Buffers: shared hit=5273
 Planning Time: 0.354 ms
 Execution Time: 759.693 ms
(11 rows)

-- full sort
explain (analyze, costs off) select * from ab order by a,b;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Sort (actual time=570.755..831.825 rows=1000000.00 loops=1)
   Sort Key: a, b
   Sort Method: external merge  Disk: 17704kB
   Buffers: shared hit=5273, temp read=2213 written=2225
   ->  Index Scan using ab_a_idx on ab (actual time=0.187..327.701
rows=1000000.00 loops=1)
         Index Searches: 1
         Buffers: shared hit=5273
 Planning Time: 0.304 ms
 Execution Time: 877.112 ms
(9 rows)

So with the same subpath, incremental sort still outperforms full sort
even if there is a big skew in the number of rows per pre-sorted group
(which is a bit surprising to me).

So, I think it's generally safe to use incremental sort whenever
possible.  There might be some corner cases where incremental sort is
slower than full sort, and I think it would be best to address those
in nodeIncrementalSort.c, as Robert suggested.

Thanks
Richard

pgsql-hackers by date:

From: Dmitry Dolgov
Date: 22 May, 12:03:46
Subject: Re: queryId constant squashing does not support prepared statements

From: Andy Fan
Date: 22 May, 12:27:48
Subject: Re: Pathify RHS unique-ification for semijoin planning

Re: Consider explicit incremental sort for Append and MergeAppend - Mailing list pgsql-hackers

Previous

Next