Re: Why don't we consider explicit Incremental Sort? - Mailing list pgsql-hackers
| From | Richard Guo |
| --- | --- |
| Subject | Re: Why don't we consider explicit Incremental Sort? |
| Date | |
| Msg-id | CAMbWs49+CXsrgbq0LD1at-5jR=AHHN0YtDy9YvgXAsMfndZe-w@mail.gmail.com |
| In response to | Re: Why don't we consider explicit Incremental Sort? (Tomas Vondra <tomas@vondra.me>) |
| Responses | Re: Why don't we consider explicit Incremental Sort? |
| List | pgsql-hackers |
On Fri, Sep 13, 2024 at 7:51 PM Tomas Vondra <tomas@vondra.me> wrote:
> Sure, an incremental sort can improve things if things go well, no doubt
> about that. But how significant can the improvement be, especially if we
> need to sort everything? In your example it's ~15%, which is nice, and
> maybe the benefits can be much larger if the incremental sort can do
> everything in memory, without flushing to disk.
>
> But what about the risks? What will happen if the groups are not this
> uniformly and independently sized, and stuff like that? Maybe it'll be
> costed well enough, I haven't tried.

I understand the concern and agree that there is a risk of regression if
there is skew in the number of rows per presorted group. There were
actually discussions about this during the work on commit 4a29eabd1;
please see [1].

Let me repeat David's demonstration and rerun his query on the current
master branch to see what happens:

create table ab (a int not null, b int not null);
insert into ab
  select 0, x from generate_series(1, 999000) x
  union all
  select x % 1000 + 1, 0 from generate_series(999001, 1000000) x;
create index on ab (a);
analyze ab;

So this is a table with a very large skew: group 0 has 999000 rows, while
the remaining groups, 1 through 1000, have just one row each.

-- on master
explain (analyze, timing off) select * from ab order by a,b;
                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Incremental Sort  (cost=2767.38..109344.55 rows=1000000 width=8) (actual rows=1000000 loops=1)
   Sort Key: a, b
   Presorted Key: a
   Full-sort Groups: 33  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
   Pre-sorted Groups: 1  Sort Method: external merge  Average Disk: 17680kB  Peak Disk: 17680kB
   ->  Index Scan using ab_a_idx on ab  (cost=0.42..22832.42 rows=1000000 width=8) (actual rows=1000000 loops=1)
 Planning Time: 0.829 ms
 Execution Time: 1028.892 ms
(8 rows)

-- manually disable incremental sort
set enable_incremental_sort to off;
explain (analyze, timing off) select * from ab order by a,b;
                                            QUERY PLAN
------------------------------------------------------------------------------------------------
 Sort  (cost=127757.34..130257.34 rows=1000000 width=8) (actual rows=1000000 loops=1)
   Sort Key: a, b
   Sort Method: external merge  Disk: 17704kB
   ->  Seq Scan on ab  (cost=0.00..14425.00 rows=1000000 width=8) (actual rows=1000000 loops=1)
 Planning Time: 0.814 ms
 Execution Time: 765.198 ms
(6 rows)

Look, the regression still happens on current master: the incremental
sort plan takes ~1029 ms versus ~765 ms for the plain sort.

This is a question I've often pondered: each time we introduce a new
optimization, there are always potential cases where it could lead to
regressions. What should we do about this? I largely agree with David's
position, as stated in the commit message of 4a29eabd1:

"
That, of course, as with teaching the planner any new tricks, means an
increased likelihood that the planner will perform an incremental sort
when it's not the best method. Our standard escape hatch for these
cases is an enable_* GUC.
"

I know that ideally we should not rely on these enable_* GUCs, but in
reality it seems that sometimes we do not have a better solution.

[1] https://postgr.es/m/CAApHDvr1Sm+g9hbv4REOVuvQKeDWXcKUAhmbK5K+dfun0s9CvA@mail.gmail.com

Thanks
Richard
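As a quick sanity check on the claimed skew, a per-group row count on the same ab table makes the distribution explicit. This query is an added illustration, not part of David's original demonstration:

select a, count(*) as n
from ab
group by a
order by n desc
limit 3;

-- expected shape: a = 0 with 999000 rows, followed by single-row groups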
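On the enable_* escape hatch: if disabling incremental sort is only needed for one problematic query, the setting can be scoped to a single transaction with the standard SET LOCAL mechanism rather than left off for the whole session. A minimal sketch, assuming the same table and query as above:

begin;
set local enable_incremental_sort to off;  -- reverts automatically at commit or rollback
explain (analyze, timing off) select * from ab order by a, b;
commit;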