Re: Why don't we consider explicit Incremental Sort? - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Why don't we consider explicit Incremental Sort?
Date
Msg-id CAMbWs4-3Z-pcHCWq3gBVHp=43kqaF3Quhwe5cUg-x9O54Lv2CA@mail.gmail.com
Whole thread Raw
In response to Re: Why don't we consider explicit Incremental Sort?  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
On Mon, Sep 23, 2024 at 10:01 PM Tomas Vondra <tomas@vondra.me> wrote:
> You don't need any skew. Consider this perfectly uniform dataset:
>
> create table t (a int, b int);
> insert into t select 100000 * random(), 100 * random()
>   from generate_series(1,1000000) s(i);
> create index on t (a);
> vacuum analyze;
> checkpoint;
>
> explain analyze select * from t order by a, b;

I think if we want to compare the performance of incremental sort vs.
full sort on this dataset, we need to ensure that other variables are
kept constant, ie we need to ensure that both methods use the same
subpath, whether it's Index Scan or Seq Scan.

This is especially true in the scenario addressed by this patch, as
for a mergejoin, we only consider outerrel's cheapest_total_path when
we assume that an explicit sort atop is needed.  That is to say, the
subpath has already been chosen when it comes to determine whether to
use incremental sort or full sort.

According to this theory, here is what I got on this same dataset:

-- incremental sort
explain analyze select * from t order by a, b;
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Incremental Sort  (cost=1.02..68838.65 rows=1000000 width=8) (actual
time=1.292..1564.265 rows=1000000 loops=1)
   Sort Key: a, b
   Presorted Key: a
   Full-sort Groups: 27022  Sort Method: quicksort  Average Memory:
25kB  Peak Memory: 25kB
   ->  Index Scan using t_a_idx on t  (cost=0.42..37244.20
rows=1000000 width=8) (actual time=0.408..1018.785 rows=1000000
loops=1)
 Planning Time: 0.998 ms
 Execution Time: 1602.861 ms
(7 rows)


-- full sort
set enable_incremental_sort to off;
set enable_seqscan to off;
explain analyze select * from t order by a, b;
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=150576.54..153076.54 rows=1000000 width=8) (actual
time=1760.090..1927.598 rows=1000000 loops=1)
   Sort Key: a, b
   Sort Method: external merge  Disk: 17704kB
   ->  Index Scan using t_a_idx on t  (cost=0.42..37244.20
rows=1000000 width=8) (actual time=0.531..1010.931 rows=1000000
loops=1)
 Planning Time: 0.980 ms
 Execution Time: 1970.287 ms
(6 rows)

So incremental sort outperforms full sort on this dataset.

Thanks
Richard



pgsql-hackers by date:

Previous
From: Shayon Mukherjee
Date:
Subject: Re: Proposal to Enable/Disable Index using ALTER INDEX
Next
From: jian he
Date:
Subject: Re: not null constraints, again