Thread: Why don't we consider explicit Incremental Sort?

Why don't we consider explicit Incremental Sort?

From: Richard Guo
While looking at label_sort_with_costsize() due to another issue, I
noticed that we build explicit Sort nodes but not explicit Incremental
Sort nodes.  I wonder why this is the case.  It seems to me that
Incremental Sorts are preferable in some cases.  For example:

create table t (a int, b int);
create index on t (a);

set enable_seqscan to off;

explain (costs off)
select * from t t1 join t t2 on t1.a = t2.a and t1.b = t2.b;
                   QUERY PLAN
-------------------------------------------------
 Merge Join
   Merge Cond: ((t1.a = t2.a) AND (t1.b = t2.b))
   ->  Sort
         Sort Key: t1.a, t1.b
         ->  Index Scan using t_a_idx on t t1
   ->  Sort
         Sort Key: t2.a, t2.b
         ->  Index Scan using t_a_idx on t t2
(8 rows)

For the outer path of a mergejoin, I think we can leverage Incremental
Sort to save effort.  For the inner path, though, we cannot do this
because Incremental Sort does not support mark/restore.
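
Roughly, what I have in mind is replacing the outer-side Sort above with
something like this (just a sketch of the plan shape, not actual EXPLAIN
output):

   ->  Incremental Sort
         Sort Key: t1.a, t1.b
         Presorted Key: t1.a
         ->  Index Scan using t_a_idx on t t1

since the index scan already returns the rows ordered by t1.a.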

One might ask: what if a mergejoin with an Incremental Sort as the
outer path is itself selected as the inner path of another mergejoin?
Well, I don't think this is a problem, because mergejoin itself does
not support mark/restore either, and we will add a Material node on
top of it anyway in this case (see final_cost_mergejoin).

label_sort_with_costsize is also called in create_append_plan,
create_merge_append_plan and create_unique_plan.  In these cases, we
may also consider using Incremental Sort.  But I haven't looked into
these cases.

Any thoughts?

Thanks
Richard



Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra
Hi,

On 9/9/24 11:39, Richard Guo wrote:
> While looking at label_sort_with_costsize() due to another issue, I
> noticed that we build explicit Sort nodes but not explicit Incremental
> Sort nodes.  I wonder why this is the case.  It seems to me that
> Incremental Sorts are preferable in some cases.  For example:
> 

I think we intentionally added incremental sort ... incrementally ;-)

I don't recall the reasoning exactly, but I think we realized the
incremental sort costing can be a bit shaky (and AFAIK we saw a couple
of such cases reported). So we added it to places where we reasoned it
wouldn't be a problem and the incremental sort would be a clear win,
e.g. thanks to the "cheap startup" thing.
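
For instance, with the index on t(a) from your example below, a query
like

    select * from t order by a, b limit 10;

would let the incremental sort stop after sorting just the first group
or two of rows sharing an "a" value, while a full sort has to sort
everything before returning the first row. (Just illustrating the
point - I haven't checked the exact plan this gets.)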

> create table t (a int, b int);
> create index on t (a);
> 
> set enable_seqscan to off;
> 
> explain (costs off)
> select * from t t1 join t t2 on t1.a = t2.a and t1.b = t2.b;
>                    QUERY PLAN
> -------------------------------------------------
>  Merge Join
>    Merge Cond: ((t1.a = t2.a) AND (t1.b = t2.b))
>    ->  Sort
>          Sort Key: t1.a, t1.b
>          ->  Index Scan using t_a_idx on t t1
>    ->  Sort
>          Sort Key: t2.a, t2.b
>          ->  Index Scan using t_a_idx on t t2
> (8 rows)
> 
> For the outer path of a mergejoin, I think we can leverage Incremental
> Sort to save effort.  For the inner path, though, we cannot do this
> because Incremental Sort does not support mark/restore.
> 
> One might ask: what if a mergejoin with an Incremental Sort as the
> outer path is itself selected as the inner path of another mergejoin?
> Well, I don't think this is a problem, because mergejoin itself does
> not support mark/restore either, and we will add a Material node on
> top of it anyway in this case (see final_cost_mergejoin).
> 
> label_sort_with_costsize is also called in create_append_plan,
> create_merge_append_plan and create_unique_plan.  In these cases, we
> may also consider using Incremental Sort.  But I haven't looked into
> these cases.
> 
> Any thoughts?
> 

I think one challenge with this case is that create_mergejoin_plan
creates these Sort plans explicitly - it's not "pathified" so it doesn't
go through the usual cost comparison etc. And it certainly is not the
case that incremental sort would always win, so we'd need to replicate
the cost comparison logic too.

I have not thought about this particular case too much, but how likely
is it that mergejoin will win for this plan in practice? If we consider
a query that only needs a fraction of rows (say, thanks to LIMIT),
aren't we likely to pick a nested loop (which can do incremental sort
for the outer plan)? For joins that need to run to completion it might
be a win, but there's also the higher risk of a poor costing.
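
For instance, I mean something like this (illustrative only, reusing
the table from your first example; I haven't checked which plan it
actually gets):

    select * from t t1 join t t2 on t1.a = t2.a and t1.b = t2.b
    order by t1.a, t1.b
    limit 10;

Here I'd expect a nested loop, with an incremental sort feeding the
outer side, to look quite attractive.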

I'm not saying it's not worth exploring, I'm just trying to recall reasons
why it might not work. Also I don't think it's fundamentally impossible
to do mark/restore for incremental sort - I just haven't tried doing it
because it wasn't necessary. In the worst case we could simply add a
Materialize node on top, no?


regards

-- 
Tomas Vondra



Re: Why don't we consider explicit Incremental Sort?

From: Richard Guo
On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas@vondra.me> wrote:
> I have not thought about this particular case too much, but how likely
> is it that mergejoin will win for this plan in practice? If we consider
> a query that only needs a fraction of rows (say, thanks to LIMIT),
> aren't we likely to pick a nested loop (which can do incremental sort
> for the outer plan)? For joins that need to run to completion it might
> be a win, but there's also the higher risk of a poor costing.

I think one situation where mergejoin tends to outperform hashjoin and
nestloop is when ORDER BY clauses are present.  For example, for the
query below, mergejoin runs much faster than hashjoin and nestloop,
and mergejoin with incremental sort is even faster than mergejoin with
full sort.

create table t (a int, b int);
insert into t select random(1,100), random(1,100) from
generate_series(1,100000);

analyze t;

-- with the patch applied
explain (analyze, costs off, timing off)
select * from (select * from t order by a) t1 join t t2 on t1.a = t2.a
and t1.b = t2.b order by t1.a, t1.b;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Merge Join (actual rows=1100372 loops=1)
   Merge Cond: ((t.a = t2.a) AND (t.b = t2.b))
   ->  Incremental Sort (actual rows=100000 loops=1)
         Sort Key: t.a, t.b
         Presorted Key: t.a
         Full-sort Groups: 100  Sort Method: quicksort  Average
Memory: 26kB  Peak Memory: 26kB
         Pre-sorted Groups: 100  Sort Method: quicksort  Average
Memory: 74kB  Peak Memory: 74kB
         ->  Sort (actual rows=100000 loops=1)
               Sort Key: t.a
               Sort Method: external merge  Disk: 1768kB
               ->  Seq Scan on t (actual rows=100000 loops=1)
   ->  Sort (actual rows=1100367 loops=1)
         Sort Key: t2.a, t2.b
         Sort Method: external sort  Disk: 2160kB
         ->  Seq Scan on t t2 (actual rows=100000 loops=1)
 Planning Time: 0.912 ms
 Execution Time: 854.502 ms
(17 rows)

-- disable incremental sort
set enable_incremental_sort to off;

explain (analyze, costs off, timing off)
select * from (select * from t order by a) t1 join t t2 on t1.a = t2.a
and t1.b = t2.b order by t1.a, t1.b;
                          QUERY PLAN
--------------------------------------------------------------
 Merge Join (actual rows=1100372 loops=1)
   Merge Cond: ((t.a = t2.a) AND (t.b = t2.b))
   ->  Sort (actual rows=100000 loops=1)
         Sort Key: t.a, t.b
         Sort Method: external merge  Disk: 1768kB
         ->  Sort (actual rows=100000 loops=1)
               Sort Key: t.a
               Sort Method: external merge  Disk: 1768kB
               ->  Seq Scan on t (actual rows=100000 loops=1)
   ->  Sort (actual rows=1100367 loops=1)
         Sort Key: t2.a, t2.b
         Sort Method: external sort  Disk: 2160kB
         ->  Seq Scan on t t2 (actual rows=100000 loops=1)
 Planning Time: 1.451 ms
 Execution Time: 958.607 ms
(15 rows)


-- with hashjoin
set enable_mergejoin to off;

explain (analyze, costs off, timing off)
select * from (select * from t order by a) t1 join t t2 on t1.a = t2.a
and t1.b = t2.b order by t1.a, t1.b;
                             QUERY PLAN
--------------------------------------------------------------------
 Sort (actual rows=1100372 loops=1)
   Sort Key: t.a, t.b
   Sort Method: external merge  Disk: 28000kB
   ->  Hash Join (actual rows=1100372 loops=1)
         Hash Cond: ((t2.a = t.a) AND (t2.b = t.b))
         ->  Seq Scan on t t2 (actual rows=100000 loops=1)
         ->  Hash (actual rows=100000 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 4931kB
               ->  Sort (actual rows=100000 loops=1)
                     Sort Key: t.a
                     Sort Method: external merge  Disk: 1768kB
                     ->  Seq Scan on t (actual rows=100000 loops=1)
 Planning Time: 1.998 ms
 Execution Time: 2469.954 ms
(14 rows)

-- with nestloop, my small machine seems to run forever.

Thanks
Richard



Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra

On 9/13/24 06:04, Richard Guo wrote:
> On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas@vondra.me> wrote:
>> I think we intentionally added incremental sort ... incrementally ;-)
> 
> Haha, right.
> 
>> I think one challenge with this case is that create_mergejoin_plan
>> creates these Sort plans explicitly - it's not "pathified" so it doesn't
>> go through the usual cost comparison etc. And it certainly is not the
>> case that incremental sort would always win, so we'd need to replicate
>> the cost comparison logic too.
>>
>> I have not thought about this particular case too much, but how likely
>> is it that mergejoin will win for this plan in practice? If we consider
>> a query that only needs a fraction of rows (say, thanks to LIMIT),
>> aren't we likely to pick a nested loop (which can do incremental sort
>> for the outer plan)? For joins that need to run to completion it might
>> be a win, but there's also the higher risk of a poor costing.
> 
> Yeah, currently the mergejoin path always assumes that a full sort will
> be used on top of the outer and inner paths if they are not already
> ordered appropriately.  So initial_cost_mergejoin directly charges the
> cost of a full sort to the outer/inner path's cost, without going
> through the usual cost comparison with incremental sort.
> 
> It seems to me that some parts of our code assume that, for a given
> input path that is partially ordered, an incremental sort is always
> preferable to a full sort (see commit 4a29eabd1).  Am I getting this
> correctly?

I don't think we're making such an assumption. I don't know which of the
many places modified by 4a29eabd1 you have in mind, but IIRC we always
consider both full and incremental sort.

>  If this is the case, then I think using the following
> outer path for the merge join:
> 
>    ->  Incremental Sort
>          Sort Key: t1.a, t1.b
>          Presorted Key: t1.a
>          ->  Index Scan using t1_a_idx on t1
> 
> ... would be an immediate improvement over the current path, which is:
> 
>    ->  Sort
>          Sort Key: t1.a, t1.b
>          ->  Index Scan using t1_a_idx on t1
> 
> 
> The shaky cost estimates for incremental sort that you mentioned are
> indeed a concern.  Fortunately we have the enable_incremental_sort GUC
> already.  As in many other parts of the code (such as in the function
> add_paths_to_grouping_rel), we can always disable incremental sort to
> fall back to a full sort if needed.
> 

I think our goal should be to not rely on the enable_incremental_sort
GUC as a defense very much. It's a very blunt instrument that often
forces users to choose between improving one query and harming some
other queries. I personally consider these enable_* GUCs more of a
development tool than something suitable for users.

>> I'm not saying it's not worth exploring, I'm just trying to recall reasons
>> why it might not work. Also I don't think it's fundamentally impossible
>> to do mark/restore for incremental sort - I just haven't tried doing it
>> because it wasn't necessary. In the worst case we could simply add a
>> Materialize node on top, no?
> 
> Yeah, perhaps we could support mark/restore for incremental sort
> someday.  This would allow us to apply incremental sort to the inner
> path of a merge join too.  But if we apply a Material node on top of
> the inner due to the lack of mark/restore support for incremental
> sort, then we will need to compare two paths:
> 
> partially sorted path -> incremental sort -> material
> 
> VS.
> 
> partially sorted path -> full sort
> 
> I think it's hard to tell which is cheaper without a cost comparison,
> which we currently do not have.
> 

How is this different from the incremental sort costing we already have?

> 
> To help with the discussion, I drafted a WIP patch (attached), which
> chooses an incremental sort over a full sort on the given outer path
> of a mergejoin whenever possible.  There is one resulting plan diff in
> the regression tests (not included in the patch): one of the test
> queries now uses an incremental sort for its mergejoin.
> 

I'm not against the patch in principle, but I haven't thought about the
costing and risk very much. If I have time I'll try to do some more
experiments to see how it behaves, but no promises.


regards

-- 
Tomas Vondra



Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra
On 9/13/24 13:18, Richard Guo wrote:
> On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas@vondra.me> wrote:
>> I have not thought about this particular case too much, but how likely
>> is it that mergejoin will win for this plan in practice? If we consider
>> a query that only needs a fraction of rows (say, thanks to LIMIT),
>> aren't we likely to pick a nested loop (which can do incremental sort
>> for the outer plan)? For joins that need to run to completion it might
>> be a win, but there's also the higher risk of a poor costing.
> 
> I think one situation where mergejoin tends to outperform hashjoin and
> nestloop is when ORDER BY clauses are present.  For example, for the
> query below, mergejoin runs much faster than hashjoin and nestloop,
> and mergejoin with incremental sort is even faster than mergejoin with
> full sort.
> 

Sure, an incremental sort can improve things if things go well, no doubt
about that. But how significant can the improvement be, especially if we
need to sort everything? In your example it's ~15%, which is nice, and
maybe the benefits can be much larger if the incremental sort can do
everything in memory, without flushing to disk.

But what about the risks? What will happen if the groups are not this
uniformly and independently sized, and stuff like that? Maybe it'll be
costed well enough, I haven't tried.

I don't recall the exact reasoning why we didn't add incremental sort
in various places; I'd have to dig into the old threads. But I believe
thinking about these risks was part of it - trying to handle the places
where the potential benefits are much larger than the risks.

As I wrote earlier, it's not my intent to discourage you from working on
this. Please do, I'm sure it can be improved.


regards

-- 
Tomas Vondra



Re: Why don't we consider explicit Incremental Sort?

From: Richard Guo
On Fri, Sep 13, 2024 at 7:35 PM Tomas Vondra <tomas@vondra.me> wrote:
> On 9/13/24 06:04, Richard Guo wrote:
> > It seems to me that some parts of our code assume that, for a given
> > input path that is partially ordered, an incremental sort is always
> > preferable to a full sort (see commit 4a29eabd1).  Am I getting this
> > correctly?
>
> I don't think we're making such an assumption. I don't know which of the
> many places modified by 4a29eabd1 you have in mind, but IIRC we always
> consider both full and incremental sort.

Hmm, I don't think it's true that we always consider both full and
incremental sort on the same path.  It was true initially, but that’s
no longer the case.

I checked the 9 callers of create_incremental_sort_path, and they all
follow the logic that if there are presorted keys, only incremental
sort is considered.  To quote a comment from one of the callers:

    * ... We've no need to consider both
    * sort and incremental sort on the same path.  We assume that
    * incremental sort is always faster when there are presorted
    * keys.

I think this is also explained in the commit message of 4a29eabd1,
quoted here:

"
Previously we would generally create a sort path on the cheapest input
path (if that wasn't sorted already) and incremental sort paths on any
path which had presorted keys.  This meant that if the cheapest input path
wasn't completely sorted but happened to have presorted keys, we would
create a full sort path *and* an incremental sort path on that input path.
Here we change this logic so that if there are presorted keys, we only
create an incremental sort path, and create sort paths only when a full
sort is required.
"

Thanks
Richard



Re: Why don't we consider explicit Incremental Sort?

From: Richard Guo
On Fri, Sep 13, 2024 at 7:51 PM Tomas Vondra <tomas@vondra.me> wrote:
> Sure, an incremental sort can improve things if things go well, no doubt
> about that. But how significant can the improvement be, especially if we
> need to sort everything? In your example it's ~15%, which is nice, and
> maybe the benefits can be much larger if the incremental sort can do
> everything in memory, without flushing to disk.
>
> But what about the risks? What will happen if the groups are not this
> uniformly and independently sized, and stuff like that? Maybe it'll be
> costed well enough, I haven't tried.

I understand the concern and agree that there is a risk of regression
if there is a skew in the number of rows per pre-sorted group.

Actually there were discussions about this during the work on commit
4a29eabd1.  Please see [1].  I will repeat David's demonstration and
rerun his query on the current master branch to see what happens.

create table ab (a int not null, b int not null);
insert into ab select 0,x from generate_Series(1,999000)x union all
select x%1000+1,0 from generate_Series(999001,1000000)x;

create index on ab (a);

analyze ab;

So this is a table with a very large skew: the 0 group has 999000
rows, and the remaining groups 1-1000 have just 1 row each.

-- on master
explain (analyze, timing off) select * from ab order by a,b;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Incremental Sort  (cost=2767.38..109344.55 rows=1000000 width=8)
(actual rows=1000000 loops=1)
   Sort Key: a, b
   Presorted Key: a
   Full-sort Groups: 33  Sort Method: quicksort  Average Memory: 26kB
Peak Memory: 26kB
   Pre-sorted Groups: 1  Sort Method: external merge  Average Disk:
17680kB  Peak Disk: 17680kB
   ->  Index Scan using ab_a_idx on ab  (cost=0.42..22832.42
rows=1000000 width=8) (actual rows=1000000 loops=1)
 Planning Time: 0.829 ms
 Execution Time: 1028.892 ms
(8 rows)

-- manually disable incremental sort
set enable_incremental_sort to off;
explain (analyze, timing off) select * from ab order by a,b;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Sort  (cost=127757.34..130257.34 rows=1000000 width=8) (actual
rows=1000000 loops=1)
   Sort Key: a, b
   Sort Method: external merge  Disk: 17704kB
   ->  Seq Scan on ab  (cost=0.00..14425.00 rows=1000000 width=8)
(actual rows=1000000 loops=1)
 Planning Time: 0.814 ms
 Execution Time: 765.198 ms
(6 rows)

Look, regression happens on current master!

This is a question I’ve often pondered: each time we introduce a new
optimization, there are always potential cases where it could lead to
regressions.  What should we do about this?  I kind of agree with
David's take, as stated in the commit message of 4a29eabd1:

"
That, of course, as with teaching the planner any new tricks,
means an increased likelihood that the planner will perform an incremental
sort when it's not the best method.  Our standard escape hatch for these
cases is an enable_* GUC.
"

I know ideally we should not rely on these enable_* GUCs, but in
reality it seems that sometimes we do not have a better solution.

[1] https://postgr.es/m/CAApHDvr1Sm+g9hbv4REOVuvQKeDWXcKUAhmbK5K+dfun0s9CvA@mail.gmail.com

Thanks
Richard



Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra

On 9/14/24 05:50, Richard Guo wrote:
> On Fri, Sep 13, 2024 at 7:51 PM Tomas Vondra <tomas@vondra.me> wrote:
>> Sure, an incremental sort can improve things if things go well, no doubt
>> about that. But how significant can the improvement be, especially if we
>> need to sort everything? In your example it's ~15%, which is nice, and
>> maybe the benefits can be much larger if the incremental sort can do
>> everything in memory, without flushing to disk.
>>
>> But what about the risks? What will happen if the groups are not this
>> uniformly and independently sized, and stuff like that? Maybe it'll be
>> costed well enough, I haven't tried.
> 
> I understand the concern and agree that there is a risk of regression
> if there is a skew in the number of rows per pre-sorted group.
> 
> Actually there were discussions about this during the work on commit
> 4a29eabd1.  Please see [1].  I will repeat David's demonstration and
> rerun his query on the current master branch to see what happens.
> 
> create table ab (a int not null, b int not null);
> insert into ab select 0,x from generate_Series(1,999000)x union all
> select x%1000+1,0 from generate_Series(999001,1000000)x;
> 
> create index on ab (a);
> 
> analyze ab;
> 
> So this is a table with a very large skew: the 0 group has 999000
> rows, and the remaining groups 1-1000 have just 1 row each.
> 
> -- on master
> explain (analyze, timing off) select * from ab order by a,b;
>                                                    QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
>  Incremental Sort  (cost=2767.38..109344.55 rows=1000000 width=8)
> (actual rows=1000000 loops=1)
>    Sort Key: a, b
>    Presorted Key: a
>    Full-sort Groups: 33  Sort Method: quicksort  Average Memory: 26kB
> Peak Memory: 26kB
>    Pre-sorted Groups: 1  Sort Method: external merge  Average Disk:
> 17680kB  Peak Disk: 17680kB
>    ->  Index Scan using ab_a_idx on ab  (cost=0.42..22832.42
> rows=1000000 width=8) (actual rows=1000000 loops=1)
>  Planning Time: 0.829 ms
>  Execution Time: 1028.892 ms
> (8 rows)
> 
> -- manually disable incremental sort
> set enable_incremental_sort to off;
> explain (analyze, timing off) select * from ab order by a,b;
>                                            QUERY PLAN
> ------------------------------------------------------------------------------------------------
>  Sort  (cost=127757.34..130257.34 rows=1000000 width=8) (actual
> rows=1000000 loops=1)
>    Sort Key: a, b
>    Sort Method: external merge  Disk: 17704kB
>    ->  Seq Scan on ab  (cost=0.00..14425.00 rows=1000000 width=8)
> (actual rows=1000000 loops=1)
>  Planning Time: 0.814 ms
>  Execution Time: 765.198 ms
> (6 rows)
> 
> Look, regression happens on current master!
> 
> This is a question I’ve often pondered: each time we introduce a new
> optimization, there are always potential cases where it could lead to
> regressions.  What should we do about this?  I kind of agree with
> David's take, as stated in the commit message of 4a29eabd1:
> 

Right, or as Goetz Graefe said "choice is confusion."

The funny thing is that it also matters when the alternative plans are
introduced. Had they been there from the very beginning, we'd consider
the behavior (including choosing sub-optimal plans) the baseline, and
it'd be okay-ish. But when we introduce those alternative paths later,
they're more likely to be seen as a "regression" ...

> "
> That, of course, as with teaching the planner any new tricks,
> means an increased likelihood that the planner will perform an incremental
> sort when it's not the best method.  Our standard escape hatch for these
> cases is an enable_* GUC.
> "
> 
> I know ideally we should not rely on these enable_* GUCs, but in
> reality it seems that sometimes we do not have a better solution.
> 

Right, the basic truth is there's no way to teach the optimizer to do
new stuff without a risk of regression. We simply can't make perfect
decisions based on incomplete information (which is what the statistics
are, intentionally). It is up to us to reason about the risks, and
ideally deal with them later, at execution time.

I still don't think we should rely on GUCs too much for this.


regards

-- 
Tomas Vondra



Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra
On 9/14/24 05:37, Richard Guo wrote:
> On Fri, Sep 13, 2024 at 7:35 PM Tomas Vondra <tomas@vondra.me> wrote:
>> On 9/13/24 06:04, Richard Guo wrote:
>>> It seems to me that some parts of our code assume that, for a given
>>> input path that is partially ordered, an incremental sort is always
>>> preferable to a full sort (see commit 4a29eabd1).  Am I getting this
>>> correctly?
>>
>> I don't think we're making such an assumption. I don't know which of the
>> many places modified by 4a29eabd1 you have in mind, but IIRC we always
>> consider both full and incremental sort.
> 
> Hmm, I don't think it's true that we always consider both full and
> incremental sort on the same path.  It was true initially, but that’s
> no longer the case.
> 
> I checked the 9 callers of create_incremental_sort_path, and they all
> follow the logic that if there are presorted keys, only incremental
> sort is considered.  To quote a comment from one of the callers:
> 
>     * ... We've no need to consider both
>     * sort and incremental sort on the same path.  We assume that
>     * incremental sort is always faster when there are presorted
>     * keys.
> 
> I think this is also explained in the commit message of 4a29eabd1,
> quoted here:
> 
> "
> Previously we would generally create a sort path on the cheapest input
> path (if that wasn't sorted already) and incremental sort paths on any
> path which had presorted keys.  This meant that if the cheapest input path
> wasn't completely sorted but happened to have presorted keys, we would
> create a full sort path *and* an incremental sort path on that input path.
> Here we change this logic so that if there are presorted keys, we only
> create an incremental sort path, and create sort paths only when a full
> sort is required.
> "
> 

Hmmm ... I wasn't involved in that discussion and I haven't studied the
thread, but this seems a bit weird to me. If the presorted keys have
low cardinality, can't the full sort be faster?

Maybe it's not possible with the costing changes (removing the
penalization), in which case it would be fine to not consider the full
sort, because we'd just throw it away immediately. But if the full sort
could be costed as cheaper, I'd say that might be wrong.


regards

-- 
Tomas Vondra



Re: Why don't we consider explicit Incremental Sort?

From: David Rowley
On Fri, 20 Sept 2024 at 20:48, Richard Guo <guofenglinux@gmail.com> wrote:
> I agree with you that sometimes the definition of 'regression' can
> depend on when the alternative plans are introduced.  Imagine if we
> initially did not have the 1.5x pessimism factor and introduced it
> later: it would be treated as a 'regression' because some queries that
> could benefit from incremental sort would then have to resort to a full
> sort.

I think this is a good way of looking at it. I think it's a bad idea
when introducing new planner/executor abilities to penalise the costs
for that ability.  It might make the committer sleep better at night
after committing some new feature, but it's just not a future-proof
endeavour. We should aim for our cost models to be as close to the
truth as possible. As soon as you start introducing "just in case"
penalties, you're telling lies. Lies don't work out well, especially
when compounded with other lies, which is exactly what you get when
layering Paths atop of Paths with just-in-case penalties added at each
level.

I was in this position with Memoize. The problem there is that when we
don't have any stats, we assume the n_distinct is 200. Memoize can
look quite attractive with such a small n_distinct estimate. I
invented SELFLAG_USED_DEFAULT to allow Memoize to steer clear when the
n_distinct estimate used the hard-coded default. I think that's an ok
thing to do as otherwise it could work out very badly if Nested Loop
-> Memoize was used instead of, say, a Hash Join on a join problem with
millions of rows, all of them with distinct join values.

> So here is the v2 patch, which is almost the same as v1, but includes
> changes in test cases and comments, and also includes a commit
> message.  I'll put it in the commitfest.

Just looking at the commit message:

> The rationale is based on the assumption that incremental sort is
> always faster than full sort when there are presorted keys, a premise
> that has been applied in various parts of the code.  This assumption
> does not always hold, particularly in cases with a large skew in the
> number of rows within the presorted groups.

My understanding is that the worst case for incremental sort is the
same as sort, i.e. only 1 presorted group, which is the same effort to
sort. Is there something I'm missing?

David



Re: Why don't we consider explicit Incremental Sort?

From: Richard Guo
On Sun, Sep 22, 2024 at 1:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
> Just looking at the commit message:
>
> > The rationale is based on the assumption that incremental sort is
> > always faster than full sort when there are presorted keys, a premise
> > that has been applied in various parts of the code.  This assumption
> > does not always hold, particularly in cases with a large skew in the
> > number of rows within the presorted groups.
>
> My understanding is that the worst case for incremental sort is the
> same as sort, i.e. only 1 presorted group, which is the same effort to
> sort. Is there something I'm missing?

I was thinking that when there’s a large skew in the number of rows
per pre-sorted group, incremental sort might underperform full sort.
As mentioned in the comments for cost_incremental_sort, it has to
detect the sort groups, plus it needs to perform tuplesort_reset after
each group.  However, I doubt these factors would have a substantial
impact on the performance of incremental sort.  So maybe in the worst
skewed groups case, incremental sort may still perform similarly to
full sort.
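
(One way to probe that worst case - which I have not run here, so treat
it as a sketch with made-up table names - would be a dataset that forms
a single huge presorted group, e.g.:

    create table onegrp (a int, b int);
    insert into onegrp select 1, random(1, 100000)
      from generate_series(1, 1000000);
    create index on onegrp (a);
    analyze onegrp;

    set enable_seqscan to off;  -- keep the presorted index path for both runs
    explain (analyze, timing off) select * from onegrp order by a, b;
    set enable_incremental_sort to off;
    explain (analyze, timing off) select * from onegrp order by a, b;

and then comparing the execution times of the two plans.)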

Another less obvious reason is that in cases of skewed groups, we may
significantly underestimate the cost of incremental sort.  This could
result in choosing a more expensive subpath under the sort.  For
example, in [1] we end up with IndexScan->IncrementalSort rather
than SeqScan->FullSort, while the former plan is more expensive to
execute.  However, this point does not affect this patch, because for
a mergejoin, we only consider outerrel's cheapest-total-cost when we
assume that an explicit sort atop is needed, i.e., we do not have a
chance to select which subpath to use in this case.

[1] https://postgr.es/m/CAMbWs49+CXsrgbq0LD1at-5jR=AHHN0YtDy9YvgXAsMfndZe-w@mail.gmail.com

Thanks
Richard



Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra
On 9/23/24 05:03, Richard Guo wrote:
> On Sun, Sep 22, 2024 at 1:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> Just looking at the commit message:
>>
>>> The rationale is based on the assumption that incremental sort is
>>> always faster than full sort when there are presorted keys, a premise
>>> that has been applied in various parts of the code.  This assumption
>>> does not always hold, particularly in cases with a large skew in the
>>> number of rows within the presorted groups.
>>
>> My understanding is that the worst case for incremental sort is the
>> same as sort, i.e. only 1 presorted group, which is the same effort to
>> sort. Is there something I'm missing?
> 
> I was thinking that when there’s a large skew in the number of rows
> per pre-sorted group, incremental sort might underperform full sort.
> As mentioned in the comments for cost_incremental_sort, it has to
> detect the sort groups, plus it needs to perform tuplesort_reset after
> each group.  However, I doubt these factors would have a substantial
> impact on the performance of incremental sort.  So maybe in the worst
> skewed groups case, incremental sort may still perform similarly to
> full sort.
> 
> Another less obvious reason is that in cases of skewed groups, we may
> significantly underestimate the cost of incremental sort.  This could
> result in choosing a more expensive subpath under the sort.  For
> example, in [1] we end up with IndexScan->IncrementalSort rather
> than SeqScan->FullSort, while the former plan is more expensive to
> execute.  However, this point does not affect this patch, because for
> a mergejoin, we only consider outerrel's cheapest-total-cost when we
> assume that an explicit sort atop is needed, i.e., we do not have a
> chance to select which subpath to use in this case.
> 
> [1] https://postgr.es/m/CAMbWs49+CXsrgbq0LD1at-5jR=AHHN0YtDy9YvgXAsMfndZe-w@mail.gmail.com
> 

You don't need any skew. Consider this perfectly uniform dataset:

create table t (a int, b int);
insert into t select 100000 * random(), 100 * random()
  from generate_series(1,1000000) s(i);
create index on t (a);
vacuum analyze;
checkpoint;

explain analyze select * from t order by a, b;

                            QUERY PLAN
-----------------------------------------------------------------
 Sort  (cost=127757.34..130257.34 rows=1000000 width=8)
       (actual time=186.288..275.813 rows=1000000 loops=1)
   Sort Key: a, b
   Sort Method: external merge  Disk: 17704kB
   ->  Seq Scan on t  (cost=0.00..14425.00 rows=1000000 width=8)
                (actual time=0.005..35.989 rows=1000000 loops=1)
 Planning Time: 0.075 ms
 Execution Time: 301.143 ms
(6 rows)


set enable_incremental_sort = on;
explain analyze select * from t order by a, b;

                            QUERY PLAN
-----------------------------------------------------------------
 Incremental Sort  (cost=1.03..68908.13 rows=1000000 width=8)
            (actual time=0.102..497.143 rows=1000000 loops=1)
   Sort Key: a, b
   Presorted Key: a
   Full-sort Groups: 27039  Sort Method: quicksort
     Average Memory: 25kB  Peak Memory: 25kB
   ->  Index Scan using t_a_idx on t
                   (cost=0.42..37244.25 rows=1000000 width=8)
            (actual time=0.050..376.403 rows=1000000 loops=1)
 Planning Time: 0.057 ms
 Execution Time: 519.301 ms
(7 rows)

Sure, the table is very small, but the example is not crazy. In fact,
this is the "nicest" example for estimation - it's perfectly random, no
correlations, no skew.


regards

-- 
Tomas Vondra



Re: Why don't we consider explicit Incremental Sort?

From: David Rowley
On Tue, 24 Sept 2024 at 02:01, Tomas Vondra <tomas@vondra.me> wrote:
>
> On 9/23/24 05:03, Richard Guo wrote:
> > On Sun, Sep 22, 2024 at 1:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
> >> Just looking at the commit message:
> >>
> >>> The rationale is based on the assumption that incremental sort is
> >>> always faster than full sort when there are presorted keys, a premise
> >>> that has been applied in various parts of the code.  This assumption
> >>> does not always hold, particularly in cases with a large skew in the
> >>> number of rows within the presorted groups.
> >>
> >> My understanding is that the worst case for incremental sort is the
> >> same as sort, i.e. only 1 presorted group, which is the same effort to
> >> sort. Is there something I'm missing?
> >
> > I was thinking that when there’s a large skew in the number of rows
> > per pre-sorted group, incremental sort might underperform full sort.
> > As mentioned in the comments for cost_incremental_sort, it has to
> > detect the sort groups, plus it needs to perform tuplesort_reset after
> > each group.  However, I doubt these factors would have a substantial
> > impact on the performance of incremental sort.  So maybe in the worst
> > skewed groups case, incremental sort may still perform similarly to
> > full sort.
> >
> > Another less obvious reason is that in cases of skewed groups, we may
> > significantly underestimate the cost of incremental sort.  This could
> > result in choosing a more expensive subpath under the sort.  For
> > example, in [1] we end up with IndexScan->IncrementalSort rather
> > than SeqScan->FullSort, while the former plan is more expensive to
> > execute.  However, this point does not affect this patch, because for
> > a mergejoin, we only consider outerrel's cheapest-total-cost when we
> > assume that an explicit sort atop is needed, i.e., we do not have a
> > chance to select which subpath to use in this case.
> >
> > [1] https://postgr.es/m/CAMbWs49+CXsrgbq0LD1at-5jR=AHHN0YtDy9YvgXAsMfndZe-w@mail.gmail.com
> >
>
> You don't need any skew. Consider this perfectly uniform dataset:
>
>  Sort  (cost=127757.34..130257.34 rows=1000000 width=8)
>        (actual time=186.288..275.813 rows=1000000 loops=1)
>    Sort Key: a, b
>    Sort Method: external merge  Disk: 17704kB
>    ->  Seq Scan on t  (cost=0.00..14425.00 rows=1000000 width=8)
>                 (actual time=0.005..35.989 rows=1000000 loops=1)
>  Planning Time: 0.075 ms
>  Execution Time: 301.143 ms
>
> set enable_incremental_sort = on;

>  Incremental Sort  (cost=1.03..68908.13 rows=1000000 width=8)
>             (actual time=0.102..497.143 rows=1000000 loops=1)
>    Sort Key: a, b
>    Presorted Key: a
>    Full-sort Groups: 27039  Sort Method: quicksort
>      Average Memory: 25kB  Peak Memory: 25kB
>    ->  Index Scan using t_a_idx on t
>                    (cost=0.42..37244.25 rows=1000000 width=8)
>             (actual time=0.050..376.403 rows=1000000 loops=1)
>  Planning Time: 0.057 ms
>  Execution Time: 519.301 ms

Ok, let's first look at the total Seq Scan cost of the first EXPLAIN.
14425.00 units and 35.989 milliseconds to execute. That's about 400.81
units per millisecond. The Index Scan is only being charged 98.94
units per millisecond of execution.  If the Index Scan was costed the
same per unit as the Seq Scan, the total Index Scan cost would be
150868 which is already more than the Seq Scan plan without even
adding the Incremental Sort costs on. To me, that seems to be an
inaccuracy either with the Seq Scan costings coming out too expensive
or Index Scan coming out too cheap.

If you think that the Incremental Sort plan shouldn't be chosen
because the Index Scan costing came out too cheap (or the Seq Scan
costing was too expensive) then I disagree. Applying some penalty to
one node type because some other node type isn't costed accurately is
just not a practice we should be doing. Instead, we should be trying
our best to cost each node type as accurately as possible.  If you
think there's some inaccuracy with Incremental Sort, then let's look
into that. If you want to re-add the penalty because Index Scan
costing isn't good, let's see if we can fix Index Scan costings.
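
To spell out that arithmetic (rough numbers taken straight from the
EXPLAIN output above; the SELECT is just for illustration):

    select 14425.00 / 35.989              as seqscan_units_per_ms,    -- ~400.8
           37244.25 / 376.403             as indexscan_units_per_ms,  -- ~98.9
           376.403 * (14425.00 / 35.989)  as indexscan_cost_at_seqscan_rate;  -- ~150868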

David



Re: Why don't we consider explicit Incremental Sort?

From: Richard Guo
On Mon, Sep 23, 2024 at 10:01 PM Tomas Vondra <tomas@vondra.me> wrote:
> You don't need any skew. Consider this perfectly uniform dataset:
>
> create table t (a int, b int);
> insert into t select 100000 * random(), 100 * random()
>   from generate_series(1,1000000) s(i);
> create index on t (a);
> vacuum analyze;
> checkpoint;
>
> explain analyze select * from t order by a, b;

I think if we want to compare the performance of incremental sort vs.
full sort on this dataset, we need to ensure that other variables are
kept constant, i.e. we need to ensure that both methods use the same
subpath, whether it's Index Scan or Seq Scan.

This is especially true in the scenario addressed by this patch, as
for a mergejoin, we only consider outerrel's cheapest_total_path when
we assume that an explicit sort atop is needed.  That is to say, the
subpath has already been chosen by the time we determine whether to
use incremental sort or full sort.

According to this theory, here is what I got on this same dataset:

-- incremental sort
explain analyze select * from t order by a, b;
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Incremental Sort  (cost=1.02..68838.65 rows=1000000 width=8) (actual
time=1.292..1564.265 rows=1000000 loops=1)
   Sort Key: a, b
   Presorted Key: a
   Full-sort Groups: 27022  Sort Method: quicksort  Average Memory:
25kB  Peak Memory: 25kB
   ->  Index Scan using t_a_idx on t  (cost=0.42..37244.20
rows=1000000 width=8) (actual time=0.408..1018.785 rows=1000000
loops=1)
 Planning Time: 0.998 ms
 Execution Time: 1602.861 ms
(7 rows)


-- full sort
set enable_incremental_sort to off;
set enable_seqscan to off;
explain analyze select * from t order by a, b;
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=150576.54..153076.54 rows=1000000 width=8) (actual
time=1760.090..1927.598 rows=1000000 loops=1)
   Sort Key: a, b
   Sort Method: external merge  Disk: 17704kB
   ->  Index Scan using t_a_idx on t  (cost=0.42..37244.20
rows=1000000 width=8) (actual time=0.531..1010.931 rows=1000000
loops=1)
 Planning Time: 0.980 ms
 Execution Time: 1970.287 ms
(6 rows)

So incremental sort outperforms full sort on this dataset.

Thanks
Richard



Re: Why don't we consider explicit Incremental Sort?

From: Tomas Vondra

On 9/24/24 00:21, David Rowley wrote:
> On Tue, 24 Sept 2024 at 02:01, Tomas Vondra <tomas@vondra.me> wrote:
>>
>> On 9/23/24 05:03, Richard Guo wrote:
>>> On Sun, Sep 22, 2024 at 1:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
>>>> Just looking at the commit message:
>>>>
>>>>> The rationale is based on the assumption that incremental sort is
>>>>> always faster than full sort when there are presorted keys, a premise
>>>>> that has been applied in various parts of the code.  This assumption
>>>>> does not always hold, particularly in cases with a large skew in the
>>>>> number of rows within the presorted groups.
>>>>
>>>> My understanding is that the worst case for incremental sort is the
>>>> same as sort, i.e. only 1 presorted group, which is the same effort to
>>>> sort. Is there something I'm missing?
>>>
>>> I was thinking that when there’s a large skew in the number of rows
>>> per pre-sorted group, incremental sort might underperform full sort.
>>> As mentioned in the comments for cost_incremental_sort, it has to
>>> detect the sort groups, plus it needs to perform tuplesort_reset after
>>> each group.  However, I doubt these factors would have a substantial
>>> impact on the performance of incremental sort.  So maybe in the worst
>>> skewed groups case, incremental sort may still perform similarly to
>>> full sort.
>>>
>>> Another less obvious reason is that in cases of skewed groups, we may
>>> significantly underestimate the cost of incremental sort.  This could
>>> result in choosing a more expensive subpath under the sort.  For
>>> example, in [1] we end up with IndexScan->IncrementalSort rather
>>> than SeqScan->FullSort, while the former plan is more expensive to
>>> execute.  However, this point does not affect this patch, because for
>>> a mergejoin, we only consider outerrel's cheapest-total-cost when we
>>> assume that an explicit sort atop is needed, i.e., we do not have a
>>> chance to select which subpath to use in this case.
>>>
>>> [1] https://postgr.es/m/CAMbWs49+CXsrgbq0LD1at-5jR=AHHN0YtDy9YvgXAsMfndZe-w@mail.gmail.com
>>>
>>
>> You don't need any skew. Consider this perfectly uniform dataset:
>>
>>  Sort  (cost=127757.34..130257.34 rows=1000000 width=8)
>>        (actual time=186.288..275.813 rows=1000000 loops=1)
>>    Sort Key: a, b
>>    Sort Method: external merge  Disk: 17704kB
>>    ->  Seq Scan on t  (cost=0.00..14425.00 rows=1000000 width=8)
>>                 (actual time=0.005..35.989 rows=1000000 loops=1)
>>  Planning Time: 0.075 ms
>>  Execution Time: 301.143 ms
>>
>> set enable_incremental_sort = on;
> 
>>  Incremental Sort  (cost=1.03..68908.13 rows=1000000 width=8)
>>             (actual time=0.102..497.143 rows=1000000 loops=1)
>>    Sort Key: a, b
>>    Presorted Key: a
>>    Full-sort Groups: 27039  Sort Method: quicksort
>>      Average Memory: 25kB  Peak Memory: 25kB
>>    ->  Index Scan using t_a_idx on t
>>                    (cost=0.42..37244.25 rows=1000000 width=8)
>>             (actual time=0.050..376.403 rows=1000000 loops=1)
>>  Planning Time: 0.057 ms
>>  Execution Time: 519.301 ms
> 
> Ok, let's first look at the total Seq Scan cost of the first EXPLAIN.
> 14425.00 units and 35.989 milliseconds to execute. That's about 400.81
> units per millisecond. The Index Scan is only being charged 98.94
> units per millisecond of execution.  If the Index Scan was costed the
> same per unit as the Seq Scan, the total Index Scan cost would be
> 150868 which is already more than the Seq Scan plan without even
> adding the Incremental Sort costs on. To me, that seems to be an
> inaccuracy either with the Seq Scan costings coming out too expensive
> or Index Scan coming out too cheap.
> 
> If you think that the Incremental Sort plan shouldn't be chosen
> because the Index Scan costing came out too cheap (or the Seq Scan
> costing was too expensive) then I disagree. Applying some penalty to
> one node type because some other node type isn't costed accurately is
> just not a practice we should be doing. Instead, we should be trying
> our best to cost each node type as accurately as possible.  If you
> think there's some inaccuracy with Incremental Sort, then let's look
> into that. If you want to re-add the penalty because Index Scan
> costing isn't good, let's see if we can fix Index Scan costings.
> 

You're right, this wasn't a good example. I tried to come up with
something quickly, and I didn't realize the extra time comes from the
other node in the plan, not the sort :-(

I vaguely remember there were a couple of reports about slow queries
involving incremental sort, but I didn't find any that would show
incremental sort itself being slower. So maybe I'm wrong ...

IIRC the concerns were more about planning - what happens when the
multi-column ndistinct estimates (which are quite shaky) are wrong, etc.
Or how it interacts with LIMIT with hidden correlations, etc.

Sure, those are not problems of incremental sort itself; it just makes
it easier to hit some of these issues. Of course, that doesn't mean the
1.5x penalty was a particularly good solution.


regards

-- 
Tomas Vondra