Re: Why don't we consider explicit Incremental Sort? - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Why don't we consider explicit Incremental Sort?
Date
Msg-id CAMbWs48xVJp62Yp82T-R8_SChc2-WcU1FE0J6B3j2yHNOFVViA@mail.gmail.com
Whole thread Raw
In response to Re: Why don't we consider explicit Incremental Sort?  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Why don't we consider explicit Incremental Sort?
List pgsql-hackers
On Sun, Sep 22, 2024 at 1:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
> Just looking at the commit message:
>
> > The rationale is based on the assumption that incremental sort is
> > always faster than full sort when there are presorted keys, a premise
> > that has been applied in various parts of the code.  This assumption
> > does not always hold, particularly in cases with a large skew in the
> > number of rows within the presorted groups.
>
> My understanding is that the worst case for incremental sort is the
> same as sort, i.e. only 1 presorted group, which is the same effort to
> sort. Is there something I'm missing?

I was thinking that when there’s a large skew in the number of rows
per pre-sorted group, incremental sort might underperform full sort.
As mentioned in the comments for cost_incremental_sort, it has to
detect the sort groups, plus it needs to perform tuplesort_reset after
each group.  However, I doubt these factors would have a substantial
impact on the performance of incremental sort.  So maybe in the worst
skewed groups case, incremental sort may still perform similarly to
full sort.

Another less obvious reason is that in cases of skewed groups, we may
significantly underestimate the cost of incremental sort.  This could
result in choosing a more expensive subpath under the sort.  Such as
the example in [1], we end up with IndexScan->IncrementalSort rather
than SeqScan->FullSort, while the former plan is more expensive to
execute.  However, this point does not affect this patch, because for
a mergejoin, we only consider outerrel's cheapest-total-cost when we
assume that an explicit sort atop is needed, i.e., we do not have a
chance to select which subpath to use in this case.

[1] https://postgr.es/m/CAMbWs49+CXsrgbq0LD1at-5jR=AHHN0YtDy9YvgXAsMfndZe-w@mail.gmail.com

Thanks
Richard



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Pgoutput not capturing the generated columns
Next
From: jian he
Date:
Subject: Re: ANALYZE ONLY