On 10/10/24 09:18, Richard Guo wrote:
> On Sun, Sep 22, 2024 at 1:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
> I've pushed this patch after tweaking this part in the commit message.
> Thank you both for your reviews.
My apologies for the late review, but IMO there are some minor weaknesses.
Working on the improvement of the cost_sort model [1], adding into the
model the number of columns to be sorted and the number of comparisons
to be made (through the stadistinct of the first column), I found out
that one test doesn't like IncrementalSort in aggregates.sql:
'Utilise the ordering of merge join to avoid a Sort operation'
After discovering a little bit, I realised that although the idea that
the IncrementalSort is better than the plain Sort is generally correct,
the assumption that it is better all the time seems wrong.
For example, IncrementalSort, following an index, can choose a sort
order with a low level of distinct values of the first columns, causing
more comparisons - remember, we still sort tuples inside a group; or
using an index on multiple columns having a need in just the first
column and additional Sort on many others from the heap, etc.
Of course, I provide highly skewed cases and Sort + SeqScan basically
will resolve the issue. But I think you may discover both possible sort
ways; don't heuristically give a chance only to IncrementalSort. At
least, IndexScan can beat SeqScan because of the index clause selectivity.
[1]
https://www.postgresql.org/message-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com
--
regards, Andrei Lepikhov