Re: Why don't we consider explicit Incremental Sort? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Why don't we consider explicit Incremental Sort?
Date
Msg-id f8f86cb5-cb60-4722-b3c7-99c7da62ace2@vondra.me
Whole thread Raw
In response to Re: Why don't we consider explicit Incremental Sort?  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Why don't we consider explicit Incremental Sort?
List pgsql-hackers
On 9/13/24 13:18, Richard Guo wrote:
> On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas@vondra.me> wrote:
>> I have not thought about this particular case too much, but how likely
>> is it that mergejoin will win for this plan in practice? If we consider
>> a query that only needs a fraction of rows (say, thanks to LIMIT),
>> aren't we likely to pick a nested loop (which can do incremental sort
>> for the outer plan)? For joins that need to run to completion it might
>> be a win, but there's also the higher risk of a poor costing.
> 
> I think one situation where mergejoin tends to outperform hashjoin and
> nestloop is when ORDER BY clauses are present.  For example, for the
> query below, mergejoin runs much faster than hashjoin and nestloop,
> and mergejoin with incremental sort is even faster than mergejoin with
> full sort.
> 

Sure, an incremental sort can improve things if things go well, no doubt
about that. But how significant can the improvement be, especially if we
need to sort everything? In your example it's ~15%, which is nice, and
maybe the benefits can be much larger if the incremental sort can do
everything in memory, without flushing to disk.

But what about the risks? What will happen if the groups are not this
uniformly and independently sized, and stuff like that? Maybe it'll be
costed well enough, I haven't tried.

I don't recall the exact reasoning for why we didn't add incremental
sort in various places, I'd have to dig into the old threads, or
something. But I believe thinking about these risks was part of it -
trying to handle places where the potential benefits are much larger
than the risks.

As I wrote earlier, it's not my intent to discourage you from working on
this. Please do, I'm sure it can be improved.


regards

-- 
Tomas Vondra



pgsql-hackers by date:

Previous
From: Srirama Kucherlapati
Date:
Subject: RE: AIX support
Next
From: vignesh C
Date:
Subject: Re: Pgoutput not capturing the generated columns