Re: Why don't we consider explicit Incremental Sort? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Why don't we consider explicit Incremental Sort?
Date
Msg-id 43ee4d6a-3531-4bdd-bd41-bad7b2c67281@vondra.me
Whole thread Raw
In response to Why don't we consider explicit Incremental Sort?  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Why don't we consider explicit Incremental Sort?
List pgsql-hackers

On 9/13/24 06:04, Richard Guo wrote:
> On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas@vondra.me> wrote:
>> I think we intentionally added incremental sort ... incrementally ;-)
> 
> Haha, right.
> 
>> I think one challenge with this case is that create_mergejoin_plan
>> creates these Sort plans explicitly - it's not "pathified" so it doesn't
>> go through the usual cost comparison etc. And it certainly is not the
>> case that incremental sort would win always, so we'd need to replicate
>> the cost comparison logic too.
>>
>> I have not thought about this particular case too much, but how likely
>> is it that mergejoin will win for this plan in practice? If we consider
>> a query that only needs a fraction of rows (say, thanks to LIMIT),
>> aren't we likely to pick a nested loop (which can do incremental sort
>> for the outer plan)? For joins that need to run to completion it might
>> be a win, but there's also the higher risk of a poor costing.
> 
> Yeah, currently mergejoin path always assumes that full sort would be
> used on top of the outer path and inner path if they are not already
> ordered appropriately.  So initial_cost_mergejoin directly charges the
> cost of full sort into outer/inner path's cost, without going through
> the usual cost comparison with incremental sort.
> 
> It seems to me that some parts of our code assume that, for a given
> input path that is partially ordered, an incremental sort is always
> preferable to a full sort (see commit 4a29eabd1).  Am I getting this
> correctly?

I don't think we're making such assumption. I don't know which of the
many places modified by 4a29eabd1 you have in mind, but IIRC we always
consider both full and incremental sort.

>  If this is the case, then I think using the following
> outer path for the merge join:
> 
>    ->  Incremental Sort
>          Sort Key: t1.a, t1.b
>          Presorted Key: t1.a
>          ->  Index Scan using t1_a_idx on t1
> 
> ... would be an immediate improvement over the current path, which is:
> 
>    ->  Sort
>          Sort Key: t1.a, t1.b
>          ->  Index Scan using t1_a_idx on t1
> 
> 
> The shaky cost estimates for incremental sort that you mentioned are
> indeed a concern.  Fortunately we have the enable_incremental_sort GUC
> already.  As in may other parts of the code (such as in the function
> add_paths_to_grouping_rel), we can always disable incremental sort to
> fall back to a full sort if needed.
> 

I think our goal should be to not rely on the enable_incremental_sort
GUC as a defense very much. It's a very blunt instrument, that often
forces users to pick whether they prefer to improve one query while
harming some other queries. I personally consider these enable_ GUC more
a development tool than something suitable for users.

>> I'm not saying it's not worth exploring, I'm just trying recall reasons
>> why it might not work. Also I don't think it's fundamentally impossible
>> to do mark/restore for incremental sort - I just haven't tried doing it
>> because it wasn't necessary. In the worst case we could simply add a
>> Materialize node on top, no?
> 
> Yeah, perhaps we could support mark/restore for incremental sort
> someday.  This would allow us to apply incremental sort to the inner
> path of a merge join too.  But if we apply a Material node on top of
> the inner due to the lack of mark/restore support for incremental
> sort, then we will need to compare two paths:
> 
> partially sorted path -> incremental sort -> material
> 
> VS.
> 
> partially sorted path -> full sort
> 
> I think it's hard to tell which is cheaper without a cost comparison,
> which we do not have for now.
> 

How is this different from the incremental sort costing we already have?

> 
> To help with the discussion, I drafted a WIP patch as attached, which
> chooses an incremental sort over a full sort on the given outer path
> of a mergejoin whenever possible.  There is one ensuing plan diff in
> the regression tests (not included in the patch).  It seems that some
> query in the tests begins to use incremental sort for mergejoin.
> 

I'm not against the patch in principle, but I haven't thought about the
costing and risk very much. If I have time I'll try to do some more
experiments to see how it behaves, but no promises.


regards

-- 
Tomas Vondra



pgsql-hackers by date:

Previous
From: Jim Jones
Date:
Subject: Re: Psql meta-command conninfo+
Next
From: Srirama Kucherlapati
Date:
Subject: RE: AIX support