Re: Why don't we consider explicit Incremental Sort? - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Why don't we consider explicit Incremental Sort? |
Date | |
Msg-id | 40d81bb0-a7c2-4ab8-a109-52a4c9df4839@vondra.me Whole thread Raw |
In response to | Why don't we consider explicit Incremental Sort? (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: Why don't we consider explicit Incremental Sort?
|
List | pgsql-hackers |
Hi, On 9/9/24 11:39, Richard Guo wrote: > While looking at label_sort_with_costsize() due to another issue, I > noticed that we build explicit Sort nodes but not explicit Incremental > Sort nodes. I wonder why this is the case. It seems to me that > Incremental Sorts are preferable in some cases. For example: > I think we intentionally added incremental sort ... incrementally ;-) I don't recall the reasoning exactly, but I think we realized the incremental sort costing can be a bit shaky (and AFAIK we saw a couple cases of that reported). So we added it to places where the reasoning was it wouldn't be a problem and the incremental sort would be a clear win, e.g. thanks to the "cheap startup" thing. > create table t (a int, b int); > create index on t (a); > > set enable_seqscan to off; > > explain (costs off) > select * from t t1 join t t2 on t1.a = t2.a and t1.b = t2.b; > QUERY PLAN > ------------------------------------------------- > Merge Join > Merge Cond: ((t1.a = t2.a) AND (t1.b = t2.b)) > -> Sort > Sort Key: t1.a, t1.b > -> Index Scan using t_a_idx on t t1 > -> Sort > Sort Key: t2.a, t2.b > -> Index Scan using t_a_idx on t t2 > (8 rows) > > For the outer path of a mergejoin, I think we can leverage Incremental > Sort to save effort. For the inner path, though, we cannot do this > because Incremental Sort does not support mark/restore. > > It could be argued that what if a mergejoin with an Incremental Sort > as the outer path is selected as the inner path of another mergejoin? > Well, I don't think this is a problem, because mergejoin itself does > not support mark/restore either, and we will add a Material node on > top of it anyway in this case (see final_cost_mergejoin). > > label_sort_with_costsize is also called in create_append_plan, > create_merge_append_plan and create_unique_plan. In these cases, we > may also consider using Incremental Sort. But I haven't looked into > these cases. > > Any thoughts? > I think one challenge with this case is that create_mergejoin_plan creates these Sort plans explicitly - it's not "pathified" so it doesn't go through the usual cost comparison etc. And it certainly is not the case that incremental sort would win always, so we'd need to replicate the cost comparison logic too. I have not thought about this particular case too much, but how likely is it that mergejoin will win for this plan in practice? If we consider a query that only needs a fraction of rows (say, thanks to LIMIT), aren't we likely to pick a nested loop (which can do incremental sort for the outer plan)? For joins that need to run to completion it might be a win, but there's also the higher risk of a poor costing. I'm not saying it's not worth exploring, I'm just trying recall reasons why it might not work. Also I don't think it's fundamentally impossible to do mark/restore for incremental sort - I just haven't tried doing it because it wasn't necessary. In the worst case we could simply add a Materialize node on top, no? regards -- Tomas Vondra
pgsql-hackers by date: