Re: Why don't we consider explicit Incremental Sort? - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Why don't we consider explicit Incremental Sort? |
Date | |
Msg-id | 43ee4d6a-3531-4bdd-bd41-bad7b2c67281@vondra.me Whole thread Raw |
In response to | Why don't we consider explicit Incremental Sort? (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: Why don't we consider explicit Incremental Sort?
|
List | pgsql-hackers |
On 9/13/24 06:04, Richard Guo wrote: > On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <tomas@vondra.me> wrote: >> I think we intentionally added incremental sort ... incrementally ;-) > > Haha, right. > >> I think one challenge with this case is that create_mergejoin_plan >> creates these Sort plans explicitly - it's not "pathified" so it doesn't >> go through the usual cost comparison etc. And it certainly is not the >> case that incremental sort would win always, so we'd need to replicate >> the cost comparison logic too. >> >> I have not thought about this particular case too much, but how likely >> is it that mergejoin will win for this plan in practice? If we consider >> a query that only needs a fraction of rows (say, thanks to LIMIT), >> aren't we likely to pick a nested loop (which can do incremental sort >> for the outer plan)? For joins that need to run to completion it might >> be a win, but there's also the higher risk of a poor costing. > > Yeah, currently mergejoin path always assumes that full sort would be > used on top of the outer path and inner path if they are not already > ordered appropriately. So initial_cost_mergejoin directly charges the > cost of full sort into outer/inner path's cost, without going through > the usual cost comparison with incremental sort. > > It seems to me that some parts of our code assume that, for a given > input path that is partially ordered, an incremental sort is always > preferable to a full sort (see commit 4a29eabd1). Am I getting this > correctly? I don't think we're making such assumption. I don't know which of the many places modified by 4a29eabd1 you have in mind, but IIRC we always consider both full and incremental sort. > If this is the case, then I think using the following > outer path for the merge join: > > -> Incremental Sort > Sort Key: t1.a, t1.b > Presorted Key: t1.a > -> Index Scan using t1_a_idx on t1 > > ... would be an immediate improvement over the current path, which is: > > -> Sort > Sort Key: t1.a, t1.b > -> Index Scan using t1_a_idx on t1 > > > The shaky cost estimates for incremental sort that you mentioned are > indeed a concern. Fortunately we have the enable_incremental_sort GUC > already. As in may other parts of the code (such as in the function > add_paths_to_grouping_rel), we can always disable incremental sort to > fall back to a full sort if needed. > I think our goal should be to not rely on the enable_incremental_sort GUC as a defense very much. It's a very blunt instrument, that often forces users to pick whether they prefer to improve one query while harming some other queries. I personally consider these enable_ GUC more a development tool than something suitable for users. >> I'm not saying it's not worth exploring, I'm just trying recall reasons >> why it might not work. Also I don't think it's fundamentally impossible >> to do mark/restore for incremental sort - I just haven't tried doing it >> because it wasn't necessary. In the worst case we could simply add a >> Materialize node on top, no? > > Yeah, perhaps we could support mark/restore for incremental sort > someday. This would allow us to apply incremental sort to the inner > path of a merge join too. But if we apply a Material node on top of > the inner due to the lack of mark/restore support for incremental > sort, then we will need to compare two paths: > > partially sorted path -> incremental sort -> material > > VS. > > partially sorted path -> full sort > > I think it's hard to tell which is cheaper without a cost comparison, > which we do not have for now. > How is this different from the incremental sort costing we already have? > > To help with the discussion, I drafted a WIP patch as attached, which > chooses an incremental sort over a full sort on the given outer path > of a mergejoin whenever possible. There is one ensuing plan diff in > the regression tests (not included in the patch). It seems that some > query in the tests begins to use incremental sort for mergejoin. > I'm not against the patch in principle, but I haven't thought about the costing and risk very much. If I have time I'll try to do some more experiments to see how it behaves, but no promises. regards -- Tomas Vondra
pgsql-hackers by date: