Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From:           Tomas Vondra <tomas.vondra@2ndquadrant.com>
Subject:        Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date:           2020-03-31 23:56:19
Msg-id:         20200331235619.ezleuyqjjjy67lh7@development
In response to: Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses:      Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List:           pgsql-hackers
On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>> >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> >> In general, I think it'd be naive to expect we can make the
>> >> planner smarter with no extra overhead spent on planning, and to
>> >> insist we never accept patches adding even tiny overhead. With
>> >> that approach we'd probably end up with a trivial planner that
>> >> generates just a single query plan, because that's going to be
>> >> the fastest planner. A realistic approach needs to consider both
>> >> the planning and execution phase, and the benefits of this patch
>> >> seem clear - if you have queries that do benefit from it.
>> >
>> >I think that's kind of attacking a straw man, though. The thing
>> >that people push back on, or should push back on IMO, is when a
>> >proposed patch adds significant slowdown to queries that it has no
>> >or very little hope of improving. The trick is to do expensive
>> >stuff only when there's a good chance of getting a better plan out
>> >of it.
>> >
>>
>> Yeah, I agree with that. I think the main issue is that we don't
>> really know what the "expensive stuff" is in this case, so it's not
>> really clear how to be smarter :-(
>
>To add to this: I agree that ideally you'd check cheaply to know
>you're in a situation that might help, and then do more work. But
>here the question is always going to be simply "would we benefit from
>an ordering, and, if so, do we have it already partially sorted".
>It's hard to imagine that reducing much conceptually, so we're left
>with optimizations of that check.
>

I think it depends on what exactly the expensive part is. For example,
if it's the construction of IncrementalSort paths, then maybe we could
do a quick check whether the path can even be useful by estimating its
cost, and only then build the path.

That's what we do for join algorithms, for example - we first compute
initial_cost_nestloop and only when that seems cheap enough do we do
the more expensive stuff. (A rough sketch of that pattern is below.)

But I'm not sure the path construction is the expensive part, as it
should be disabled by enable_incrementalsort=off. Yet the regression
does not seem to disappear, at least not entirely.


>> One possibility is that it's just one of those regressions due to
>> change in binary layout, but I'm not sure how to verify that.
>
>If we are testing with a case that can't actually add more paths (due
>to it checking the GUC before building them), doesn't that
>effectively leave one of these two options:
>
>1. Binary layout/cache/other untraceable change, or
>2. Changes due to refactored function calls.
>

Hmm, but in case of (1) the overhead should be there even with tests
that don't really have any additional paths to consider, right? I've
tried with such a test (single table with no indexes) and I don't
quite see any regression (maybe ~1%).

(2) might have an impact, but I don't see any immediate suspects. Did
you have some functions in mind?

BTW I see the patch adds pathkeys_common but it's not called from
anywhere. It's probably a leftover from an earlier patch version.


>There's not anything obvious in point (2) that would be a big cost,
>but there are definitely changes there. I was surprised that just
>eliminating the loop through the pathkeys on the query and the index
>was enough to save us ~4%.
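
FWIW if we did want to try that shortcut, I'd imagine something like
the following rough sketch. To be clear, this is not code from the
patch - try_incremental_sort_path and initial_cost_incremental_sort
are made-up names; it assumes the patch's create_incremental_sort_path
constructor, and the overall shape just mimics try_nestloop_path with
its initial_cost_nestloop + add_path_precheck dance:

#include "postgres.h"

#include "optimizer/cost.h"
#include "optimizer/pathnode.h"

/*
 * Sketch only: gate the expensive path construction behind a cheap
 * cost estimate, the way try_nestloop_path() does for nested loops.
 * initial_cost_incremental_sort() is a made-up helper that would
 * compute an optimistic lower bound on the sort cost.
 */
static void
try_incremental_sort_path(PlannerInfo *root, RelOptInfo *rel,
                          Path *subpath, List *pathkeys,
                          int presorted_keys, double limit_tuples)
{
    Cost        startup_cost;
    Cost        total_cost;

    /* phase 1: estimate only - no path node allocated yet */
    initial_cost_incremental_sort(root, &startup_cost, &total_cost,
                                  subpath, presorted_keys,
                                  limit_tuples);

    /* bail out early if this can't beat the paths rel already has */
    if (!add_path_precheck(rel, startup_cost, total_cost,
                           pathkeys, NULL))
        return;

    /* phase 2: build the real path and let add_path() decide */
    add_path(rel, (Path *)
             create_incremental_sort_path(root, rel, subpath,
                                          pathkeys, presorted_keys,
                                          limit_tuples));
}
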
>
>Tomas: Earlier you'd wondered whether we should try to shortcut the
>changes in costing... I was skeptical of that originally, but maybe
>it's worth looking into? I'm going to try backing that out and see
>what the numbers look like.
>

I've described the idea of something like initial_cost_nestloop above.
But I'm a bit skeptical about it, considering that the GUC only has
limited effect.

A somewhat related note: the number of indexes has a pretty
significant impact on planning time, even on master. This is the
timing of the same explain script (similar to the one shown before)
with different numbers of indexes, on master:

    0 indexes    7 indexes    49 indexes
   -------------------------------------
       10.85        12.56         27.83

The way I look at incremental sort is that it allows using indexes for
queries that couldn't use them before and would otherwise possibly
require a separate index. So incremental sort might easily reduce the
number of indexes needed, compensating for the overhead we're
discussing here. Of course, that may or may not be true.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services