Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20200401003809.pup7i7yr6pwramk3@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
>> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
>> ><tomas.vondra@2ndquadrant.com> wrote:
>> >>
>> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>> >> >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> >> >> In general, I think it'd be naive that we can make planner smarter with
>> >> >> no extra overhead spent on planning, and we can never accept patches
>> >> >> adding even tiny overhead. With that approach we'd probably end up with
>> >> >> a trivial planner that generates just a single query plan, because
>> >> >> that's going to be the fastest planner. A realistic approach needs to
>> >> >> consider both the planning and execution phase, and benefits of this
>> >> >> patch seem to be clear - if you have queries that do benefit from it.
>> >> >
>> >> >I think that's kind of attacking a straw man, though.  The thing that
>> >> >people push back on, or should push back on IMO, is when a proposed
>> >> >patch adds significant slowdown to queries that it has no or very little
>> >> >hope of improving.  The trick is to do expensive stuff only when
>> >> >there's a good chance of getting a better plan out of it.
>> >> >
>> >>
>> >> Yeah, I agree with that. I think the main issue is that we don't really
>> >> know what the "expensive stuff" is in this case, so it's not really
>> >> clear how to be smarter :-(
>> >
>> >To add to this: I agree that ideally you'd check cheaply to know
>> >you're in a situation that might help, and then do more work. But here
>> >the question is always going to be simply "would we benefit from an
>> >ordering, and, if so, do we have it already partially sorted". It's
>> >hard to imagine that reducing much conceptually, so we're left with
>> >optimizations of that check.
>> >
>>
>> I think it depends on what exactly is the expensive part. For example if
>> it's the construction of IncrementalSort paths, then maybe we could try
>> do a quick/check check if the path can even be useful by estimating the
>> cost and only then building the path.
>>
>> That's what we do for join algorithms, for example - we first compute
>> initial_cost_nestloop and only when that seems cheap enough we do the
>> more expensive stuff.
>>
>> But I'm not sure the path construction is the expensive part, as it
>> should be disabled by enable_incrementalsort=off. But the regression
>> does not seem to disappear, at least not entirely.
>>
>>
>> >> One possibility is that it's just one of those regressions due to change
>> >> in binary layout, but I'm not sure know how to verify that.
>> >
>> >If we are testing with a case that can't actually add more paths (due
>> >to it checking the guc before building them), doesn't that effectively
>> >leave one of these two options:
>> >
>> >1. Binary layout/cache/other untraceable change, or
>> >2. Changes due to refactored function calls.
>> >
>>
>> Hmm, but in case of (1) the overhead should be there even with tests
>> that don't really have any additional paths to consider, right? I've
>> tried with such test (single table with no indexes) and I don't quite
>> see any regression (maybe ~1%).
>
>Not necessarily, if the cost is in sort costing or useful pathkeys
>checking, right? We have run that code even without incremental sort,
>but it's changed from master.
>

Ah, I should have mentioned I've done most of the tests on just the
basic incremental sort patch (0001+0002), without the additional useful
paths. I initially tested the whole patch series, but after discovering
the regression I removed the last part (which I suspected might be the
root cause). But the regression is still there, so it's not that.

It might be in the reworked costing, yeah. But then I'd expect those
function to show in the perf profile.

>> (2) might have impact, but I don't see any immediate supects. Did you
>> have some functions in mind?
>
>I guess this is where the lines blur: I didn't see anything obvious
>either, but the changes to sort costing...should probably not have
>real impact...but...
>

:-(

>> BTW I see the patch adds pathkeys_common but it's not called from
>> anywhere. It's probably leftover from an earlier patch version.
>>
>> >There's not anything obvious in point (2) that would be a big cost,
>> >but there are definitely changes there. I was surprised that just
>> >eliminating the loop through the pathkeys on the query and the index
>> >was enough to save us ~4%.
>> >
>> >Tomas: Earlier you'd wondered about if we should try to shortcut the
>> >changes in costing...I was skeptical of that originally, but maybe
>> >it's worth looking into? I'm going to try backing that out and see
>> >what the numbers look like.
>> >
>
>BTW, I did this test, and it looks like we can get back something
>close to 1% by reverting that initial fix on partial path costing. But
>we can't get rid of it all the time, at the very least. *Maybe* we
>could condition it on incremental sort, but I'm not convinced that's
>the only place it's needed as a fix.
>

Sounds interesting. I actually tried how much the add_partial_path
change accounts for, and you're right it was quite a bit. But I forgot
about that when investigating the rest.

I wonder how large would the regression be without add_partial_path and
with the fix in pathkeys_common_contained_in.

I'm not sure how much we want to make add_partial_path() dependent on
particular GUCs, but I guess if it gets rid of the regression, allows us
to commit incremental sort and we can reasonably justify that only
incremental sort needs those paths, it might be acceptable.

>> I've described the idea about something like initial_cost_nestloop and
>> so on. But I'm a bit skeptical about it, considering that the GUC only
>> has limited effect.
>>
>>
>> A somewhat note is that the number of indexes has pretty significant
>> impact on planning time, even on master. This is timing of the same
>> explain script (similar to the one shown before) with different number
>> of indexes on master:
>>
>>      0 indexes        7 indexes       49 indexes
>>      -------------------------------------------
>>          10.85            12.56            27.83
>>
>> The way I look at incremental sort is that it allows using indexes for
>> queries that couldn't use it before, possibly requiring a separate
>> index. So incremental sort might easily reduce the number of indexes
>> needed, compensating for the overhead we're discussing here. Of course,
>> that may or may not be true.
>
>One small idea, but I'm not yet sure it helps us a whole lot: if the
>query pathkeys is only length 1, then we could skip the additional
>path creation.
>

I don't follow. Why would we create incremental sort in this case at
all? With single-element query_pathkeys the path is either unsorted or
fully sorted - there's no room for incremental sort. No?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Next
From: James Coleman
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)