Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20200331235619.ezleuyqjjjy67lh7@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>> >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> >> In general, I think it'd be naive that we can make planner smarter with
>> >> no extra overhead spent on planning, and we can never accept patches
>> >> adding even tiny overhead. With that approach we'd probably end up with
>> >> a trivial planner that generates just a single query plan, because
>> >> that's going to be the fastest planner. A realistic approach needs to
>> >> consider both the planning and execution phase, and benefits of this
>> >> patch seem to be clear - if you have queries that do benefit from it.
>> >
>> >I think that's kind of attacking a straw man, though.  The thing that
>> >people push back on, or should push back on IMO, is when a proposed
>> >patch adds significant slowdown to queries that it has no or very little
>> >hope of improving.  The trick is to do expensive stuff only when
>> >there's a good chance of getting a better plan out of it.
>> >
>>
>> Yeah, I agree with that. I think the main issue is that we don't really
>> know what the "expensive stuff" is in this case, so it's not really
>> clear how to be smarter :-(
>
>To add to this: I agree that ideally you'd check cheaply to know
>you're in a situation that might help, and then do more work. But here
>the question is always going to be simply "would we benefit from an
>ordering, and, if so, do we have it already partially sorted". It's
>hard to imagine that reducing much conceptually, so we're left with
>optimizations of that check.
>

I think it depends on what exactly is the expensive part. For example if
it's the construction of IncrementalSort paths, then maybe we could try
do a quick/check check if the path can even be useful by estimating the
cost and only then building the path.

That's what we do for join algorithms, for example - we first compute
initial_cost_nestloop and only when that seems cheap enough we do the
more expensive stuff.

But I'm not sure the path construction is the expensive part, as it
should be disabled by enable_incrementalsort=off. But the regression
does not seem to disappear, at least not entirely.


>> One possibility is that it's just one of those regressions due to change
>> in binary layout, but I'm not sure know how to verify that.
>
>If we are testing with a case that can't actually add more paths (due
>to it checking the guc before building them), doesn't that effectively
>leave one of these two options:
>
>1. Binary layout/cache/other untraceable change, or
>2. Changes due to refactored function calls.
>

Hmm, but in case of (1) the overhead should be there even with tests
that don't really have any additional paths to consider, right? I've
tried with such test (single table with no indexes) and I don't quite
see any regression (maybe ~1%).

(2) might have impact, but I don't see any immediate supects. Did you
have some functions in mind?

BTW I see the patch adds pathkeys_common but it's not called from
anywhere. It's probably leftover from an earlier patch version.

>There's not anything obvious in point (2) that would be a big cost,
>but there are definitely changes there. I was surprised that just
>eliminating the loop through the pathkeys on the query and the index
>was enough to save us ~4%.
>
>Tomas: Earlier you'd wondered about if we should try to shortcut the
>changes in costing...I was skeptical of that originally, but maybe
>it's worth looking into? I'm going to try backing that out and see
>what the numbers look like.
>

I've described the idea about something like initial_cost_nestloop and
so on. But I'm a bit skeptical about it, considering that the GUC only
has limited effect.


A somewhat note is that the number of indexes has pretty significant
impact on planning time, even on master. This is timing of the same
explain script (similar to the one shown before) with different number
of indexes on master:

     0 indexes        7 indexes       49 indexes
     -------------------------------------------
         10.85            12.56            27.83

The way I look at incremental sort is that it allows using indexes for
queries that couldn't use it before, possibly requiring a separate
index. So incremental sort might easily reduce the number of indexes
needed, compensating for the overhead we're discussing here. Of course,
that may or may not be true.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: proposal \gcsv
Next
From: James Coleman
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)