Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20200331224537.gpmgeivxw6gh35hb@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
On Tue, Mar 31, 2020 at 06:00:00PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 5:53 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote:
>> >On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra
>> ><tomas.vondra@2ndquadrant.com> wrote:
>> >>
>> >> The main thing I've been working on today is benchmarking how this
>> >> affects planning. And I'm seeing a regression that worries me a bit,
>> >> unfortunately.
>> >>
>> >> The test I'm doing is pretty simple - build a small table with a bunch
>> >> of columns:
>> >>
>> >>    create table t (a int, b int, c int, d int, e int, f int, g int);
>> >>
>> >>    insert into t select 100*random(), 100*random(), 100*random(),
>> >>      100*random(), 100*random(), 100*random(), 100*random()
>> >>    from generate_series(1,100000) s(i);
>> >>
>> >> and then a number of indexes on subsets of up to 3 columns, as generated
>> >> using the attached build-indexes.py script. And then run a bunch of
>> >> explains (so no actual execution) sorting the data by at least 4 columns
>> >> (to trigger incremental sort paths), measuring timing of the script.
>> >>
>> >> I did a bunch of runs on current master and v46 with incremental sort
>> >> disabled and enabled, and the results look like this:
>> >>
>> >>      master       off        on
>> >>      --------------------------
>> >>      34.609    37.463    37.729
>> >>
>> >> which means about 8-9% regression with incremental sort. Of course, this
>> >> is only for planning time, for execution the impact is going to be much
>> >> smaller. But it's still a bit annoying.
>> >>
>> >> I've suspected this might be either due to the add_partial_path changes
>> >> or the patch adding incremental sort to additional places, so I tested
>> >> those parts individually and the answer is no - add_partial_path changes
>> >> have very small impact (~1%, which might be noise). The regression comes
>> >> mostly from the 0002 part that adds incremental sort. At least in this
>> >> particular test - different tests might behave differently, of course.
>> >>
>> >> The annoying bit is that the overhead does not disappear after disabling
>> >> incremental sort. That suggests this is not merely due to considering
>> >> and creating higher number of paths, but due to something that happens
>> >> before we even look at the GUC ...
>> >>
>> >> I think I've found one such place - if you look at compare_pathkeys, it
>> >> has this check right before the forboth() loop:
>> >>
>> >>      if (keys1 == keys2)
>> >>          return PATHKEYS_EQUAL;
>> >>
>> >> But with incremental sort we don't call pathkeys_contained_in, we call
>> >> pathkeys_common_contained_in instead. And that does not call
>> >> compare_pathkeys and does not have the simple equality check. Adding
>> >> the following check seems to cut the overhead in half, which is nice:
>> >>
>> >>      if (keys1 == keys2)
>> >>      {
>> >>          *n_common = list_length(keys1);
>> >>         return true;
>> >>      }
>> >>
>> >> Not sure where the rest of the regression comes from yet.
>> >
>> >I noticed in the other function we also optimize by checking if either
>> >keys list is NIL, so I tried adding that, and it might have made a
>> >minor difference, but it's hard to tell as it was under 1%.
>> >
>>
>> Which other function? I don't see this optimization in compare_pathkeys,
>> that only checks key1/key2 after the forboth loop, but that's something
>> different.
>
>pathkeys_useful_for_ordering checks both inputs.
>
>> I may be wrong, but my guess would be that this won't save much, because
>> the loop should terminate immediately. The whole point is not to loop
>> over possibly many pathkeys when we know it's exactly the same pathkeys
>> list (same pointer). When one of the lists is NIL the loop should
>> terminate right away ...
>>
>> >I also ran perf with a slightly modified version of your test that
>> >uses psql, and after the above changes was seeing something like a
>> >3.5% delta between master and this patch series. Nothing obvious in
>> >the perf report though.
>> >
>>
>> Yeah, I don't think there's anything obviously more expensive.
>>
>> >This test is intended to be somewhat worst case, no? At what point do
>> >we consider the trade-off worth it (given that it's not plausible to
>> >have zero impact)?
>> >
>>
>> Yes, more or less. It was definitely designed to do that - it merely
>> plans the query (no execution), with many applicable indexes etc. It's
>> definitely true that once the exection starts to take more time, the
>> overhead will get negligible. Same for reducing number of indexes.
>>
>> And of course, for queries that can benefit from incremental sort, the
>> speedup is likely way larger than this.
>>
>> In general, I think it'd be naive that we can make planner smarter with
>> no extra overhead spent on planning, and we can never accept patches
>> adding even tiny overhead. With that approach we'd probably end up with
>> a trivial planner that generates just a single query plan, because
>> that's going to be the fastest planner. A realistic approach needs to
>> consider both the planning and execution phase, and benefits of this
>> patch seem to be clear - if you have queries that do benefit from it.
>>
>> Let's try to minimize the overhead a bit more, and then we'll see.
>
>Any thoughts you have already on an approach for this?
>

That very much depends on what may be causing the problem. I have two
hypotheses, at the moment.

Based on the profiles I've seen so far, there does not seem to be any
function that suddenly got slower. That probably implies we're simply
generating more paths than before, which means more allocation, mode
add_path calls etc. It's not clear to me why would this happen even with
enable_incrementalsort=off, though.

Or maybe some of the structs got larger and need more cachelines? That
would affect performance even with the GUC set to off. But the perf stat
data also don't show anything particularly revealing. I'm using this:

   perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores,\
                dTLB-loads,dTLB-load-misses,dTLB-prefetch-misses,\
           LLC-loads,LLC-load-misses,LLC-stores,LLC-prefetches,\
           cycles,instructions,cache-references,cache-misses,'
           bus-cycles,raw_syscalls:sys_enter -p $PID

An example output (for master and patched branch) is attached, but I
don't see anything obviously worse (there is some variance, of course).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Next
From: Andres Freund
Date:
Subject: Re: backup manifests