Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From James Coleman
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id CAAaqYe-GGB1=xSTBgk-OniOW05+SX4w_KYn+gj8+pxXQJCG_PA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Mon, Jul 8, 2019 at 9:37 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Mon, Jul 08, 2019 at 12:07:06PM -0400, James Coleman wrote:
> >On Mon, Jul 8, 2019 at 10:58 AM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >>
> >> On Mon, Jul 08, 2019 at 10:32:18AM -0400, James Coleman wrote:
> >> >On Mon, Jul 8, 2019 at 9:59 AM Tomas Vondra
> >> ><tomas.vondra@2ndquadrant.com> wrote:
> >> >>
> >> >> On Mon, Jul 08, 2019 at 09:22:39AM -0400, James Coleman wrote:
> >> >> >On Sun, Jul 7, 2019 at 5:02 PM Tomas Vondra
> >> >> ><tomas.vondra@2ndquadrant.com> wrote:
> >> >> >> We're running query like this:
> >> >> >>
> >> >> >>   SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3
> >> >> >>
> >> >> >> but we're trying to add the incremental sort *before* the aggregation,
> >> >> >> because the optimizer also considers group aggregate with a sorted
> >> >> >> input. And (a) is a prefix of (a,sum(b),count(b)) so we think we
> >> >> >> actually can do this, but clearly that's nonsense, because we can't
> >> >> >> possibly know the aggregates yet. Hence the error.
> >> >> >>
> >> >> >> If this is the actual issue, we need to ensure we actually can evaluate
> >> >> >> all the pathkeys. I don't know how to do that yet. I thought that maybe
> >> >> >> we should modify pathkeys_common_contained_in() to set presorted_keys to
> >> >> >> 0 in this case.
> >> >> >>
> >> >> >> But then I started wondering why we don't see this issue even for
> >> >> >> regular (non-incremental-sort) paths built in create_ordered_paths().
> >> >> >> How come we don't see these failures there? I've modified costing to
> >> >> >> make all incremental sort paths very cheap, and still nothing.
> >> >> >
> >> >> >I assume you mean you modified costing to make regular sort paths very cheap?
> >> >> >
> >> >>
> >> >> No, I mean costing of incremental sort paths, so that they end up being
> >> >> the cheapest ones. If some other path is cheaper, we won't see the error
> >> >> because it only happens when building plan from the cheapest path.
> >> >
> >> >Ah, I misunderstood as you trying to figure out a way to try to cause
> >> >the same problem with a regular sort.
> >> >
> >> >> >> So presumably there's a check elsewhere (either implicit or explicit),
> >> >> >> because create_ordered_paths() uses pathkeys_common_contained_in() and
> >> >> >> does not have the same issue.
> >> >> >
> >> >> >Given this comment in create_ordered_paths():
> >> >> >
> >> >> >  generate_gather_paths() will have already generated a simple Gather
> >> >> >  path for the best parallel path, if any, and the loop above will have
> >> >> >  considered sorting it.  Similarly, generate_gather_paths() will also
> >> >> >  have generated order-preserving Gather Merge plans which can be used
> >> >> >  without sorting if they happen to match the sort_pathkeys, and the loop
> >> >> >  above will have handled those as well.  However, there's one more
> >> >> >  possibility: it may make sense to sort the cheapest partial path
> >> >> >  according to the required output order and then use Gather Merge.
> >> >> >
> >> >> >my understanding is that generate_gather_paths() only considers paths
> >> >> >that already happen to be sorted (not explicit sorts), so I'm
> >> >> >wondering if it would make more sense for the incremental sort path
> >> >> >creation for this case to live alongside the explicit ordered path
> >> >> >creation in create_ordered_paths() rather than in
> >> >> >generate_gather_paths().
> >> >> >
> >> >>
> >> >> How would that solve the issue? Also, we're building a gather path, so
> >> >> I think generate_gather_paths() is the right place where to do it. And
> >> >> we're not changing the semantics of generate_gather_paths() - the result
> >> >> path should be sorted "correctly" with respect to sort_pathkeys.
> >> >
> >> >Does that imply what the explicit sort in create_ordered_paths() is in
> >> >the wrong spot?
> >> >
> >>
> >> I think those are essentially the right places where to do this sort of
> >> stuff. Maybe there's a better place, but I don't think those places are
> >> somehow wrong.
> >>
> >> >Or, to put it another way, do you think that both kinds of sorts
> >> >should be added in the same place? It seems confusing to me that
> >> >they'd be split between the two methods (unless I'm completely
> >> >misunderstanding how the two work).
> >> >
> >>
> >> The paths built in those two places were very different in one aspect:
> >>
> >> 1) generate_gather_paths only ever looked at pathkeys for the subpath, it
> >> never even looked at ordering expected by paths above it (or the query as
> >> a whole). Plain Gather ignores pathkeys entirely, Gather Merge only aims
> >> to maintain ordering of the different subpaths.
> >>
> >> 2) create_oredered_paths is meant to enforce ordering needed by upper
> >> parts of the plan - either by using a properly sorted path, or adding an
> >> explicit sort.
> >>
> >>
> >> We want to extend (1) to also look at ordering expected by the upper parts
> >> of the plan, and consider incremental sort if applicable. (2) already does
> >> that, and it already has the correct pathkeys to enforce.
> >
> >I guess I'm still not following. If (2) is responsible (currently) for
> >adding an explicit sort, why wouldn't adding an incremental sort be an
> >example of that responsibility? The subpath that either a Sort or
> >IncrementalSort is being added on top of (to then feed into the
> >GatherMerge) is the same in both cases right?
> >
> >Unless you're saying that the difference is that since we have a
> >partial ordering already for incremental sort then incremental sort
> >falls into the category of "maintaining" existing ordering of the
> >subpath?
> >
>
> Oh, I think I understand what you're saying. Essentially, we should not
> be making generate_gather_paths responsible for adding the incremental
> sort. Instead, we should be looking at places than are adding explicit
> sort (using create_sort_path) and also consider adding incremental sort.
>
> I definitely agree with the second half - we should look at all places
> that create explicit sorts and make them also consider incremental
> sorts. That makes sense.

Yep, exactly.

> But I'm not sure it'll address all cases - the problem is that those
> places add the explicit sort because they need sorted input. Gather
> Merge does not do that, it only preserves existing ordering of paths.
>
> So it's possible the path does not have an explicit sort on to, and
> gather merge will not know to add it. And once we have the gather merge
> in place, we can't push place "under" it.

That's the explanation I was missing; and it makes sense (to restate:
sometimes sorting is useful even when not required for correctness of
the user returned data).

> In fact, we already have code dealing with this "issue" for a special
> case - see gather_grouping_paths(). It generates plain gather merge
> paths, but then also considers building one with explicit sort. But it
> only does that for grouping paths (when it's clear we need to be looking
> at grouping_pathkeys), and there are generate_gather_paths() that don't
> have similar treatment.

I just find it humorous both of us were writing separate emails
mentioning that function at the same time.

> >> But looking at root->sort_pathkeys in (1) seems to be the wrong thing :-(
> >>
> >> The thing is, we don't have just sort_pathkeys, there's distinct_pathkeys
> >> and group_pathkeys too, so maybe we should be looking at those too?
> >
> >I don't know enough yet to answer, but I'd like to look at (in the
> >debugger) the subpaths considered in each function to try to get a
> >better understanding of why we don't try to explicitly sort the aggs
> >(which we know we can't sort yet) but do for incremental sort. I
> >assume that means a subpath has to be present in one but not the other
> >since they both use the same pathkey checking function.
> >
>
> I've been wondering if we have some other code that needs to consider
> interesting pathkeys "candidates" (instead of just getting the list
> interesting in that place). Because then we could look at that code and
> use it here ...
>
> And guess what - postgres_fdw needs to do pretty much exactly that, when
> building paths for remote relations. AFAIK we can't easily request all
> plans from the remote node and then look at their pathkeys (like we'd do
> with local node), so instead we deduce "interesting pathkeys" and then
> request best plans for those. And deducing "interesing" pathkeys is
> pretty much what get_useful_pathkeys_for_relation() is about.
>
> So I've copied this function (and two more, called from it), whacked it
> a bit until it removed (shakespeare-writing chimp comes to mind) and
> voila, it seems to be working. The errors you reported are gone, and the
> plans seems to be reasonable.
>
> Attached is a sequence of 4 patches:
>
>
> 0001-fix-pathkey-processing-in-generate_gather_paths.patch
> ----------------------------------------------------------
> This is the fixed version of my previous patch, with the stuff stolen
> from postgres_fdw.
>
>
> 0002-fix-costing-in-cost_incremental_sort.patch
> -----------------------------------------------
> This is the costing fix, I mentioned before.
>
>
> 0003-fix-explain-in-parallel-mode.patch
> ---------------------------------------
> Minor bug in explain, when incremental sort ends up being in the
> parallel part of the plan (missing newline on per-worker line)
>
>
> 0004-rework-where-incremental-sort-paths-are-created.patch
> ----------------------------------------------------------
> This undoes the generate_gather_paths() changes from 0001, and instead
> modifies a bunch of places that call create_sort_path() to also consider
> incremental sorts. There are a couple remaining, but those should not be
> relevant to the queries I've been looking at.
>
>
> Essentially, 0002 and 0003 are bugfixes. 0001 and 0004 are the two
> different aproaches to building incremental sort + gather merge.
>
> Now, consider this example:
>
>   create table t (a int, b int, c int);
>   insert into t select mod(i,100),mod(i,100),i from generate_series(1,10000000) s(i);
>   create index on t (a);
>   analyze t;
>   explain select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1;
>
> With 0001+0002+0003 pathes, I get a plan like this:
>
>                                                      QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=10375.39..10594.72 rows=1 width=16)
>    ->  Incremental Sort  (cost=10375.39..2203675.71 rows=10000 width=16)
>          Sort Key: a, b, (sum(c))
>          Presorted Key: a, b
>          ->  GroupAggregate  (cost=10156.07..2203225.71 rows=10000 width=16)
>                Group Key: a, b
>                ->  Gather Merge  (cost=10156.07..2128124.39 rows=10000175 width=12)
>                      Workers Planned: 2
>                      ->  Incremental Sort  (cost=9156.04..972856.05 rows=4166740 width=12)
>                            Sort Key: a, b
>                            Presorted Key: a
>                            ->  Parallel Index Scan using t_a_idx on t  (cost=0.43..417690.30 rows=4166740 width=12)
> (12 rows)
>
>
> and with 0004, I get this:
>
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Limit  (cost=20443.84..20665.32 rows=1 width=16)
>    ->  Incremental Sort  (cost=20443.84..2235250.05 rows=10000 width=16)
>          Sort Key: a, b, (sum(c))
>          Presorted Key: a, b
>          ->  GroupAggregate  (cost=20222.37..2234800.05 rows=10000 width=16)
>                Group Key: a, b
>                ->  Incremental Sort  (cost=20222.37..2159698.74 rows=10000175 width=12)
>                      Sort Key: a, b
>                      Presorted Key: a
>                      ->  Index Scan using t_a_idx on t  (cost=0.43..476024.65 rows=10000175 width=12)
> (10 rows)
>
> Notice that cost of the second plan is almost double the first one. That
> means 0004 does not even generate the first plan, i.e. there are cases
> where we don't try to add the explicit sort before passing the path to
> generate_gather_paths().
>
> And I think I know why is that - while gather_grouping_paths() tries to
> add explicit sort below the gather merge, there are other places that
> call generate_gather_paths() that don't do that. In this case it's
> probably apply_scanjoin_target_to_paths() which simply builds
>
>    parallel (seq|index) scan + gather merge
>
> and that's it. The problem is likely the same - the code does not know
> which pathkeys are "interesting" at that point. We probably need to
> teach planner to do this.

I had also noticed that that was an obvious place where
generate_gather_paths() was used but an explicit sort wasn't also
added separately, which makes me think the division of labor is
probably currently wrong regardless of the incremental sort patch.

Do you agree? Should we try to fix that (likely with your new
"interesting paths" version of generate_gather_paths()) first as a
prefix patch to adding incremental sort?

> FWIW tweaking all the create_sort_path() places to also consider adding
> incremental sort is a bit tedious and invasive, and it almost doubles
> the amount of repetitive code. It's OK for experiment like this, but we
> should try handling this in a nicer way (move to a separate function
> that does both, or something like that).

Agreed.

On Tue, Jul 9, 2019 at 8:11 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Tue, Jul 09, 2019 at 03:37:03AM +0200, Tomas Vondra wrote:
> >
> > ...
> >
> >Notice that cost of the second plan is almost double the first one. That
> >means 0004 does not even generate the first plan, i.e. there are cases
> >where we don't try to add the explicit sort before passing the path to
> >generate_gather_paths().
> >
> >And I think I know why is that - while gather_grouping_paths() tries to
> >add explicit sort below the gather merge, there are other places that
> >call generate_gather_paths() that don't do that. In this case it's
> >probably apply_scanjoin_target_to_paths() which simply builds
> >
> >  parallel (seq|index) scan + gather merge
> >
> >and that's it. The problem is likely the same - the code does not know
> >which pathkeys are "interesting" at that point. We probably need to
> >teach planner to do this.
> >
>
> I've looked into this again, and yes - that's the reason. I've added
> generate_useful_gather_paths() which is a wrapper on top of
> generate_gather_paths(). It essentially does what 0001 patch did directly
> in generate_gather_paths() so it's more like create_grouping_paths().
>
> And that does the trick - we now generate the cheaper paths, and I don't
> see any crashes in regression tests etc.
>
> I still suspect we already have code doing similar checks whether pathkeys
> might be useful somewhere. I've looked into pathkeys.c and elsewhere but
> no luck.
>
> Attached is a slightly modified patch series:
>
> 1) 0001 considers incremental sort paths in various places (adds the new
> generate_useful_gather_paths and modifies places calling create_sort_path)

I need to spend some decent time digesting this patch, but the concept
sounds very useful.

> 2) 0002 and 0003 are fixes I mentioned before
>
> 3) 0004 adds a new GUC force_incremental_sort that (when set to 'on')
> tries to nudge the optimizer into using incremental sort by essentially
> making it free (i.e. using startup/total costs of the subpath). I've found
> this useful when trying to force incremental sorts into plans where it may
> not be the best strategy.

That will be super helpful. I do wonder if we need to expose (in the
production patch) a GUC of some kind to adjust incremental sort
costing so that users can try to tweak when it is preferred over
regular sort.

> I won't have time to hack on this over the next ~2 weeks, but I'll try to
> respond to questions when possible.

Understood; thanks so much for your help on this.

> >
> >FWIW tweaking all the create_sort_path() places to also consider adding
> >incremental sort is a bit tedious and invasive, and it almost doubles
> >the amount of repetitive code. It's OK for experiment like this, but we
> >should try handling this in a nicer way (move to a separate function
> >that does both, or something like that).
> >
>
> This definitely needs more work. We need to refactor it in some way, e.g.
> have a function that would consider both explicit sort (on the cheapest
> path) and incremental sort (on all paths), and call it from all those
> places. Haven't tried it, though.
>
> There's also a couple more places where we do create_sort_path() and don't
> consider incremental sort yet - window functions, distinct etc.

Yep, and likely want regression tests for all of these cases also.

James Coleman



pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Next
From: James Coleman
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)