Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From James Coleman
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id CAAaqYe953TA0Pj+08jEBgK_=5zMCJO-vgYnh4n_Me71ZdzDJLg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Tue, Mar 31, 2020 at 11:07 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Mar 31, 2020 at 10:44 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Tue, Mar 31, 2020 at 10:12:29PM -0400, James Coleman wrote:
> > >On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra
> > ><tomas.vondra@2ndquadrant.com> wrote:
> > >>
> > >> On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:
> > >> >On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
> > >> ><tomas.vondra@2ndquadrant.com> wrote:
> > >> >>
> > >> >> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
> > >> >> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> > >> >> ><tomas.vondra@2ndquadrant.com> wrote:
...
> > >> >> >One small idea, but I'm not yet sure it helps us a whole lot: if the
> > >> >> >query pathkeys is only length 1, then we could skip the additional
> > >> >> >path creation.
> > >> >> >
> > >> >>
> > >> >> I don't follow. Why would we create incremental sort in this case at
> > >> >> all? With single-element query_pathkeys the path is either unsorted or
> > >> >> fully sorted - there's no room for incremental sort. No?
> > >> >
> > >> >Well, we shouldn't, that's what I'm getting. But I didn't see anything
> > >> >in the code now that explicitly excludes that case when decided
> > >> >whether or not to create an incremental sort path, unless I'm missing
> > >> >something obvious.
> > >>
> > >> Well, my point is that create_ordered_paths() looks like this:
> > >>
> > >>      is_sorted = pathkeys_common_contained_in(root->sort_patkeys, ...);
> > >>
> > >>      if (is_sorted)
> > >>      {
> > >>          ... old code
> > >>      }
> > >>      else
> > >>      {
> > >>          if (input_path == cheapest_input_path)
> > >>          {
> > >>              ... old code
> > >>          }
> > >>
> > >>          /* With incremental sort disabled, don't build those paths. */
> > >>          if (!enable_incrementalsort)
> > >>              continue;
> > >>
> > >>          /* Likewise, if the path can't be used for incremental sort. */
> > >>          if (!presorted_keys)
> > >>              continue;
> > >>
> > >>          ... incremental sort path
> > >>      }
> > >>
> > >> Now, with single-item sort_pathkeys, the input path can't be partially
> > >> sorted. It's either fully sorted - in which case it's handled by the
> > >> first branch. Or it's not sorted at all, so presorted_keys==0 and we
> > >> never get to the incremental path.
> > >>
> > >> Or did you mean to use the optimization somewhere else?
> > >
> > >Hmm, yes, I didn't think through that properly. I'll have to look at
> > >the other cases to confirm the same logic applies there.

I looked through this more carefully, and I did end up finding a few
places where we can skip iterating through a list of paths entirely
with this check, so I added it there. I also cleaned up some comments,
added comments and asserts to the other places where
list_length(pathkeys) should be guaranteed to be > 1, removed a few
asserts I found unnecessary, and merged duplicative
pathkeys_[count_]_contained_in calls.

> > >One other thing:in the code above we create the regular sort path
> > >inside of `if (input_path == cheapest_input_path)`, but incremental
> > >sort is outside of that condition. I'm not sure I'm remembering why
> > >that was, and it's not obvious to me reading it right now (though it's
> > >getting late here, so maybe I'm just not thinking clearly). Do you
> > >happen to remember why that is?
> > >
> >
> > It's because for the regular sort, the path is either already sorted or
> > it requires a full sort. But full sort only makes sense on the cheapest
> > path, because we assume the additional sort cost is independent of the
> > input cost, essentially
> >
> >     cost(path + Sort) = cost(path) + cost(Sort)
> >
> > and it's always
> >
> >      cost(path) + cost(Sort) >= cost(cheapest path) + cost(Sort)
> >
> > and by checking for cheapest path we simply skip building all the paths
> > that we'd end up discarding anyway.
> >
> > With incremental sort we can't do this, the cost of the incremental sort
> > depends on how well presorted is the input path.

Thanks for the explanation. I've added a comment to that effect.

James

Attachment

pgsql-hackers by date:

Previous
From: Prabhat Sahu
Date:
Subject: Re: [Proposal] Global temporary tables
Next
From: Justin Pryzby
Date:
Subject: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace onthe fly