Thread: Use incremental sort paths for window functions

Use incremental sort paths for window functions

From
David Rowley
Date:
Over on [1] someone was asking about chained window paths making use
of already partially sorted input.  (The thread is on -general, so I
guessed they're not using PG13.)

However, On checking PG13 to see if incremental sort would help their
case, I saw it didn't. Looking at the code I saw that
create_window_paths() and create_one_window_path() don't make any use
of incremental sort paths.

I quickly put together the attached. It's only about 15 mins of work,
but it seems worth looking at a bit more for some future commitfest.
Yeah, I'll need to add some tests as I see nothing failed by changing
this.

I'll just park this here until then so I don't forget.

David

Attachment

Re: Use incremental sort paths for window functions

From
Daniel Gustafsson
Date:
> On 8 Jul 2020, at 06:57, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Over on [1] someone was asking about chained window paths making use
> of already partially sorted input.  (The thread is on -general, so I
> guessed they're not using PG13.)

The [1] reference wasn't qualified, do you remember which thread it was?

> However, On checking PG13 to see if incremental sort would help their
> case, I saw it didn't. Looking at the code I saw that
> create_window_paths() and create_one_window_path() don't make any use
> of incremental sort paths.

Commit 728202b63cdcd7f counteracts this optimization in part since it orders
the windows such that the longest common prefix is executed first to allow
subsequent windows to skip sorting entirely.

That being said, it's only in part and when the stars don't align with sub-
sequently shorter common prefixes then incremental sort can help.  A synthetic
unscientific test with three windows over 10M rows, where no common prefix
exists, shows consistent speedups (for worst cases) well past what can be
attributed to background noise.

> I quickly put together the attached. It's only about 15 mins of work,
> but it seems worth looking at a bit more for some future commitfest.
> Yeah, I'll need to add some tests as I see nothing failed by changing
> this.

A few comments on the patch: there is no check for enable_incremental_sort, and
it lacks tests (as already mentioned) for the resulting plan.

cheers ./daniel


Re: Use incremental sort paths for window functions

From
Tomas Vondra
Date:
On Wed, Jul 08, 2020 at 04:57:21PM +1200, David Rowley wrote:
>Over on [1] someone was asking about chained window paths making use
>of already partially sorted input.  (The thread is on -general, so I
>guessed they're not using PG13.)
>
>However, On checking PG13 to see if incremental sort would help their
>case, I saw it didn't. Looking at the code I saw that
>create_window_paths() and create_one_window_path() don't make any use
>of incremental sort paths.
>
>I quickly put together the attached. It's only about 15 mins of work,
>but it seems worth looking at a bit more for some future commitfest.
>Yeah, I'll need to add some tests as I see nothing failed by changing
>this.
>

Yeah, I'm sure there are a couple other places that might benefit from
incremental sort but were not included in the PG13 commit. The patch
seems correct - did it help in the reported thread? How much?

I suppose this might benefit from an optimization similar to the GROUP
BY reordering discussed in [1]. For example, with

    max(a) over (partition by b,c)

I think we could use index on (c) and consider incremental sort by c,b,
i.e. with the inverted pathkeys. But that's a completely independent
topic, I believe.

[1] https://www.postgresql.org/message-id/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru

>I'll just park this here until then so I don't forget.
>

OK, thanks for looking into this!

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Use incremental sort paths for window functions

From
David Rowley
Date:
On Tue, 15 Sep 2020 at 00:02, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 8 Jul 2020, at 06:57, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > Over on [1] someone was asking about chained window paths making use
> > of already partially sorted input.  (The thread is on -general, so I
> > guessed they're not using PG13.)
>
> The [1] reference wasn't qualified, do you remember which thread it was?

That was sloppy of me.  It's
https://www.postgresql.org/message-id/CADd42iFZWwYNsXjEM_3HWK3QnfiCrMNmpOkZqyBQCabnVxOPtw%40mail.gmail.com

> > However, On checking PG13 to see if incremental sort would help their
> > case, I saw it didn't. Looking at the code I saw that
> > create_window_paths() and create_one_window_path() don't make any use
> > of incremental sort paths.
>
> Commit 728202b63cdcd7f counteracts this optimization in part since it orders
> the windows such that the longest common prefix is executed first to allow
> subsequent windows to skip sorting entirely.

This would have been clearer if I'd remembered to include the link to
the thread.  The thread talks about sorting requirements like c1, c3
then c1, c4. So it can make use of the common prefix and do
incremental sorts.

It sounds like you're talking about cases like: wfunc() over (order by
a), wfunc2() over (order by a,b). Where we can just sort on a,b and
have that order work for the first wfunc(). That's a good optimisation
but does not work for the above case.

> That being said, it's only in part and when the stars don't align with sub-
> sequently shorter common prefixes then incremental sort can help.  A synthetic
> unscientific test with three windows over 10M rows, where no common prefix
> exists, shows consistent speedups (for worst cases) well past what can be
> attributed to background noise.
>
> > I quickly put together the attached. It's only about 15 mins of work,
> > but it seems worth looking at a bit more for some future commitfest.
> > Yeah, I'll need to add some tests as I see nothing failed by changing
> > this.
>
> A few comments on the patch: there is no check for enable_incremental_sort, and
> it lacks tests (as already mentioned) for the resulting plan.

Yeah, it should be making sure enable_incremental_sort is on for sure.
I've attached another version with a few tests added too.

David

Attachment

Re: Use incremental sort paths for window functions

From
Daniel Gustafsson
Date:
> On 15 Sep 2020, at 01:17, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 15 Sep 2020 at 00:02, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 8 Jul 2020, at 06:57, David Rowley <dgrowleyml@gmail.com> wrote:
>>>
>>> Over on [1] someone was asking about chained window paths making use
>>> of already partially sorted input.  (The thread is on -general, so I
>>> guessed they're not using PG13.)
>>
>> The [1] reference wasn't qualified, do you remember which thread it was?
>
> That was sloppy of me.  It's
> https://www.postgresql.org/message-id/CADd42iFZWwYNsXjEM_3HWK3QnfiCrMNmpOkZqyBQCabnVxOPtw%40mail.gmail.com

Thanks!

>>> However, On checking PG13 to see if incremental sort would help their
>>> case, I saw it didn't. Looking at the code I saw that
>>> create_window_paths() and create_one_window_path() don't make any use
>>> of incremental sort paths.
>>
>> Commit 728202b63cdcd7f counteracts this optimization in part since it orders
>> the windows such that the longest common prefix is executed first to allow
>> subsequent windows to skip sorting entirely.
>
> This would have been clearer if I'd remembered to include the link to
> the thread.  The thread talks about sorting requirements like c1, c3
> then c1, c4. So it can make use of the common prefix and do
> incremental sorts.
>
> It sounds like you're talking about cases like: wfunc() over (order by
> a), wfunc2() over (order by a,b). Where we can just sort on a,b and
> have that order work for the first wfunc(). That's a good optimisation
> but does not work for the above case.

Right, the combination of these two optimizations will however work well
together for quite a few cases.

On that note, assume we have the below scenario:

    wfunc .. (order by a), .. (order by a,b), .. (order by a,b,c)

Currently the windows will be ordered such that a,b,c is sorted first, with a,b
and a not having to sort.  I wonder if there is a good heuristic to find cases
where sorting a, then a,b incrementally and finally a,b,c incrementally is
cheaper than a big sort of a,b,c?  If a,b,c would spill but subsequent
incremental sorts won't then perhaps that could be a case?  Not sure if it's
worth the planner time, just thinking out loud.

>> A few comments on the patch: there is no check for enable_incremental_sort, and
>> it lacks tests (as already mentioned) for the resulting plan.
>
> Yeah, it should be making sure enable_incremental_sort is on for sure.
> I've attached another version with a few tests added too.

No comments on this version, LGTM.

cheers ./daniel


Re: Use incremental sort paths for window functions

From
David Rowley
Date:
On Tue, 15 Sep 2020 at 05:19, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Wed, Jul 08, 2020 at 04:57:21PM +1200, David Rowley wrote:
> >Over on [1] someone was asking about chained window paths making use
> >of already partially sorted input.  (The thread is on -general, so I
> >guessed they're not using PG13.)
> >
> >However, On checking PG13 to see if incremental sort would help their
> >case, I saw it didn't. Looking at the code I saw that
> >create_window_paths() and create_one_window_path() don't make any use
> >of incremental sort paths.
> >
> >I quickly put together the attached. It's only about 15 mins of work,
> >but it seems worth looking at a bit more for some future commitfest.
> >Yeah, I'll need to add some tests as I see nothing failed by changing
> >this.
> >
>
> Yeah, I'm sure there are a couple other places that might benefit from
> incremental sort but were not included in the PG13 commit. The patch
> seems correct - did it help in the reported thread? How much?

Looks like I didn't mention the idea on the thread.  I must have felt
it was just too many steps away from being very useful to mention it
in the -general thread.

I suppose it'll help similar to any use case for incremental sort;
lots in some and less so in others. It'll mostly depend on how big
each incremental sort is.  e.g order by a,b when there's only an index
on (a) will be pretty good if a is unique.  Each sort will be over
quite fast. If there are a million rows for each value of a then
incremental sort would be less favourable

> I suppose this might benefit from an optimization similar to the GROUP
> BY reordering discussed in [1]. For example, with
>
>     max(a) over (partition by b,c)
>
> I think we could use index on (c) and consider incremental sort by c,b,
> i.e. with the inverted pathkeys. But that's a completely independent
> topic, I believe.

I've only vaguely followed that. Sounds like interesting work, but I
agree that it's not related to this.

Thanks for having a look at this.

David



Re: Use incremental sort paths for window functions

From
David Rowley
Date:
On Tue, 15 Sep 2020 at 20:12, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> On that note, assume we have the below scenario:
>
>     wfunc .. (order by a), .. (order by a,b), .. (order by a,b,c)
>
> Currently the windows will be ordered such that a,b,c is sorted first, with a,b
> and a not having to sort.  I wonder if there is a good heuristic to find cases
> where sorting a, then a,b incrementally and finally a,b,c incrementally is
> cheaper than a big sort of a,b,c?  If a,b,c would spill but subsequent
> incremental sorts won't then perhaps that could be a case?  Not sure if it's
> worth the planner time, just thinking out loud.

It's a worthy cause, but unfortunately, I don't think there's any very
realistic thing that can be done about that.  The problem is that
you're deciding the "most sorted" window clause and putting that first
in the parameters to the query_planner()'s callback function.  If you
wanted to try some alternative orders then it means calling
query_planner() again with some other order for
qp_extra.activeWindows.

Perhaps there's some other way of doing it so that the planner does
some sort of preliminary investigation about the best order to
evaluate the windows in.  Currently, standard_qp_callback just takes
the first window and has the planner perform the join order search
based on that.  Performing the join order search multiple times is
just not realistic, so it could only be done by some sort of
pre-checks.  e.g, is there an index that's likely to help me obtain
this specific order.  Then we'd just have to hope that through the
join search that the planner actually managed to produce a more
optimal plan than it would have if we'd left the window evaluation
order alone. It sounds pretty tricky to make cheap and good enough at
the same time.

> No comments on this version, LGTM.

Cool. Many thanks for having a look.

David



Re: Use incremental sort paths for window functions

From
David Rowley
Date:
On Tue, 15 Sep 2020 at 23:21, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 15 Sep 2020 at 20:12, Daniel Gustafsson <daniel@yesql.se> wrote:
> >
> > No comments on this version, LGTM.
>
> Cool. Many thanks for having a look.

Pushed. 62e221e1c

David