Thread: Use incremental sort paths for window functions
Over on [1] someone was asking about chained window paths making use of already partially sorted input. (The thread is on -general, so I guessed they're not using PG13.) However, On checking PG13 to see if incremental sort would help their case, I saw it didn't. Looking at the code I saw that create_window_paths() and create_one_window_path() don't make any use of incremental sort paths. I quickly put together the attached. It's only about 15 mins of work, but it seems worth looking at a bit more for some future commitfest. Yeah, I'll need to add some tests as I see nothing failed by changing this. I'll just park this here until then so I don't forget. David
Attachment
> On 8 Jul 2020, at 06:57, David Rowley <dgrowleyml@gmail.com> wrote: > > Over on [1] someone was asking about chained window paths making use > of already partially sorted input. (The thread is on -general, so I > guessed they're not using PG13.) The [1] reference wasn't qualified, do you remember which thread it was? > However, On checking PG13 to see if incremental sort would help their > case, I saw it didn't. Looking at the code I saw that > create_window_paths() and create_one_window_path() don't make any use > of incremental sort paths. Commit 728202b63cdcd7f counteracts this optimization in part since it orders the windows such that the longest common prefix is executed first to allow subsequent windows to skip sorting entirely. That being said, it's only in part and when the stars don't align with sub- sequently shorter common prefixes then incremental sort can help. A synthetic unscientific test with three windows over 10M rows, where no common prefix exists, shows consistent speedups (for worst cases) well past what can be attributed to background noise. > I quickly put together the attached. It's only about 15 mins of work, > but it seems worth looking at a bit more for some future commitfest. > Yeah, I'll need to add some tests as I see nothing failed by changing > this. A few comments on the patch: there is no check for enable_incremental_sort, and it lacks tests (as already mentioned) for the resulting plan. cheers ./daniel
On Wed, Jul 08, 2020 at 04:57:21PM +1200, David Rowley wrote: >Over on [1] someone was asking about chained window paths making use >of already partially sorted input. (The thread is on -general, so I >guessed they're not using PG13.) > >However, On checking PG13 to see if incremental sort would help their >case, I saw it didn't. Looking at the code I saw that >create_window_paths() and create_one_window_path() don't make any use >of incremental sort paths. > >I quickly put together the attached. It's only about 15 mins of work, >but it seems worth looking at a bit more for some future commitfest. >Yeah, I'll need to add some tests as I see nothing failed by changing >this. > Yeah, I'm sure there are a couple other places that might benefit from incremental sort but were not included in the PG13 commit. The patch seems correct - did it help in the reported thread? How much? I suppose this might benefit from an optimization similar to the GROUP BY reordering discussed in [1]. For example, with max(a) over (partition by b,c) I think we could use index on (c) and consider incremental sort by c,b, i.e. with the inverted pathkeys. But that's a completely independent topic, I believe. [1] https://www.postgresql.org/message-id/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru >I'll just park this here until then so I don't forget. > OK, thanks for looking into this! regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 15 Sep 2020 at 00:02, Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 8 Jul 2020, at 06:57, David Rowley <dgrowleyml@gmail.com> wrote: > > > > Over on [1] someone was asking about chained window paths making use > > of already partially sorted input. (The thread is on -general, so I > > guessed they're not using PG13.) > > The [1] reference wasn't qualified, do you remember which thread it was? That was sloppy of me. It's https://www.postgresql.org/message-id/CADd42iFZWwYNsXjEM_3HWK3QnfiCrMNmpOkZqyBQCabnVxOPtw%40mail.gmail.com > > However, On checking PG13 to see if incremental sort would help their > > case, I saw it didn't. Looking at the code I saw that > > create_window_paths() and create_one_window_path() don't make any use > > of incremental sort paths. > > Commit 728202b63cdcd7f counteracts this optimization in part since it orders > the windows such that the longest common prefix is executed first to allow > subsequent windows to skip sorting entirely. This would have been clearer if I'd remembered to include the link to the thread. The thread talks about sorting requirements like c1, c3 then c1, c4. So it can make use of the common prefix and do incremental sorts. It sounds like you're talking about cases like: wfunc() over (order by a), wfunc2() over (order by a,b). Where we can just sort on a,b and have that order work for the first wfunc(). That's a good optimisation but does not work for the above case. > That being said, it's only in part and when the stars don't align with sub- > sequently shorter common prefixes then incremental sort can help. A synthetic > unscientific test with three windows over 10M rows, where no common prefix > exists, shows consistent speedups (for worst cases) well past what can be > attributed to background noise. > > > I quickly put together the attached. It's only about 15 mins of work, > > but it seems worth looking at a bit more for some future commitfest. > > Yeah, I'll need to add some tests as I see nothing failed by changing > > this. > > A few comments on the patch: there is no check for enable_incremental_sort, and > it lacks tests (as already mentioned) for the resulting plan. Yeah, it should be making sure enable_incremental_sort is on for sure. I've attached another version with a few tests added too. David
Attachment
> On 15 Sep 2020, at 01:17, David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 15 Sep 2020 at 00:02, Daniel Gustafsson <daniel@yesql.se> wrote: >> >>> On 8 Jul 2020, at 06:57, David Rowley <dgrowleyml@gmail.com> wrote: >>> >>> Over on [1] someone was asking about chained window paths making use >>> of already partially sorted input. (The thread is on -general, so I >>> guessed they're not using PG13.) >> >> The [1] reference wasn't qualified, do you remember which thread it was? > > That was sloppy of me. It's > https://www.postgresql.org/message-id/CADd42iFZWwYNsXjEM_3HWK3QnfiCrMNmpOkZqyBQCabnVxOPtw%40mail.gmail.com Thanks! >>> However, On checking PG13 to see if incremental sort would help their >>> case, I saw it didn't. Looking at the code I saw that >>> create_window_paths() and create_one_window_path() don't make any use >>> of incremental sort paths. >> >> Commit 728202b63cdcd7f counteracts this optimization in part since it orders >> the windows such that the longest common prefix is executed first to allow >> subsequent windows to skip sorting entirely. > > This would have been clearer if I'd remembered to include the link to > the thread. The thread talks about sorting requirements like c1, c3 > then c1, c4. So it can make use of the common prefix and do > incremental sorts. > > It sounds like you're talking about cases like: wfunc() over (order by > a), wfunc2() over (order by a,b). Where we can just sort on a,b and > have that order work for the first wfunc(). That's a good optimisation > but does not work for the above case. Right, the combination of these two optimizations will however work well together for quite a few cases. On that note, assume we have the below scenario: wfunc .. (order by a), .. (order by a,b), .. (order by a,b,c) Currently the windows will be ordered such that a,b,c is sorted first, with a,b and a not having to sort. I wonder if there is a good heuristic to find cases where sorting a, then a,b incrementally and finally a,b,c incrementally is cheaper than a big sort of a,b,c? If a,b,c would spill but subsequent incremental sorts won't then perhaps that could be a case? Not sure if it's worth the planner time, just thinking out loud. >> A few comments on the patch: there is no check for enable_incremental_sort, and >> it lacks tests (as already mentioned) for the resulting plan. > > Yeah, it should be making sure enable_incremental_sort is on for sure. > I've attached another version with a few tests added too. No comments on this version, LGTM. cheers ./daniel
On Tue, 15 Sep 2020 at 05:19, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Wed, Jul 08, 2020 at 04:57:21PM +1200, David Rowley wrote: > >Over on [1] someone was asking about chained window paths making use > >of already partially sorted input. (The thread is on -general, so I > >guessed they're not using PG13.) > > > >However, On checking PG13 to see if incremental sort would help their > >case, I saw it didn't. Looking at the code I saw that > >create_window_paths() and create_one_window_path() don't make any use > >of incremental sort paths. > > > >I quickly put together the attached. It's only about 15 mins of work, > >but it seems worth looking at a bit more for some future commitfest. > >Yeah, I'll need to add some tests as I see nothing failed by changing > >this. > > > > Yeah, I'm sure there are a couple other places that might benefit from > incremental sort but were not included in the PG13 commit. The patch > seems correct - did it help in the reported thread? How much? Looks like I didn't mention the idea on the thread. I must have felt it was just too many steps away from being very useful to mention it in the -general thread. I suppose it'll help similar to any use case for incremental sort; lots in some and less so in others. It'll mostly depend on how big each incremental sort is. e.g order by a,b when there's only an index on (a) will be pretty good if a is unique. Each sort will be over quite fast. If there are a million rows for each value of a then incremental sort would be less favourable > I suppose this might benefit from an optimization similar to the GROUP > BY reordering discussed in [1]. For example, with > > max(a) over (partition by b,c) > > I think we could use index on (c) and consider incremental sort by c,b, > i.e. with the inverted pathkeys. But that's a completely independent > topic, I believe. I've only vaguely followed that. Sounds like interesting work, but I agree that it's not related to this. Thanks for having a look at this. David
On Tue, 15 Sep 2020 at 20:12, Daniel Gustafsson <daniel@yesql.se> wrote: > > On that note, assume we have the below scenario: > > wfunc .. (order by a), .. (order by a,b), .. (order by a,b,c) > > Currently the windows will be ordered such that a,b,c is sorted first, with a,b > and a not having to sort. I wonder if there is a good heuristic to find cases > where sorting a, then a,b incrementally and finally a,b,c incrementally is > cheaper than a big sort of a,b,c? If a,b,c would spill but subsequent > incremental sorts won't then perhaps that could be a case? Not sure if it's > worth the planner time, just thinking out loud. It's a worthy cause, but unfortunately, I don't think there's any very realistic thing that can be done about that. The problem is that you're deciding the "most sorted" window clause and putting that first in the parameters to the query_planner()'s callback function. If you wanted to try some alternative orders then it means calling query_planner() again with some other order for qp_extra.activeWindows. Perhaps there's some other way of doing it so that the planner does some sort of preliminary investigation about the best order to evaluate the windows in. Currently, standard_qp_callback just takes the first window and has the planner perform the join order search based on that. Performing the join order search multiple times is just not realistic, so it could only be done by some sort of pre-checks. e.g, is there an index that's likely to help me obtain this specific order. Then we'd just have to hope that through the join search that the planner actually managed to produce a more optimal plan than it would have if we'd left the window evaluation order alone. It sounds pretty tricky to make cheap and good enough at the same time. > No comments on this version, LGTM. Cool. Many thanks for having a look. David
On Tue, 15 Sep 2020 at 23:21, David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 15 Sep 2020 at 20:12, Daniel Gustafsson <daniel@yesql.se> wrote: > > > > No comments on this version, LGTM. > > Cool. Many thanks for having a look. Pushed. 62e221e1c David