Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers
From | Ankit Kumar Pandey |
---|---|
Subject | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date | |
Msg-id | 398f39bf-54c0-e510-5507-6dab1b65d5e7@gmail.com Whole thread Raw |
In response to | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
On 03/01/23 08:21, David Rowley wrote: > On Mon, 26 Dec 2022 at 02:04, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote: >> Point #1 >> >> In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3 >> sorts. We also perform 3. > This shouldn't be too hard to do. See the code in > select_active_windows(). You'll likely want to pay attention to the > DISTINCT pathkeys if they exist and just use the ORDER BY pathkeys if > the query has no DISTINCT clause. DISTINCT is evaluated after Window > and before ORDER BY. > > One idea to implement this would be to adjust the loop in > select_active_windows() so that we record any WindowClauses which have > the pathkeys contained in the ORDER BY / DISTINCT pathkeys then record > those separately and append those onto the end of the actives array > after the sort. > > I do think you'll likely want to put any WindowClauses which have > pathkeys which are a true subset or true superset of the ORDER BY / > DISTINCT pathkeys last. If they're a superset then we won't need to > perform any additional ordering for the DISTINCT / ORDER BY clause. > If they're a subset then we might be able to perform an Incremental > Sort, which is likely much cheaper than a full sort. The existing > code should handle that part. You just need to make > select_active_windows() more intelligent. > > You might also think that we could perform additional optimisations > and also adjust the ORDER BY clause of a WindowClause if it contains > the pathkeys of the DISTINCT / ORDER BY clause. For example: > > SELECT *,row_number() over (order by a,b) from tab order by a,b,c; > > However, if you were to adjust the WindowClauses ORDER BY to become > a,b,c then you could produce incorrect results for window functions > that change their result based on peer rows. > > Note the difference in results from: > > create table ab(a int, b int); > insert into ab select x,y from generate_series(1,5) x, generate_Series(1,5)y; > > select a,b,count(*) over (order by a) from ab order by a,b; > select a,b,count(*) over (order by a,b) from ab order by a,b; > Thanks, let me try this. >> and Point #2 >> >> Teach planner to decide which window to evaluate first based on costs. >> Currently the first window in the query is evaluated first, there may be no >> index to help sort the first window, but perhaps there are for other windows >> in the query. This may allow an index scan instead of a seqscan -> sort. > What Tom wrote about that in the first paragraph of [1] still applies. > The problem is that if the query contains many joins that to properly > find the cheapest way of executing the query we'd have to perform the > join search once for each unique sort order of each WindowClause. > That's just not practical to do from a performance standpoint. The > join search can be very expensive. There may be something that could > be done to better determine the most likely candidate for the first > WindowClause using some heuristics, but I've no idea what those would > be. You should look into point #1 first. Point #2 is significantly > more difficult to solve in a way that would be acceptable to the > project. > Okay, leaving this out for now. -- Regards, Ankit Kumar Pandey
pgsql-hackers by date: