Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers

From Ankit Kumar Pandey
Subject Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date
Msg-id 398f39bf-54c0-e510-5507-6dab1b65d5e7@gmail.com
Whole thread Raw
In response to Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On 03/01/23 08:21, David Rowley wrote:
> On Mon, 26 Dec 2022 at 02:04, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
>> Point #1
>>
>> In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
>> sorts. We also perform 3.
> This shouldn't be too hard to do. See the code in
> select_active_windows().  You'll likely want to pay attention to the
> DISTINCT pathkeys if they exist and just use the ORDER BY pathkeys if
> the query has no DISTINCT clause. DISTINCT is evaluated after Window
> and before ORDER BY.
>
> One idea to implement this would be to adjust the loop in
> select_active_windows() so that we record any WindowClauses which have
> the pathkeys contained in the ORDER BY / DISTINCT pathkeys then record
> those separately and append those onto the end of the actives array
> after the sort.
>
> I do think you'll likely want to put any WindowClauses which have
> pathkeys which are a true subset or true superset of the ORDER BY /
> DISTINCT pathkeys last.  If they're a superset then we won't need to
> perform any additional ordering for the DISTINCT / ORDER BY clause.
> If they're a subset then we might be able to perform an Incremental
> Sort, which is likely much cheaper than a full sort.  The existing
> code should handle that part. You just need to make
> select_active_windows() more intelligent.
>
> You might also think that we could perform additional optimisations
> and also adjust the ORDER BY clause of a WindowClause if it contains
> the pathkeys of the DISTINCT / ORDER BY clause. For example:
>
> SELECT *,row_number() over (order by a,b) from tab order by a,b,c;
>
> However, if you were to adjust the WindowClauses ORDER BY to become
> a,b,c then you could produce incorrect results for window functions
> that change their result based on peer rows.
>
> Note the difference in results from:
>
> create table ab(a int, b int);
> insert into ab select x,y from generate_series(1,5) x, generate_Series(1,5)y;
>
> select a,b,count(*) over (order by a) from ab order by a,b;
> select a,b,count(*) over (order by a,b) from ab order by a,b;
>
Thanks, let me try this.


>> and Point #2
>>
>> Teach planner to decide which window to evaluate first based on costs.
>> Currently the first window in the query is evaluated first, there may be no
>> index to help sort the first window, but perhaps there are for other windows
>> in the query. This may allow an index scan instead of a seqscan -> sort.
> What Tom wrote about that in the first paragraph of [1] still applies.
> The problem is that if the query contains many joins that to properly
> find the cheapest way of executing the query we'd have to perform the
> join search once for each unique sort order of each WindowClause.
> That's just not practical to do from a performance standpoint. The
> join search can be very expensive. There may be something that could
> be done to better determine the most likely candidate for the first
> WindowClause using some heuristics, but I've no idea what those would
> be. You should look into point #1 first. Point #2 is significantly
> more difficult to solve in a way that would be acceptable to the
> project.
>
Okay, leaving this out for now.

-- 
Regards,
Ankit Kumar Pandey




pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Next
From: Michael Paquier
Date:
Subject: Re: typos