Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date | |
Msg-id | CAApHDvp8BpUEo_kQdGHWNPCjcmRWCdiy5p26SoQA4R6rinkaLA@mail.gmail.com |
In response to | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (Ankit Kumar Pandey <itsankitkp@gmail.com>) |
Responses | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
List | pgsql-hackers |
(your email client still seems broken)

On Sun, 8 Jan 2023 at 05:27, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
> While writing test cases, I found that the optimization does not happen
> for case #1 (which is the prime candidate for such an operation), like
>
> EXPLAIN (COSTS OFF)
> SELECT empno,
>        depname,
>        min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
>        sum(salary) OVER (PARTITION BY depname) depsalary
> FROM empsalary
> ORDER BY depname, empno, enroll_date
>
> This happens because of the mutual exclusiveness of two operands (when the
> number of window functions > 1), viz. is_sorted and last activeWindow, in
> the condition:
>
> ( !is_sorted && lnext(activeWindows, l) == NULL)
>
> For the 2nd last window function, is_sorted is false and path keys get
> added.
>
> In the next run (for the last window function), is_sorted becomes true and
> the whole optimization part is skipped.
>
> Note: The major issue is that if I remove is_sorted from the condition,
> even though path keys are added, it still does not perform the
> optimization and works the same as in the master/unoptimized case.
>
> Perhaps adding path keys at the last window function is not doing the
> trick? Maybe we need to add pathkeys to all window functions which are a
> subset of the query's ORDER BY, irrespective of being last or not?

You might need to have another loop before the foreach loop that loops
backwards through the WindowClauses and remembers the index of the
WindowClause which has pathkeys contained in the query's ORDER BY
pathkeys, then apply the optimisation from that point in the main
foreach loop. Also, if the condition within the foreach loop which
checks when we want to apply this optimisation is going to be run more
than once, then you should probably have a boolean variable that's set
before the loop which saves whether we're going to try to apply the
optimisation. That'll save having to check things like whether the query
has a LIMIT clause multiple times.
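To make the suggested shape concrete, here is a minimal standalone sketch in C. It is not planner code: pathkeys are modelled as plain arrays of column ids, and `SortKeys`, `keys_are_prefix_of`, `first_window_to_optimise`, `count_windows_sorted_for_order_by` and `demo_case1` are all hypothetical names made up for illustration. In the planner the containment test would presumably be `pathkeys_contained_in()`, and the exact stopping rule for the backwards loop is one reading of the suggestion above. The only precondition modelled besides a non-empty ORDER BY is the LIMIT check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pathkey list: column ids in sort order. */
typedef struct SortKeys
{
    const int  *cols;
    size_t      ncols;
} SortKeys;

/* True if 'keys' matches the leading columns of 'order_by'. */
static bool
keys_are_prefix_of(SortKeys keys, SortKeys order_by)
{
    if (keys.ncols > order_by.ncols)
        return false;
    for (size_t i = 0; i < keys.ncols; i++)
    {
        if (keys.cols[i] != order_by.cols[i])
            return false;
    }
    return true;
}

/*
 * Walk the window clauses backwards and remember the earliest index in the
 * trailing run of windows whose sort keys are contained in the ORDER BY.
 * Returns nwindows when not even the last window qualifies.
 */
static size_t
first_window_to_optimise(const SortKeys *windows, size_t nwindows,
                         SortKeys order_by)
{
    size_t      start = nwindows;

    assert(windows != NULL || nwindows == 0);

    for (size_t i = nwindows; i-- > 0;)
    {
        if (!keys_are_prefix_of(windows[i], order_by))
            break;
        start = i;
    }
    return start;
}

/*
 * Main loop: the "should we even try" decision is made once, before the
 * loop, rather than being re-evaluated for every window clause.
 */
int
count_windows_sorted_for_order_by(const SortKeys *windows, size_t nwindows,
                                  SortKeys order_by, bool has_limit)
{
    bool        try_optimisation = !has_limit && order_by.ncols > 0;
    size_t      start = try_optimisation
        ? first_window_to_optimise(windows, nwindows, order_by)
        : nwindows;
    int         applied = 0;

    for (size_t i = 0; i < nwindows; i++)
    {
        if (i >= start)
            applied++;          /* planner would sort by ORDER BY keys here */
    }
    return applied;
}

/* Case #1 from the quoted mail, with made-up column ids. */
int
demo_case1(bool has_limit)
{
    enum { DEPNAME = 1, EMPNO, ENROLL_DATE };
    static const int w1[] = {DEPNAME, EMPNO};   /* PARTITION BY depname ORDER BY empno */
    static const int w2[] = {DEPNAME};          /* PARTITION BY depname */
    static const int ob[] = {DEPNAME, EMPNO, ENROLL_DATE};
    const SortKeys windows[] = {{w1, 2}, {w2, 1}};
    const SortKeys order_by = {ob, 3};

    return count_windows_sorted_for_order_by(windows, 2, order_by, has_limit);
}
```

In this model both windows of case #1 qualify, because the backwards walk keeps going past the last window instead of testing only `lnext(activeWindows, l) == NULL`, and nothing qualifies once the query has a LIMIT.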
> Case #2:
>
> For presorted columns, eg
>
> CREATE INDEX depname_idx ON empsalary(depname);
> SET enable_seqscan=0;
> EXPLAIN (COSTS OFF)
> SELECT empno,
>        min(salary) OVER (PARTITION BY depname) depminsalary
> FROM empsalary
> ORDER BY depname, empno;
>
> Is this correct plan:
>
> a)
>
>                       QUERY PLAN
> -------------------------------------------------------
>  Incremental Sort
>    Sort Key: depname, empno
>    Presorted Key: depname
>    ->  WindowAgg
>          ->  Index Scan using depname_idx on empsalary
> (5 rows)
>
> or this:
>
> b) (Akin to Optimized version)
>
>                       QUERY PLAN
> -------------------------------------------------------
>  WindowAgg
>    ->  Incremental Sort
>          Sort Key: depname, empno
>          Presorted Key: depname
>          ->  Index Scan using depname_idx on empsalary
> (5 rows)
>
> Patched version does (a) because of is_sorted condition.

a) looks like the best plan to me. What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.

David