Re: Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers
From | Ankit Kumar Pandey |
---|---|
Subject | Re: Todo: Teach planner to evaluate multiple windows in the optimal order |
Date | |
Msg-id | 2f9f632d-3297-09dd-1d01-af3e0961c446@gmail.com Whole thread Raw |
In response to | Re: Todo: Teach planner to evaluate multiple windows in the optimal order (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
|
List | pgsql-hackers |
> On 10/01/23 10:53, David Rowley wrote: > the total cost is the same for both of these, but the execution time > seems to vary quite a bit. This is really weird, I tried it different ways (to rule out any issues due to caching) and execution time varied in spite of having same cost. > Maybe we should try and do this for DISTINCT queries if the > distinct_pathkeys match the orderby_pathkeys. That seems a little less > copout-ish. If the ORDER BY is the same as the DISTINCT then it seems > likely that the ORDER BY might opt to use the Unique path for DISTINCT > since it'll already have the correct pathkeys. > However, if the ORDER BY has fewer columns then it might be cheaper to Hash Aggregate and > then sort all over again, especially so when the DISTINCT removes a > large proportion of the rows. Isn't order by pathkeys are always fewer than distinct pathkeys? distinct pathkeys = order by pathkeys + window functions pathkeys Again, I got your point which that it is okay to pushdown order by clause if distinct is doing unique sort. But problem is (atleast from what I am facing), distinct is not caring about pushed down sortkeys, it goes with hashagg or unique with some other logic (mentioned below). Consider following (with distinct clause restriction removed) if (parse->distinctClause) { List* distinct_pathkeys = make_pathkeys_for_sortclauses(root, parse->distinctClause, root->processed_tlist); if (!compare_pathkeys(distinct_pathkeys, orderby_pathkeys)==1) // distinct key > order by key skip = true; // this is used to skip order by pushdown } CASE #1: explain (costs off) select distinct a,b, min(a) over (partition by a), sum (a) over (partition by a) from abcd order by a,b; QUERY PLAN ----------------------------------------------------------- Sort Sort Key: a, b -> HashAggregate Group Key: a, b, min(a) OVER (?), sum(a) OVER (?) -> WindowAgg -> Sort Sort Key: a, b -> Seq Scan on abcd (8 rows) explain (costs off) select distinct a,b,c, min(a) over (partition by a), sum (a) over (partition by a) from abcd order bya,b,c; QUERY PLAN -------------------------------------------------------------- Sort Sort Key: a, b, c -> HashAggregate Group Key: a, b, c, min(a) OVER (?), sum(a) OVER (?) -> WindowAgg -> Sort Sort Key: a, b, c -> Seq Scan on abcd (8 rows) No matter how many columns are pushed down, it does hashagg. On the other hand: CASE #2: EXPLAIN (costs off) SELECT DISTINCT depname, empno, min(salary) OVER (PARTITION BY depname) depminsalary,sum(salary) OVER(PARTITION BY depname) depsalary FROM empsalary ORDER BY depname, empno; QUERY PLAN ---------------------------------------------------------------------------------- Unique -> Sort Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?)) -> WindowAgg -> Sort Sort Key: depname, empno -> Seq Scan on empsalary (7 rows) EXPLAIN (costs off) SELECT DISTINCT depname, empno, enroll_date, min(salary) OVER (PARTITION BY depname) depminsalary,sum(salary)OVER (PARTITION BY depname) depsalary FROM empsalary ORDER BY depname, empno, enroll_date; QUERY PLAN ----------------------------------------------------------------------------------------------- Unique -> Sort Sort Key: depname, empno, enroll_date, (min(salary) OVER (?)), (sum(salary) OVER (?)) -> WindowAgg -> Sort Sort Key: depname, empno, enroll_date -> Seq Scan on empsalary (7 rows) It keeps doing Unique. In both of the cases, compare_pathkeys(distinct_pathkeys, orderby_pathkeys) returns 1 Looking bit further, planner is choosing things correctly. I could see cost of unique being higher in 1st case and lower in 2nd case. But the point is, if sort for orderby is pushdown, shouldn't there be some discount on cost of Unique sort (so that there is more possibility of it being favorable compared to HashAgg in certain cases)? Again, cost of Unqiue node is taken as cost of sort node as it is, but for HashAgg, new cost is being computed. If we do incremental sort here (for unique node), as we have pushed down orderby's, unique cost could be reduced and our optimization could be made worthwhile (I assume this is what you intended here) in case of distinct. Eg: EXPLAIN SELECT DISTINCT depname, empno, enroll_date, min(salary) OVER (PARTITION BY depname) depminsalary,sum(salary) OVER(PARTITION BY depname) depsalary FROM empsalary ORDER BY depname, empno, enroll_date; QUERY PLAN ----------------------------------------------------------------------------------------------- Unique (cost=1.63..1.78 rows=10 width=56) -> Sort (cost=1.63..1.66 rows=10 width=56) Sort Key: depname, empno, enroll_date, (min(salary) OVER (?)), (sum(salary) OVER (?)) -> WindowAgg (cost=1.27..1.47 rows=10 width=56) -> Sort (cost=1.27..1.29 rows=10 width=48) Sort Key: depname, empno, enroll_date -> Seq Scan on empsalary (cost=0.00..1.10 rows=10 width=48) depname, empno, enroll_date are presorted but still strict sorting is done on all columns. Additionally, > the total cost is the same for both of these, but the execution time > seems to vary quite a bit. Even if I pushdown one or two path keys, end result is same cost (which isn't helping). -- Regards, Ankit Kumar Pandey
pgsql-hackers by date: