Todo: Teach planner to evaluate multiple windows in the optimal order - Mailing list pgsql-hackers

From Ankit Kumar Pandey
Subject Todo: Teach planner to evaluate multiple windows in the optimal order
Date
Msg-id 83d80853-a45c-d85c-68eb-59acfe7fb5fb@gmail.com
Whole thread Raw
Responses Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers

Hi,

While looking at one of the todo item in Window function, namely:

Teach planner to evaluate multiple windows in the optimal order Currently windows are always evaluated in the query-specified order.

From threads, relevant points.

Point #1

In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
sorts. We also perform 3.
and Point #2
Teach planner to decide which window to evaluate first based on costs.
Currently the first window in the query is evaluated first, there may be no
index to help sort the first window, but perhaps there are for other windows
in the query. This may allow an index scan instead of a seqscan -> sort.
Repro:

select pg_catalog.version();

                                              version                                               
----------------------------------------------------------------------------------------------------
 PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0, 64-bit
(1 row)

create table empsalary(depname text, empno int, salary int);
insert into empsalary values (select substr(md5(random()::text), 0, 25), generate_series(1,10), generate_series(10000,12000));

explain SELECT depname, SUM(salary) OVER (ORDER BY salary), SUM(salary) OVER (ORDER BY empno) FROM empsalary ORDER BY salary;

                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 WindowAgg  (cost=289.47..324.48 rows=2001 width=49)
   ->  Sort  (cost=289.47..294.47 rows=2001 width=41)
         Sort Key: salary
         ->  WindowAgg  (cost=144.73..179.75 rows=2001 width=41)
               ->  Sort  (cost=144.73..149.73 rows=2001 width=33)
                     Sort Key: empno
                     ->  Seq Scan on empsalary  (cost=0.00..35.01 rows=2001 width=33)
(7 rows)

As it is seen, for case #1, issue looks like resolved and only 2 sorts are performed.

For #2, index col ordering is changed.

create index idx_emp on empsalary (empno);
explain SELECT depname, SUM(salary) OVER (ORDER BY salary), SUM(salary) OVER (ORDER BY empno) FROM empsalary ORDER BY salary;
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 WindowAgg  (cost=204.03..239.04 rows=2001 width=49)
   ->  Sort  (cost=204.03..209.03 rows=2001 width=41)
         Sort Key: salary
         ->  WindowAgg  (cost=0.28..94.31 rows=2001 width=41)
               ->  Index Scan using idx_emp on empsalary  (cost=0.28..64.29 rows=2001 width=33)
(5 rows)

explain SELECT depname, SUM(salary) OVER (ORDER BY empno), SUM(salary) OVER (ORDER BY salary) FROM empsalary ORDER BY salary;
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 WindowAgg  (cost=204.03..239.04 rows=2001 width=49)
   ->  Sort  (cost=204.03..209.03 rows=2001 width=41)
         Sort Key: salary
         ->  WindowAgg  (cost=0.28..94.31 rows=2001 width=41)
               ->  Index Scan using idx_emp on empsalary  (cost=0.28..64.29 rows=2001 width=33)
(5 rows)

In both cases, index scan is performed, which means this issue is resolved as well.

Is this todo still relevant?

Further down the threads:

I do think the patch has probably left some low-hanging fruit on the
simpler end of the difficulty spectrum, namely when the window stuff
requires only one ordering that could be done either explicitly or
by an indexscan. That choice should ideally be done with a proper
cost comparison taking any LIMIT into account. I think right now
the LIMIT might not be accounted for, or might be considered even
when it shouldn't be because another sort is needed anyway.

I am not sure if I understand this fully but does it means proposed todo (to use indexes) should be refined to

teach planner to  take into account of cost model while deciding to use index or not in window functions? Meaning not always go with index route (modify point #2)?

-- 
Regards,
Ankit Kumar Pandey

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: daitch_mokotoff module
Next
From: Ankit Kumar Pandey
Date:
Subject: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG