Missing optimization when filters are applied after window functions - Mailing list pgsql-hackers
From | Volker Grabsch |
---|---|
Subject | Missing optimization when filters are applied after window functions |
Date | |
Msg-id | 20120516075017.GA31790@vlap Whole thread Raw |
Responses |
Re: Missing optimization when filters are applied after window functions
Re: Missing optimization when filters are applied after window functions |
List | pgsql-hackers |
Dear PostgreSQL hackers, [ please CC to me as I'm not subscribed to the list ] I think there is a missing optimization when a filter is applied after a window function, where the filtered field is also used for partitioning. Here a simplified example: Suppose we have a table that stores 100.000 ids, and 10 random values per id: create table test ( id integer, value double precision, primary key (id, value) ); insert into test (id, value) select generate_series(1, 100000) id, random() from generate_series(1, 10); We are now asking for the rank of each value per id (it doesn't matter if it's rank() or any other window function): select id, value, rank() over (partition by id order by value) from test; Now suppose we're interested only in the result for id 23000. If the query above was defined in a view, this would lead to the following query: select * from ( select id, value, rank() over (partitionby id order by value) from test ) subquery where id = 23000; This is very slow! The "explain" command shows the reason: Subquery Scan on subquery (cost=0.00..86556.80 rows=5000 width=20) Filter: (subquery.id = 23000) -> WindowAgg (cost=0.00..74056.80 rows=1000000 width=12) -> Index Scan using test_pkey on test (cost=0.00..56556.80rows=1000000 width=12) The Filter (id = 23000) is applied after WindowAgg, although it could certainly be applied before! Of course we could rewrite the above query like this: select id, value, rank() over (partition by id order by value) from test where id =23000; This would lead to a more sane plan where the filter is applied before WindowAgg, making use of the primary key: WindowAgg (cost=43.53..43.70 rows=10 width=12) -> Sort (cost=43.53..43.55 rows=10 width=12) Sort Key:value -> Bitmap Heap Scan on test (cost=4.53..43.36 rows=10 width=12) Recheck Cond: (id =23000) -> Bitmap Index Scan on test_pkey (cost=0.00..4.53 rows=10 width=0) IndexCond: (id = 23000) However, the second query cannot be cleanly split into a generic view and a specialized part anymore. In other words, it is currently impossible to use Window functions in a View without risking serious, unexpected performance issues, even in simple cases! I propose the following general optimization: If all window functions are partitioned by the same first field (here: id), then any filter on that field should be executed before WindowAgg. So a query like this: select ... from (select ... over (partition by FIELD, other2, ...) ... over (partitionby FIELD, other3, ...) ... over (partition by FIELD, ...) ... ) where SOME_FILTER(FIELD) Shoule be transformed into that: select ... from (select ... over (partition by FIELD, other2, ...) ... over (partitionby FIELD, other3, ...) ... over (partition by FIELD, ...) ... where SOME_FILTER(FIELD) ) I verfied that this issue exists with PostgreSQL 9.2 beta 1 as well as the PostgreSQL 9.1 release. Is there any work on this issue? Or am I wrong and this optimization should not be implemented for some reason? Regards, Volker -- Volker Grabsch ---<<(())>>---
pgsql-hackers by date: