Re: TODO items for window functions - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: TODO items for window functions |
Date | |
Msg-id | 3CDAD71E9D70417290FCF66F0178D1E1@amd64 Whole thread Raw |
In response to | TODO items for window functions (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: TODO items for window functions
|
List | pgsql-hackers |
Tom Lane Wrote: > The core window-functions patch is now committed and ready for wider > testing. However, there are a number of unfinished items, at least > some of which I'd like to see addressed before 8.4 release. In rough > order of importance: > > * Support creation of user-defined window functions. I think this is > a "must have" for 8.4 --- we are not in the habit of building > nonextensible basic features. It doesn't seem that hard either. > I think all we need do is to allow "WINDOW" as an attribute keyword > in CREATE FUNCTION. Does anyone have an objection or a better idea? > > * Implement support for non-default window framing clauses. Most likely > it's too late to consider getting the whole feature done for 8.4, but > I wonder whether we could support the restricted case of allowing just > these two combinations > BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (which is default) > BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING > I said earlier that we didn't really need to address this for 8.4, > but my thinking was flawed. The case I think is really important > is to allow last_value() to do something useful, and you can hardly > argue that it's useful without an ORDER BY to define which row in the > partition is "last". > > * Investigate whether we should prohibit window functions in recursive > terms; check whether any of the committed prohibitions are unnecessary. > > * Look at tuplestore performance issues. The tuplestore_in_memory() > thing is just a band-aid, we ought to try to solve it properly. > tuplestore_advance seems like a weak spot as well. > > * Do we really need so much duplicated code between Agg and WindowAgg? Hitoshi and I did briefly talk about these two a few months back, but there were other priorities at the time. Unsure how difficult it is, maybe another one for a TODO, 8.4 or 8.5 I'm not sure: * Minimise sorts in a query such as: david=# explain SELECT depname, david-# SUM(salary) OVER (ORDER BY salary), david-# SUM(salary) OVER (ORDER BY empno) david-# FROM empsalary david-# ORDER BY salary; QUERY PLAN ---------------------------------------------------------------------------- ----------------Sort (cost=213.15..215.75 rows=1040 width=44) Sort Key: salary -> WindowAgg (cost=142.83..161.03 rows=1040width=44) -> Sort (cost=142.83..145.43 rows=1040 width=44) Sort Key: empno -> WindowAgg (cost=72.52..90.72 rows=1040 width=44) -> Sort (cost=72.52..75.12 rows=1040 width=44) Sort Key: salary -> Seq Scan on empsalary (cost=0.00..20.40 rows=1040 width=44) In the above case it would be more efficient to evaluate windows with the same order by clause as the final results last to eliminate the final sort. In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3 sorts. We also perform 3. Also perhaps more difficult? Maybe 8.5... * Teach planner to decide which window to evaluate first based on costs. Currently the first window in the query is evaluated first, there may be no index to help sort the first window, but perhaps there are for other windows in the query. This may allow an index scan instead of a seqscan -> sort. I Oracle 10g seems to have the above capability but Sybase seems evaluate the first window first. I've yet to look at DB2. This would stop performance critical queries from looking like: SELECT id,sum_value2,sum_value FROM ( SELECT id, SUM(value) OVER (ORDER BY idxcol) AS sum_value, SUM(value2) OVER (ORDER BY no_idxcol) AS sum_value2 FROM some_table ) t; -- Allow index scan by putting the indexed column as the 1st window When they could look like: SELECT id, SUM(value2) OVER (ORDER BY no_idxcol) AS sum_value2, SUM(value) OVER (ORDER BY idxcol) AS sum_value FROM some_table; -- Planner has the option to eval 2nd window first due to index. David.
pgsql-hackers by date: