Re: Windowing Function Patch Review -> Standard Conformance - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Windowing Function Patch Review -> Standard Conformance |
Date | |
Msg-id | 233A00BB1F264BB3B18EC9E89CC4CD14@amd64 Whole thread Raw |
In response to | Re: Windowing Function Patch Review -> Standard Conformance ("Hitoshi Harada" <umi.tanuki@gmail.com>) |
Responses |
Re: Windowing Function Patch Review -> Standard Conformance
|
List | pgsql-hackers |
Hitoshi Harada wrote: > I tested the spool performance with David's earlier bigtable: > > CREATE TABLE bigtable ( > id SERIAL NOT NULL PRIMARY KEY, > timestamp TIMESTAMP NOT NULL > ); > > -- about 383MB of data > INSERT INTO bigtable (timestamp) > SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL > FROM generate_series(1,10000000); > > CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp); > > VACUUM ANALYZE bigtable; > > sample=# SELECT COUNT(*) FROM bigtable; > count > ---------- > 10000000 > (1 row) > > sample=# SELECT LEAD(timestamp) OVER (ORDER BY id) FROM bigtable LIMIT 1; > lead > ---------------------------- > 2008-12-02 00:15:10.288461 > (1 row) > > sample=# EXPLAIN ANALYZE SELECT LEAD(timestamp) OVER (ORDER BY id) > FROM bigtable LIMIT 1; > > QUERY PLAN > > -------------------------------------------------------------------------- > -------------------- > --------------------------------------------------- > Limit (cost=0.00..0.04 rows=1 width=12) (actual time=0.038..0.039 > rows=1 loops=1) > -> Window (cost=0.00..386612.13 rows=10000000 width=12) (actual > time=0.036..0.036 rows=1 > loops=1) > -> Index Scan using bigtable_pkey on bigtable > (cost=0.00..286612.13 rows=10000000 w > idth=12) (actual time=0.018..0.021 rows=2 loops=1) > Total runtime: 0.071 ms > (4 rows) > > > shows quite good result. Great work. Amazing improvement! Old patch: david=# select timestamp,lag(timestamp,1) over (order by id) from bigtable order by id limit 1; timestamp | lag ----------------------------+-----2008-11-10 21:55:16.498458 | (1 row) Time: 105205.055 ms New patch: david=# select timestamp,lag(timestamp,1) over (order by id) from bigtable order by id limit 1; timestamp | lag ----------------------------+-----2008-12-04 22:05:22.687975 | (1 row) Time: 1.640 ms > > The following query works on my build: > > > SELECT depname,SUM(SUM(salary)) OVER (ORDER BY depname) FROM empsalary > GROUP > > BY depname; > > ERROR: variable not found in subplan target list > This works fine on my build now too. > > Now, I am thinking about refactoring around aggregate common code, and > renaming WFunc to WinFunc, which leads pg_proc.proiswfunc be > pg_proc.proiswinfunc and so on if no objections come. > I've spent last night and tonight trying to break the patch and I've not managed it. I spent 2 and a half hours on the train last night reading over the patch mainly for my own interest. I also went over the documentation and I have a few suggestions for improvement: My modifications to the lead and lag syntax can be improved. Currently the optional parameters make it look like DEFAULT can be specified without OFFSET. This is not the case: + <type>any, [integer], [any]</type> Should be: + <type>any [,integer [,any] ]</type> And: + lag(<replaceable class="parameter">value</replaceable>, [<replaceable + class="parameter">offset</replaceable>], [<replaceable + class="parameter">default</replaceable>]) Should be: + lag(<replaceable class="parameter">value</replaceable> [, <replaceable + class="parameter">offset</replaceable> [,<replaceable + class="parameter">default</replaceable>] ]) Same for LEAD() + <para> + After <literal>WHERE</> and <literal>GROUP BY</> process, + rows might be windowed table, using the <literal>WINDOW</> + clause. + </para> I think I know what you mean here. My re-write seems to have turned the sentence into a paragraph. Please tell me if I've assumed the meaning wrongly: "After the <literal>WHERE</>, <literal>GROUP BY</> and <literal>HAVING</> clauses one or more <literal>WINDOW</> clauses can be specified. This will allow window functions to be specified in the <literal>SELECT</> clause. These window functions can make use of the <literal>WINDOW</> clauses by making reference to the alias name of the window rather than explicitly specifying the properties of the window in each <literal>OVER</> clause." + <para> + Another expample shows different capability of window functions + from above. Small typo: example instead of eapample + <para> + The same window definitions can be named and put togather into one + definition using <xref linkend="queries-window">. Small typo: together instead of togather. + <para> + Window functions doesn't accept DISTINCT and ALL syntax, even though + the function is an aggregate function. + </para> Small grammar error: doesn't should be replaced with don't. Or perhaps change to: "Window functions, unlike normal aggregate functions, do not allow DISTINCT or ALL to be used within the function argument list." + Window functions are not placed in any of GROUP BY, HAVING and + WHERE clauses, which process values before any of the windows. If + there is need to qualify rows by the result of window functions, + whole of the query must be nested and append WHERE clause outer of + the current query. I think this one maybe needs an example to back it up. It's quite an important thing and I'm sure lots of people will need to do this. I'm not 100% happy with my new paragraph either but can't see how to word it any better. "Window functions cannot be used in the WHERE, GROUP BY or HAVING clauses of the query. If there is a need to filter rows, group results or filter rows after aggregation takes place (HAVING) then the query must be nested. The query should contain the window functions in the inner query and apply the additional clauses that contain the results from the window function in the outer query, such as: SELECT depname, empno, salary, enroll_date FROM (SELECT depname, empno, salary, enroll_date, ROW_NUMBER() OVER (PARTITIONBY depname ORDER BY salary,empno) AS pos FROM empsalary ) AS e WHERE pos = 1; In the above query the we're filtering and only showing the results from the inner query where the ROW_NUMBER() value is equal to 1." But of course the above query would be more simple using DISTINCT ON. Maybe there is a better example... My previous marathon getting the person in 2nd place might be better but that's introducing another previously unknown table to the manual. + * A window function is defined as a function matked as wfunc in pg_proc. By + * this mark, it means the function can handle window function APIs that allow + * it to access arbitrary random rows within the window. Small typo: marked instead of matked Maybe this fragment should read: "A window function is a function that can be used by the window function API. This allows access arbitrary random rows within the window. These functions are identified by the proiswfunc column in pg_proc." Or pg_proc.proiswfunc if you've renamed. David.
pgsql-hackers by date: