Thread: Re: Window Functions patch v06
I'm afraid the patch was too huge, trying to send it again without attachment... I made up my mind to scratch former window functions and redesigned completely new execution model, based on the discussion with Heikki. Attached is the v06 against HEAD today. http://umitanuki.net/pgsql/wfv06/design.html The concrete design concept is noted in nodeWindow.c, quoted here: * Note that the solid concept of the window functions is "they can access* arbitrary rows within the frame as they want".As far as we keep the rule* of thumb, any kind of optimization is allowed. And I threw away an idea that window functions are derived from aggregates, then window functions are only simple single function, which have capability to arbitrary random row access. By this mean, aggregates are special subset of window functions so that aggregates are treated special case in nodeWindow.c, processed in eval_windowaggregate(), while normal window functions are in eval_windowfunction(). By giving random row access, not only able to integrate aggregate to window mechanism but we now have lead()/lag() support. Furthermore, I can see FRAME clause (i.e. moving frame aggregate/cumulative computation) can be added for 8.4 with this design though not implemented yet. Tom, I added a function to tuplestore.c, tuplestore_write_to_read_pointer(), that allows to copy write pointer position to the one of readptrs. This is required for my case because when the functions access randomly rows that the executor haven't read out will be fetched (i.e. last row of the frame). And I hope you see the nth_value() function require really arbitrary row access. P.S. As a volunteer, my spare time to work on this project is getting limited, as my daily work is getting busy for the end of the year (this is a Japanese traditional business sense -:). I don't know how much I can work for the next commit fest but I can't believe it if this feature would be missed in 8.4 *only* for my own tiny business clients! Regards, -- Hitoshi Harada
2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com> > > I'm afraid the patch was too huge, trying to send it again without attachment... > > I made up my mind to scratch former window functions and redesigned > completely new execution model, based on the discussion with Heikki. > Attached is the v06 against HEAD today. Small nit - I get this from the following query: postgres=# select a, sum(a) over (order by a) from generate_series(1,10) a;a | sum ----+----- 1 | 55 2 | 55 3 | 55 4 | 55 5 | 55 6 | 55 7 | 55 8 | 55 9 | 5510 | 55 (10 rows) From what I can tell of the spec, the 'sum' column should contain a running sum (ie 1,3,6 etc). You mention that window frames haven't been implemented, but it seems like this case should return an error rather than the wrong results... Thanks, Ian
I am drunk. I forgot cc to -hackers. The talk between me and Ian was like that. 2008/10/12 Hitoshi Harada <umi.tanuki@gmail.com>: > 2008/10/12 Ian Caulfield <ian.caulfield@gmail.com>: >> 2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com>: >>> 2008/10/12 Ian Caulfield <ian.caulfield@gmail.com>: >>>> 2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com> >>>>> >>>>> I'm afraid the patch was too huge, trying to send it again without attachment... >>>>> >>>>> I made up my mind to scratch former window functions and redesigned >>>>> completely new execution model, based on the discussion with Heikki. >>>>> Attached is the v06 against HEAD today. >>>> >>>> Small nit - I get this from the following query: >>>> >>>> postgres=# select a, sum(a) over (order by a) from generate_series(1,10) a; >>>> a | sum >>>> ----+----- >>>> 1 | 55 >>>> 2 | 55 >>>> 3 | 55 >>>> 4 | 55 >>>> 5 | 55 >>>> 6 | 55 >>>> 7 | 55 >>>> 8 | 55 >>>> 9 | 55 >>>> 10 | 55 >>>> (10 rows) >>>> >>>> From what I can tell of the spec, the 'sum' column should contain a >>>> running sum (ie 1,3,6 etc). You mention that window frames haven't >>>> been implemented, but it seems like this case should return an error >>>> rather than the wrong results... >>>> >>>> Thanks, >>>> Ian >>>> >>> >>> Thanks for notice. >>> I didn't know that. Ordered aggregate has only rows until current row? >>> I guess I need read more spec. >> >> That's how I read it, the relevant part of the spec seems to be: >> >> 5) WD also defines for each row R of RTE the window frame WF of R, >> consisting of a collection of rows. WF >> is defined as follows. >> >> Case: >> a) If WD has no window framing clause, then >> >> Case: >> i) If the window ordering clause of WD is not present, then WF is the >> window partition of R. >> ii) Otherwise, WF consists of all rows of the partition of R that >> precede R or are peers of R in the >> window ordering of the window partition defined by the window ordering clause. >> >> Ian >> > > It seems you're right. I will fix it soon probably. > By this spec, some of the regression tests including nth_value() etc. > are wrong. Generally we hold only preceding rows in the frame when > ORDER BY is specified, not only aggregate case. > Thanks again. > > > Regards, > > > -- > Hitoshi Harada > -- Hitoshi Harada
2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com>: > I am drunk. I forgot cc to -hackers. The talk between me and Ian was like that. > > 2008/10/12 Hitoshi Harada <umi.tanuki@gmail.com>: >> 2008/10/12 Ian Caulfield <ian.caulfield@gmail.com>: >>> 2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com>: >>>> 2008/10/12 Ian Caulfield <ian.caulfield@gmail.com>: >>>>> 2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com> >>>>>> >>>>>> I'm afraid the patch was too huge, trying to send it again without attachment... >>>>>> >>>>>> I made up my mind to scratch former window functions and redesigned >>>>>> completely new execution model, based on the discussion with Heikki. >>>>>> Attached is the v06 against HEAD today. >>>>> >>>>> Small nit - I get this from the following query: >>>>> >>>>> postgres=# select a, sum(a) over (order by a) from generate_series(1,10) a; >>>>> a | sum >>>>> ----+----- >>>>> 1 | 55 >>>>> 2 | 55 >>>>> 3 | 55 >>>>> 4 | 55 >>>>> 5 | 55 >>>>> 6 | 55 >>>>> 7 | 55 >>>>> 8 | 55 >>>>> 9 | 55 >>>>> 10 | 55 >>>>> (10 rows) >>>>> >>>>> From what I can tell of the spec, the 'sum' column should contain a >>>>> running sum (ie 1,3,6 etc). You mention that window frames haven't >>>>> been implemented, but it seems like this case should return an error >>>>> rather than the wrong results... >>>>> >>>>> Thanks, >>>>> Ian >>>>> >>>> >>>> Thanks for notice. >>>> I didn't know that. Ordered aggregate has only rows until current row? >>>> I guess I need read more spec. >>> >>> That's how I read it, the relevant part of the spec seems to be: >>> >>> 5) WD also defines for each row R of RTE the window frame WF of R, >>> consisting of a collection of rows. WF >>> is defined as follows. >>> >>> Case: >>> a) If WD has no window framing clause, then >>> >>> Case: >>> i) If the window ordering clause of WD is not present, then WF is the >>> window partition of R. >>> ii) Otherwise, WF consists of all rows of the partition of R that >>> precede R or are peers of R in the >>> window ordering of the window partition defined by the window ordering clause. >>> >>> Ian >>> >> >> It seems you're right. I will fix it soon probably. >> By this spec, some of the regression tests including nth_value() etc. >> are wrong. Generally we hold only preceding rows in the frame when >> ORDER BY is specified, not only aggregate case. >> Thanks again. Doing a bit of poking around in the spec and the Oracle documentation, I think (but I'm not 100% sure) that the results returned were correct for the query: postgres=# select a, sum(a) over () from generate_series(1,10) a; ERROR: either PARTITION BY or ORDER BY must be specified in window clause Howver, someone who is better at parsing the spec than I am probably ought to check... Ian
I confirmed this on Oracle: select last_value(id) over (order by id) as last_id, id from foo; LAST_ID ID ------- -- 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 So when you specify ORDER BY clause on window definition, the frame always contains rows preceding from current row, not only on aggregate. > > Doing a bit of poking around in the spec and the Oracle documentation, > I think (but I'm not 100% sure) that the results returned were correct > for the query: > > postgres=# select a, sum(a) over () from generate_series(1,10) a; > ERROR: either PARTITION BY or ORDER BY must be specified in window clause The empty window definition is forbidden temporarily because I didn't know how to take it although there's no problem to execute it. I haven't found any description on spec yet, I know at least Oracle allows it. I can find how to do it with the new (window execution model) design, (and the design is suitable to fix it above,) but at first before going into trivial specs, I would like core hackers to review the model is better than before or not. Thank you for your cooperation. Regards, -- Hitoshi Harada
Hitoshi Harada wrote: >I made up my mind to scratch former window functions and redesigned >completely new execution model, based on the discussion with Heikki. >Attached is the v06 against HEAD today. >http://umitanuki.net/pgsql/wfv06/design.html First off, fantastic work! In my eyes this and WITH RECURSIVE are a big step for both Postgres and open source RBDMS'. Only, one small query with LEAD() and LAG() Going by http://www.wiscorp.com/sql200n.zip "The lead and lag functions each take three arguments, a <value expression> VE, an <exact numeric literal> OFFSET, and a <value expression> DEFAULT. For each row R within the window partition P of R defined by a window structure descriptor, the lag function returns the value of VE evaluated on a row that is OFFSET number of rows before R within P, and the lead function returns the value of VE evaluated on a row that is OFFSET number of rows after R within P. The value of DEFAULT is returned as the result if there is no row corresponding to the OFFSET number of rows before R within P (for the lag function) or after R within P (for the lead function). In addition, RESPECT NULLS or IGNORE NULLS can be specified to indicate whether the rows within P for which VE evaluates to the null value are preserved or eliminated" So going by that: SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee; Would use 'None' for rows that would be out of the bounds of the window. The current patch only seems to accept 2 arguments. ERROR: function lag(character varying, integer, unknown) does not exist
2008/10/14 David Rowley <dgrowley@gmail.com>: > Hitoshi Harada wrote: >>I made up my mind to scratch former window functions and redesigned >>completely new execution model, based on the discussion with Heikki. >>Attached is the v06 against HEAD today. >>http://umitanuki.net/pgsql/wfv06/design.html > > First off, fantastic work! > > In my eyes this and WITH RECURSIVE are a big step for both Postgres and open > source RBDMS'. > > Only, one small query with LEAD() and LAG() > > Going by http://www.wiscorp.com/sql200n.zip > > "The lead and lag functions each take three arguments, a <value expression> > VE, an <exact numeric literal> > OFFSET, and a <value expression> DEFAULT. For each row R within the window > partition P of R defined by > a window structure descriptor, the lag function returns the value of VE > evaluated on a row that is OFFSET > number of rows before R within P, and the lead function returns the value of > VE evaluated on a row that is > OFFSET number of rows after R within P. The value of DEFAULT is returned as > the result if there is no row > corresponding to the OFFSET number of rows before R within P (for the lag > function) or after R within P (for > the lead function). In addition, RESPECT NULLS or IGNORE NULLS can be > specified to indicate whether > the rows within P for which VE evaluates to the null value are preserved or > eliminated" > > So going by that: > SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee; > > Would use 'None' for rows that would be out of the bounds of the window. > > The current patch only seems to accept 2 arguments. > ERROR: function lag(character varying, integer, unknown) does not exist > > > Thanks for your feedback. I agree I need to work on that. Also from the spec, "RESPECT NULLS / IGNORE NULLS" may be specified but not supported yet. This syntax specification is out of the postgres general function call so I wonder if those functions are treated specially or not. Regards, -- Hitoshi Harada
"Hitoshi Harada" <umi.tanuki@gmail.com> writes: > I agree I need to work on that. Also from the spec, "RESPECT NULLS / > IGNORE NULLS" may be specified but not supported yet. This syntax > specification is out of the postgres general function call so I wonder > if those functions are treated specially or not. Egad, the SQL committee has certainly been taken over by creeping COBOL-itis when it comes to inventing random new syntax ... regards, tom lane