Thread: Re: Window Functions patch v06

Re: Window Functions patch v06

From
"Hitoshi Harada"
Date:
I'm afraid the patch was too huge, trying to send it again without attachment...

I made up my mind to scratch former window functions and redesigned
completely new execution model, based on the discussion with Heikki.
Attached is the v06 against HEAD today.

http://umitanuki.net/pgsql/wfv06/design.html

The concrete design concept is noted in nodeWindow.c, quoted here:
* Note that the solid concept of the window functions is "they can access* arbitrary rows within the frame as they
want".As far as we keep the rule* of thumb, any kind of optimization is allowed.
 

And I threw away an idea that window functions are derived from
aggregates, then window functions are only simple single function,
which have capability to arbitrary random row access. By this mean,
aggregates are special subset of window functions so that aggregates
are treated special case in nodeWindow.c, processed in
eval_windowaggregate(), while normal window functions are in
eval_windowfunction().

By giving random row access, not only able to integrate aggregate to
window mechanism but we now have lead()/lag() support. Furthermore, I
can see FRAME clause (i.e. moving frame aggregate/cumulative
computation) can be added for 8.4 with this design though not
implemented yet.

Tom,
I added a function to tuplestore.c,
tuplestore_write_to_read_pointer(), that allows to copy write pointer
position to the one of readptrs. This is required for my case because
when the functions access randomly rows that the executor haven't read
out will be fetched (i.e. last row of the frame). And I hope you see
the nth_value() function require really arbitrary row access.

P.S.
As a volunteer, my spare time to work on this project is getting
limited, as my daily work is getting busy for the end of the year
(this is a Japanese traditional business sense -:). I don't know how
much I can work for the next commit fest but I can't believe it if
this feature would be missed in 8.4 *only* for my own tiny business
clients!

Regards,



-- 
Hitoshi Harada


Re: Window Functions patch v06

From
"Ian Caulfield"
Date:
2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com>
>
> I'm afraid the patch was too huge, trying to send it again without attachment...
>
> I made up my mind to scratch former window functions and redesigned
> completely new execution model, based on the discussion with Heikki.
> Attached is the v06 against HEAD today.

Small nit - I get this from the following query:

postgres=# select a, sum(a) over (order by a) from generate_series(1,10) a;a  | sum
----+----- 1 |  55 2 |  55 3 |  55 4 |  55 5 |  55 6 |  55 7 |  55 8 |  55 9 |  5510 |  55
(10 rows)

From what I can tell of the spec, the 'sum' column should contain a
running sum (ie 1,3,6 etc). You mention that window frames haven't
been implemented, but it seems like this case should return an error
rather than the wrong results...

Thanks,
Ian


Re: Window Functions patch v06

From
"Hitoshi Harada"
Date:
I am drunk. I forgot cc to -hackers. The talk between me and Ian was like that.

2008/10/12 Hitoshi Harada <umi.tanuki@gmail.com>:
> 2008/10/12 Ian Caulfield <ian.caulfield@gmail.com>:
>> 2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com>:
>>> 2008/10/12 Ian Caulfield <ian.caulfield@gmail.com>:
>>>> 2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com>
>>>>>
>>>>> I'm afraid the patch was too huge, trying to send it again without attachment...
>>>>>
>>>>> I made up my mind to scratch former window functions and redesigned
>>>>> completely new execution model, based on the discussion with Heikki.
>>>>> Attached is the v06 against HEAD today.
>>>>
>>>> Small nit - I get this from the following query:
>>>>
>>>> postgres=# select a, sum(a) over (order by a) from generate_series(1,10) a;
>>>>  a  | sum
>>>> ----+-----
>>>>  1 |  55
>>>>  2 |  55
>>>>  3 |  55
>>>>  4 |  55
>>>>  5 |  55
>>>>  6 |  55
>>>>  7 |  55
>>>>  8 |  55
>>>>  9 |  55
>>>>  10 |  55
>>>> (10 rows)
>>>>
>>>> From what I can tell of the spec, the 'sum' column should contain a
>>>> running sum (ie 1,3,6 etc). You mention that window frames haven't
>>>> been implemented, but it seems like this case should return an error
>>>> rather than the wrong results...
>>>>
>>>> Thanks,
>>>> Ian
>>>>
>>>
>>> Thanks for notice.
>>> I didn't know that. Ordered aggregate has only rows until current row?
>>> I guess I need read more spec.
>>
>> That's how I read it, the relevant part of the spec seems to be:
>>
>> 5) WD also defines for each row R of RTE the window frame WF of R,
>> consisting of a collection of rows. WF
>> is defined as follows.
>>
>> Case:
>> a) If WD has no window framing clause, then
>>
>> Case:
>> i) If the window ordering clause of WD is not present, then WF is the
>> window partition of R.
>> ii) Otherwise, WF consists of all rows of the partition of R that
>> precede R or are peers of R in the
>> window ordering of the window partition defined by the window ordering clause.
>>
>> Ian
>>
>
> It seems you're right. I will fix it soon probably.
> By this spec, some of the regression tests including nth_value() etc.
> are wrong. Generally we hold only preceding rows in the frame when
> ORDER BY is specified, not only aggregate case.
> Thanks again.
>
>
> Regards,
>
>
> --
> Hitoshi Harada
>



-- 
Hitoshi Harada


Re: Window Functions patch v06

From
"Ian Caulfield"
Date:
2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com>:
> I am drunk. I forgot cc to -hackers. The talk between me and Ian was like that.
>
> 2008/10/12 Hitoshi Harada <umi.tanuki@gmail.com>:
>> 2008/10/12 Ian Caulfield <ian.caulfield@gmail.com>:
>>> 2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com>:
>>>> 2008/10/12 Ian Caulfield <ian.caulfield@gmail.com>:
>>>>> 2008/10/11 Hitoshi Harada <umi.tanuki@gmail.com>
>>>>>>
>>>>>> I'm afraid the patch was too huge, trying to send it again without attachment...
>>>>>>
>>>>>> I made up my mind to scratch former window functions and redesigned
>>>>>> completely new execution model, based on the discussion with Heikki.
>>>>>> Attached is the v06 against HEAD today.
>>>>>
>>>>> Small nit - I get this from the following query:
>>>>>
>>>>> postgres=# select a, sum(a) over (order by a) from generate_series(1,10) a;
>>>>>  a  | sum
>>>>> ----+-----
>>>>>  1 |  55
>>>>>  2 |  55
>>>>>  3 |  55
>>>>>  4 |  55
>>>>>  5 |  55
>>>>>  6 |  55
>>>>>  7 |  55
>>>>>  8 |  55
>>>>>  9 |  55
>>>>>  10 |  55
>>>>> (10 rows)
>>>>>
>>>>> From what I can tell of the spec, the 'sum' column should contain a
>>>>> running sum (ie 1,3,6 etc). You mention that window frames haven't
>>>>> been implemented, but it seems like this case should return an error
>>>>> rather than the wrong results...
>>>>>
>>>>> Thanks,
>>>>> Ian
>>>>>
>>>>
>>>> Thanks for notice.
>>>> I didn't know that. Ordered aggregate has only rows until current row?
>>>> I guess I need read more spec.
>>>
>>> That's how I read it, the relevant part of the spec seems to be:
>>>
>>> 5) WD also defines for each row R of RTE the window frame WF of R,
>>> consisting of a collection of rows. WF
>>> is defined as follows.
>>>
>>> Case:
>>> a) If WD has no window framing clause, then
>>>
>>> Case:
>>> i) If the window ordering clause of WD is not present, then WF is the
>>> window partition of R.
>>> ii) Otherwise, WF consists of all rows of the partition of R that
>>> precede R or are peers of R in the
>>> window ordering of the window partition defined by the window ordering clause.
>>>
>>> Ian
>>>
>>
>> It seems you're right. I will fix it soon probably.
>> By this spec, some of the regression tests including nth_value() etc.
>> are wrong. Generally we hold only preceding rows in the frame when
>> ORDER BY is specified, not only aggregate case.
>> Thanks again.

Doing a bit of poking around in the spec and the Oracle documentation,
I think (but I'm not 100% sure) that the results returned were correct
for the query:

postgres=# select a, sum(a) over () from generate_series(1,10) a;
ERROR:  either PARTITION BY or ORDER BY must be specified in window clause

Howver, someone who is better at parsing the spec than I am probably
ought to check...

Ian


Re: Window Functions patch v06

From
"Hitoshi Harada"
Date:
I confirmed this on Oracle:

select last_value(id) over (order by id) as last_id, id from foo;

LAST_ID ID
------- --
0       0
1       1
2       2
3       3
4       4
5       5
6       6
7       7
8       8
9       9
10      10

So when you specify ORDER BY clause on window definition, the frame
always contains rows preceding from current row, not only on
aggregate.


>
> Doing a bit of poking around in the spec and the Oracle documentation,
> I think (but I'm not 100% sure) that the results returned were correct
> for the query:
>
> postgres=# select a, sum(a) over () from generate_series(1,10) a;
> ERROR:  either PARTITION BY or ORDER BY must be specified in window clause

The empty window definition is forbidden temporarily because I didn't
know how to take it although there's no problem to execute it. I
haven't found any description on spec yet, I know at least Oracle
allows it.

I can find how to do it with the new (window execution model) design,
(and the design is suitable to fix it above,) but at first before
going into trivial specs, I would like core hackers to review the
model is better than before or not. Thank you for your cooperation.


Regards,


-- 
Hitoshi Harada


Re: Window Functions patch v06

From
"David Rowley"
Date:
Hitoshi Harada wrote:
>I made up my mind to scratch former window functions and redesigned
>completely new execution model, based on the discussion with Heikki.
>Attached is the v06 against HEAD today.
>http://umitanuki.net/pgsql/wfv06/design.html

First off, fantastic work!

In my eyes this and WITH RECURSIVE are a big step for both Postgres and open
source RBDMS'.

Only, one small query with LEAD() and LAG()

Going by http://www.wiscorp.com/sql200n.zip

"The lead and lag functions each take three arguments, a <value expression>
VE, an <exact numeric literal>
OFFSET, and a <value expression> DEFAULT. For each row R within the window
partition P of R defined by
a window structure descriptor, the lag function returns the value of VE
evaluated on a row that is OFFSET
number of rows before R within P, and the lead function returns the value of
VE evaluated on a row that is
OFFSET number of rows after R within P. The value of DEFAULT is returned as
the result if there is no row
corresponding to the OFFSET number of rows before R within P (for the lag
function) or after R within P (for
the lead function). In addition, RESPECT NULLS or IGNORE NULLS can be
specified to indicate whether
the rows within P for which VE evaluates to the null value are preserved or
eliminated"

So going by that:
SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee;

Would use 'None' for rows that would be out of the bounds of the window.
The current patch only seems to accept 2 arguments.
ERROR:  function lag(character varying, integer, unknown) does not exist




Re: Window Functions patch v06

From
"Hitoshi Harada"
Date:
2008/10/14 David Rowley <dgrowley@gmail.com>:
> Hitoshi Harada wrote:
>>I made up my mind to scratch former window functions and redesigned
>>completely new execution model, based on the discussion with Heikki.
>>Attached is the v06 against HEAD today.
>>http://umitanuki.net/pgsql/wfv06/design.html
>
> First off, fantastic work!
>
> In my eyes this and WITH RECURSIVE are a big step for both Postgres and open
> source RBDMS'.
>
> Only, one small query with LEAD() and LAG()
>
> Going by http://www.wiscorp.com/sql200n.zip
>
> "The lead and lag functions each take three arguments, a <value expression>
> VE, an <exact numeric literal>
> OFFSET, and a <value expression> DEFAULT. For each row R within the window
> partition P of R defined by
> a window structure descriptor, the lag function returns the value of VE
> evaluated on a row that is OFFSET
> number of rows before R within P, and the lead function returns the value of
> VE evaluated on a row that is
> OFFSET number of rows after R within P. The value of DEFAULT is returned as
> the result if there is no row
> corresponding to the OFFSET number of rows before R within P (for the lag
> function) or after R within P (for
> the lead function). In addition, RESPECT NULLS or IGNORE NULLS can be
> specified to indicate whether
> the rows within P for which VE evaluates to the null value are preserved or
> eliminated"
>
> So going by that:
> SELECT name,LAG(name,1,'None') OVER (ORDER BY employeeid) FROM employee;
>
> Would use 'None' for rows that would be out of the bounds of the window.
>
> The current patch only seems to accept 2 arguments.
> ERROR:  function lag(character varying, integer, unknown) does not exist
>
>
>

Thanks for your feedback.

I agree I need to work on that. Also from the spec, "RESPECT NULLS /
IGNORE NULLS" may be specified but not supported yet. This syntax
specification is out of the postgres general function call so I wonder
if those functions are treated specially or not.

Regards,


-- 
Hitoshi Harada


Re: Window Functions patch v06

From
Tom Lane
Date:
"Hitoshi Harada" <umi.tanuki@gmail.com> writes:
> I agree I need to work on that. Also from the spec, "RESPECT NULLS /
> IGNORE NULLS" may be specified but not supported yet. This syntax
> specification is out of the postgres general function call so I wonder
> if those functions are treated specially or not.

Egad, the SQL committee has certainly been taken over by creeping
COBOL-itis when it comes to inventing random new syntax ...
        regards, tom lane