Re: Window functions patch v04 for the September commit fest - Mailing list pgsql-hackers

From David Fetter
Subject Re: Window functions patch v04 for the September commit fest
Date
Msg-id 20080901203903.GP3717@fetter.org
Whole thread Raw
In response to Re: Window functions patch v04 for the September commit fest  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Window functions patch v04 for the September commit fest  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Mon, Sep 01, 2008 at 09:00:47PM +0300, Heikki Linnakangas wrote:
> Hitoshi Harada wrote:
>> 2008/8/30 Hitoshi Harada <umi.tanuki@gmail.com>:
>>> Here's the latest window functions patch against HEAD. It seems to be
>>> ready for the September commit fest, as added documents, WINDOW clause
>>> feature and misc tests. I guess this would be the window functions
>>> feature freeze for 8.4. The remaining feature will be implemented for
>>> the later release.
>>>
>>> This patch consists of:
>>> - windowed aggregates
>>> - cooperate with GROUP BY aggregates
>>> - some optimizations with multiple windows
>>> - ranking functions
>>> - WINDOW clause
>>> - windowing SQL regression tests
>>> - sgml documents
>>>
>>> Looking up the total road map, the dropped features are:
>>>
>>> - sliding window (window framing)
>>> - lead(), lag(), etc. that reach for random rows
>>> - user defined window functions
>
> Ok, I'm starting to read up on SQL2003 window functions,

Maybe it would be better to read the SQL2008 standard
<http://wiscorp.com/sql200n.zip> :)

> and to review  this patch.  Here's a long brain-dump of my first
> thoughts on the design.
>
> Let's list a couple of requirements first:
>
> 1. It's important that what gets committed now can be extended to
> handle  all of the window function stuff in SQL2003 in the future,
> as well as  user-defined-window-functions in the spirit of
> PostgreSQL extensibility.  Even if we don't implement all of it in
> this release.

> 2. Performance needs to be reasonable, in the face a large number of  
> input rows. These are typically used in OLAP queries that process  
> gazillions of rows. Some window functions need access to all rows in the  
> window frame or partition, so you can't reasonably expect great  
> performance with those if the working set is large, but those that  
> don't, should work with finite memory requirements, and reasonably fast.
>
> Because of 1., let's focus on the interface for
> user-defined-window-functions first. Ideally, the interface is such
> that  all the window functions and aggregates defined in SQL2003,
> and others  we might want to have in core or as pgfoundry projects,
> can be  implemented using it.

SQL2008 defines the following:

<window function> ::=   <window function type> OVER <window name or specification>

<window function type> ::=     <rank function type> <left paren> <right paren>   | ROW_NUMBER <left paren> <right
paren>  | <aggregate function>   | <ntile function>   | <lead or lag function>   | <first or last value function>   |
<nthvalue function>
 

<rank function type> ::=     RANK   | DENSE_RANK   | PERCENT_RANK   | CUME_DIST

<ntile function> ::=   NTILE <left paren> <number of tiles> <right paren>

<number of tiles> ::=     <simple value specification>   | <dynamic parameter specification>

<lead or lag function> ::=   <lead or lag> <left paren> <lead or lag extent>       [ <comma> <offset> [ <comma>
<defaultexpression> ] ] <right paren>       [ <null treatment> ]
 

<lead or lag> ::=   LEAD | LAG

<lead or lag extent> ::=   <value expression>

<offset> ::=   <exact numeric literal>

<default expression> ::=   <value expression>

<null treatment> ::=   RESPECT NULLS | IGNORE NULLS

<first or last value function> ::=   <first or last value> <left paren> <value expression> <right paren> [ <null
treatment>]
 

<first or last value> ::=   FIRST_VALUE | LAST_VALUE

<nth value function> ::=   NTH_VALUE <left paren> <value expression> <comma> <nth row> <right paren>       [ <from
firstor last> ] [ <null treatment> ]
 

<nth row> ::=     <simple value specification>   | <dynamic parameter specification>

<from first or last> ::=     FROM FIRST   | FROM LAST

<window name or specification> ::=     <window name>   | <in-line window specification>

<in-line window specification> ::=   <window specification>

> For functions like LEAD or CUME_DIST that don't immediately start  
> returning values, the executor will need a tuplestore to buffer input  
> rows until it gets their values from the window function.

For these aggregates, should there be an API that signals that a tuplestore
will be needed?

> The syntax for creating a ranking aggregate might look something like this:
>
> CREATE RANKING AGGREGATE name (
>     SFUNC = sfunc,
>     STYPE = state_data_type,
>     OUTFUNC = outfunc
> )
>
> This is very similar to Simon's proposal, and to current CREATE
> AGGREGATE, but tweaked a bit to fix the problems Hitoshi pointed
> out.
>
> sfunc is called once for each input row. Unlike with normal
> aggregates,  sfunc is passed the whole input row, so that e.g RANK
> can compare it  against the previous row, or LEAD can buffer it.
>
> outfunc is a set-returning function, and it is called until it
> returns  no more rows, after each sfunc invocation.

> Window aggregates
> -----------------
>
> Like normal aggregates, window aggregates process N input values, and  
> return one output value. In normal GROUP BY expressions, the aggregate  
> function is called once each group, but in a window aggregate expression  
> with a window frame (aka sliding window), there is as many "groups" as  
> there is rows.
>
> In fact, any normal aggregate function can also be used as a window  
> aggregate and vice versa. The trivial implementation is to have a buffer  
> to hold all values currently in the window frame, and for each input  
> row, invoke the aggregate function over all the rows currently in the  
> frame. I think we'll need support that, to allow using any built-in or  
> user-defined aggregate as a window aggregate.
>
> However, we also want to provide optimized versions of the common  
> aggregates. Aggregates like COUNT, SUM, and AVG can easily be  
> implemented more efficiently, without rescanning all the tuples in the  
> group.
>
> I propose an extension to the current CREATE AGGREGATE syntax to allow  
> for optimized versions. Currently, the syntax is like:
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>     SFUNC = sfunc,
>     STYPE = state_data_type
>     [ , FINALFUNC = ffunc ]
>     [ , INITCOND = initial_condition ]
>     [ , SORTOP = sort_operator ]
> )
>
> Let's add a function to allow removing tuples from the current window frame:
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>     SFUNC = sfunc,
>     STYPE = state_data_type
>     [ , FINALFUNC = ffunc ]
>     [ , INITCOND = initial_condition ]
>     [ , SORTOP = sort_operator ]
>     [ , SUBTRACTFUNC = subfunc,
> )
>
> subfunc is like sfunc, except that the input value is "subtracted" from  
> the current value. On each input row, the executor calls sfunc for all  
> new tuples that enter the window frame, and subfunc for all tuples that  
> exit it. Another difference is that finalfunc can be called many times  
> during the lifecycle of an aggregate invocation. For example, the  
> subfunc for COUNT would decrement the row count by one, and for SUM,  
> subfunc would subtract the removed value from the current sum. Perhaps  
> we should also support an "opportunistic subfunc", that could either  
> perform the subtraction, or return a special value indicating that it  
> can't be done, and we need to calculate the aggregate from scratch as if  
> subfunc wasn't defined. That would be useful for MIN/MAX, which can be  
> implemented by just keeping track of the current MIN/MAX value, and  
> "subtracting" any other value than the current MIN/MAX is a no-op, but  
> "subtracting" the current MIN/MAX value requires scanning all the rest  
> of the tuples from scratch.
>
> PS. Have you looked at the "hypothetical set functions" in SQL2003?

They're kinda neat :)

> PPS. I was just about to send this, when I noticed that LEAD/LAG are in  
> fact not in the SQL2003 draft I have at hand. In fact, none of the  
> window functions defined there, RANK, DENSE_RANK, PERCENT_RANK,  
> CUME_DIST and ROW_NUMBER, require random access to the tuples. Still
> seems like a good idea to be have the flexibility for them.

They're in SQL:2008.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [Patch Review] TRUNCATE Permission
Next
From: Tom Lane
Date:
Subject: Re: Is this really really as designed or defined in some standard