Re: more support for various frame types of window functions - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: more support for various frame types of window functions
Date
Msg-id e08cc0400911090620u296591by214b56b377c51eb0@mail.gmail.com
Whole thread Raw
In response to Re: more support for various frame types of window functions  (David Fetter <david@fetter.org>)
Responses Re: more support for various frame types of window functions
List pgsql-hackers
2009/11/9 David Fetter <david@fetter.org>:
> On Mon, Nov 09, 2009 at 06:39:54PM +0900, Hitoshi Harada wrote:
>> I'm not sure if it can be finished until the start of the next CF,
>> but I've been working on $subject. This work intends to extend
>> current limited frame types of window functions such like below;
>
> This is very, very exciting.  Is there a public repository people can
> check out?
Not so far as always. The step is quite small so I don't believe we
need developing repository but I'll create it when needed.

>  In particular, I'm curious about how to handle ROWS vs.
> RANGE, e.g.:
>
>    avg(t) OVER (...
>        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS smooth_five_points
>
> vs.
>
>    avg(t) OVER (...
>        RANGE BETWEEN
>        INTERVAL '2 day' PRECEDING AND
>        INTERVAL '2 day' FOLLOWING) AS five_day_average
>

I've not finished reading spec completely, but in the first frame
starts at exactly 2 rows before current row and ends at exactly 2 rows
after current row. The latter is a bit more complicated but it means
the frame starts at the beginning of peers whose value in ORDER BY
clause is current row value - 2 days and so on.


>> - ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>> - ORDER BY x RANGE BETWEEN CURRENT ROW AND f FOLLOWING
>> - ORDER BY x RANGE BETWEEN p PRECEDING AND f FOLLOWING
>>
>> where "p" and "f" are values that indicate preceding/following frame
>> boundary offsets from current row (or peer). With this feature, you
>> can calculate something like "moving average" by SQL.
>>
>> Frame types that won't be introduced in this work includes:
>>
>> - EXCLUDE clause
>>
>> The hardest point is that aggregates must be re-initialized as rows
>> exit from current frame, which doesn't occur in 8.4 design. One of the
>> solution for this is to let aggregates have "negative trans functions"
>> (NTF), and some comments in nodeWindowAgg.c say about it, which
>> current aggregate system doesn't have. But my work doesn't introduce
>> this mechanism because
>>
>> 1) "negative trans function" doesn't do anything in normal aggregate
>> 2) forcing that to everyone who writes his/her own aggregate is quite
>> hard and incompatible to older releases
>> 3) so, we must at least support aggregates that don't have NTF even if
>> it will be introduced in the future
>>
>> That means moving average is initialized again on frame-off
>> situation as the frame moves down. I know that may kill it's
>> performance but reasons above result in my proposing design.
>>
>> If you have better ideas please feel free to tell me, and any
>> comments welcomed.
>
> First, it's wonderful to hear you're working on this. :)
Thanks. I hope it will be done until 8.5 release.

>
> Second, it's tradition on the PostgreSQL project to start with a
> slow(ish) and correct implementation, then make it faster.  NTFs or
> other speed boosts, while nice to have, would not be needed for a
> first production implementation.  After all, the alternatives without
> the native ones are usually slow, buggy, unstable, or combinations of
> all three.
Yep, NTF is the next stage after we implement various frame types
correctly. I should be careful not to go wrong way where NTF cannot be
applied in the future.


Regards,

--
Hitoshi Harada


pgsql-hackers by date:

Previous
From: Hitoshi Harada
Date:
Subject: Re: more support for various frame types of window functions
Next
From: Tom Lane
Date:
Subject: Re: more support for various frame types of window functions