Thread: more support for various frame types of window functions

more support for various frame types of window functions

From

Hitoshi Harada

Date:

09 November 2009, 05:40:05

I'm not sure if it can be finished until the start of the next CF, but
I've been working on $subject. This work intends to extend current
limited frame types of window functions such like below;

- ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
- ORDER BY x RANGE BETWEEN CURRENT ROW AND f FOLLOWING
- ORDER BY x RANGE BETWEEN p PRECEDING AND f FOLLOWING

where "p" and "f" are values that indicate preceding/following frame
boundary offsets from current row (or peer). With this feature, you
can calculate something like "moving average" by SQL.

Frame types that won't be introduced in this work includes:

- EXCLUDE clause

The hardest point is that aggregates must be re-initialized as rows
exit from current frame, which doesn't occur in 8.4 design. One of the
solution for this is to let aggregates have "negative trans functions"
(NTF), and some comments in nodeWindowAgg.c say about it, which
current aggregate system doesn't have. But my work doesn't introduce
this mechanism because

1) "negative trans function" doesn't do anything in normal aggregate
2) forcing that to everyone who writes his/her own aggregate is quite
hard and incompatible to older releases
3) so, we must at least support aggregates that don't have NTF even if
it will be introduced in the future

That means moving average is initialized again on frame-off situation
as the frame moves down. I know that may kill it's performance but
reasons above result in my proposing design.

If you have better ideas please feel free to tell me, and any comments welcomed.

Regards,


-- 
Hitoshi Harada

Re: more support for various frame types of window functions

From

Heikki Linnakangas

Date:

09 November 2009, 06:09:22

Hitoshi Harada wrote:
> That means moving average is initialized again on frame-off situation
> as the frame moves down. I know that may kill it's performance but
> reasons above result in my proposing design.

Yeah, we need the reinitialization support to handle the generic case.

One idea is to take a copy of the state datum after each row. Then,
instead of initializing the aggregate from scratch, you can "roll back"
to an earlier copied state. It doesn't always help, but might be a part
of the solution.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com

Re: more support for various frame types of window functions

From

David Fetter

Date:

09 November 2009, 06:32:40

On Mon, Nov 09, 2009 at 06:39:54PM +0900, Hitoshi Harada wrote:
> I'm not sure if it can be finished until the start of the next CF,
> but I've been working on $subject. This work intends to extend
> current limited frame types of window functions such like below;

This is very, very exciting.  Is there a public repository people can
check out?  In particular, I'm curious about how to handle ROWS vs.
RANGE, e.g.:
   avg(t) OVER (...       ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS smooth_five_points

vs. 
   avg(t) OVER (...       RANGE BETWEEN       INTERVAL '2 day' PRECEDING AND       INTERVAL '2 day' FOLLOWING) AS
five_day_average

> - ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> - ORDER BY x RANGE BETWEEN CURRENT ROW AND f FOLLOWING
> - ORDER BY x RANGE BETWEEN p PRECEDING AND f FOLLOWING
> 
> where "p" and "f" are values that indicate preceding/following frame
> boundary offsets from current row (or peer). With this feature, you
> can calculate something like "moving average" by SQL.
> 
> Frame types that won't be introduced in this work includes:
> 
> - EXCLUDE clause
> 
> The hardest point is that aggregates must be re-initialized as rows
> exit from current frame, which doesn't occur in 8.4 design. One of the
> solution for this is to let aggregates have "negative trans functions"
> (NTF), and some comments in nodeWindowAgg.c say about it, which
> current aggregate system doesn't have. But my work doesn't introduce
> this mechanism because
> 
> 1) "negative trans function" doesn't do anything in normal aggregate
> 2) forcing that to everyone who writes his/her own aggregate is quite
> hard and incompatible to older releases
> 3) so, we must at least support aggregates that don't have NTF even if
> it will be introduced in the future
> 
> That means moving average is initialized again on frame-off
> situation as the frame moves down. I know that may kill it's
> performance but reasons above result in my proposing design.
> 
> If you have better ideas please feel free to tell me, and any
> comments welcomed.

First, it's wonderful to hear you're working on this. :)

Second, it's tradition on the PostgreSQL project to start with a
slow(ish) and correct implementation, then make it faster.  NTFs or
other speed boosts, while nice to have, would not be needed for a
first production implementation.  After all, the alternatives without
the native ones are usually slow, buggy, unstable, or combinations of
all three.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: more support for various frame types of window functions

From

Andrew Gierth

Date:

09 November 2009, 09:26:43

>>>>> "Heikki" == Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
Heikki> Yeah, we need the reinitialization support to handle the generic case.
Heikki> One idea is to take a copy of the state datum after each row.

Some existing aggregates, notably array_agg, rely on the assumption
that this never happens.

-- 
Andrew (irc:RhodiumToad)

Re: more support for various frame types of window functions

From

Hitoshi Harada

Date:

09 November 2009, 10:12:41

2009/11/9 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>>>>> "Heikki" == Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>
>  Heikki> Yeah, we need the reinitialization support to handle the generic case.
>
>  Heikki> One idea is to take a copy of the state datum after each row.
>
> Some existing aggregates, notably array_agg, rely on the assumption
> that this never happens.

I'm carefully coding to keep the existing behavior that when frame
start is UNBOUNDED PRECEDING aggregate state is cached. I don't see
how the copied state will help in other frame types, but is it rather
than that? We assume the frame starting and ending boundary never goes
back.

Regards,

--
Hitoshi Harada

Re: more support for various frame types of window functions

From

Hitoshi Harada

Date:

09 November 2009, 10:20:51

2009/11/9 David Fetter <david@fetter.org>:
> On Mon, Nov 09, 2009 at 06:39:54PM +0900, Hitoshi Harada wrote:
>> I'm not sure if it can be finished until the start of the next CF,
>> but I've been working on $subject. This work intends to extend
>> current limited frame types of window functions such like below;
>
> This is very, very exciting.  Is there a public repository people can
> check out?
Not so far as always. The step is quite small so I don't believe we
need developing repository but I'll create it when needed.

>  In particular, I'm curious about how to handle ROWS vs.
> RANGE, e.g.:
>
>    avg(t) OVER (...
>        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS smooth_five_points
>
> vs.
>
>    avg(t) OVER (...
>        RANGE BETWEEN
>        INTERVAL '2 day' PRECEDING AND
>        INTERVAL '2 day' FOLLOWING) AS five_day_average
>

I've not finished reading spec completely, but in the first frame
starts at exactly 2 rows before current row and ends at exactly 2 rows
after current row. The latter is a bit more complicated but it means
the frame starts at the beginning of peers whose value in ORDER BY
clause is current row value - 2 days and so on.


>> - ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>> - ORDER BY x RANGE BETWEEN CURRENT ROW AND f FOLLOWING
>> - ORDER BY x RANGE BETWEEN p PRECEDING AND f FOLLOWING
>>
>> where "p" and "f" are values that indicate preceding/following frame
>> boundary offsets from current row (or peer). With this feature, you
>> can calculate something like "moving average" by SQL.
>>
>> Frame types that won't be introduced in this work includes:
>>
>> - EXCLUDE clause
>>
>> The hardest point is that aggregates must be re-initialized as rows
>> exit from current frame, which doesn't occur in 8.4 design. One of the
>> solution for this is to let aggregates have "negative trans functions"
>> (NTF), and some comments in nodeWindowAgg.c say about it, which
>> current aggregate system doesn't have. But my work doesn't introduce
>> this mechanism because
>>
>> 1) "negative trans function" doesn't do anything in normal aggregate
>> 2) forcing that to everyone who writes his/her own aggregate is quite
>> hard and incompatible to older releases
>> 3) so, we must at least support aggregates that don't have NTF even if
>> it will be introduced in the future
>>
>> That means moving average is initialized again on frame-off
>> situation as the frame moves down. I know that may kill it's
>> performance but reasons above result in my proposing design.
>>
>> If you have better ideas please feel free to tell me, and any
>> comments welcomed.
>
> First, it's wonderful to hear you're working on this. :)
Thanks. I hope it will be done until 8.5 release.

>
> Second, it's tradition on the PostgreSQL project to start with a
> slow(ish) and correct implementation, then make it faster.  NTFs or
> other speed boosts, while nice to have, would not be needed for a
> first production implementation.  After all, the alternatives without
> the native ones are usually slow, buggy, unstable, or combinations of
> all three.
Yep, NTF is the next stage after we implement various frame types
correctly. I should be careful not to go wrong way where NTF cannot be
applied in the future.


Regards,

--
Hitoshi Harada

Re: more support for various frame types of window functions

From

Tom Lane

Date:

09 November 2009, 11:02:12

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> One idea is to take a copy of the state datum after each row. Then,
> instead of initializing the aggregate from scratch, you can "roll back"
> to an earlier copied state. It doesn't always help, but might be a part
> of the solution.

That requires that you know how to copy the aggregate's state.  You do
not.  (In some cases the aggregate function has extra state besides the
nominal transition datum...)
        regards, tom lane

Re: more support for various frame types of window functions

From

David Fetter

Date:

09 November 2009, 11:05:15

On Mon, Nov 09, 2009 at 11:20:39PM +0900, Hitoshi Harada wrote:
> 2009/11/9 David Fetter <david@fetter.org>:
> > On Mon, Nov 09, 2009 at 06:39:54PM +0900, Hitoshi Harada wrote:
> >> I'm not sure if it can be finished until the start of the next CF,
> >> but I've been working on $subject. This work intends to extend
> >> current limited frame types of window functions such like below;
> >
> > This is very, very exciting.  Is there a public repository people can
> > check out?
> Not so far as always. The step is quite small so I don't believe we
> need developing repository but I'll create it when needed.

Thanks :)

> >  In particular, I'm curious about how to handle ROWS vs.
> > RANGE, e.g.:
> >
> >    avg(t) OVER (...
> >        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS smooth_five_points
> >
> > vs.
> >
> >    avg(t) OVER (...
> >        RANGE BETWEEN
> >        INTERVAL '2 day' PRECEDING AND
> >        INTERVAL '2 day' FOLLOWING) AS five_day_average
> 
> I've not finished reading spec completely, but in the first frame
> starts at exactly 2 rows before current row and ends at exactly 2
> rows after current row. The latter is a bit more complicated but it
> means the frame starts at the beginning of peers whose value in
> ORDER BY clause is current row value - 2 days and so on.

That's pretty much it.  The spec may have some other things to say
about corner cases, NULLs, etc.

> > First, it's wonderful to hear you're working on this. :)
> Thanks. I hope it will be done until 8.5 release.

Will you be at the JPUG 10th anniversary?

Might code be there in time for that?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: more support for various frame types of window functions

From

Hitoshi Harada

Date:

09 November 2009, 11:25:24

2009/11/10 David Fetter <david@fetter.org>:
> On Mon, Nov 09, 2009 at 11:20:39PM +0900, Hitoshi Harada wrote:
>> 2009/11/9 David Fetter <david@fetter.org>:
>> > On Mon, Nov 09, 2009 at 06:39:54PM +0900, Hitoshi Harada wrote:
>> >> I'm not sure if it can be finished until the start of the next CF,
>> >> but I've been working on $subject. This work intends to extend
>> >> current limited frame types of window functions such like below;
>> >
>> > This is very, very exciting.  Is there a public repository people can
>> > check out?
>> Not so far as always. The step is quite small so I don't believe we
>> need developing repository but I'll create it when needed.
>
> Thanks :)
>
>> >  In particular, I'm curious about how to handle ROWS vs.
>> > RANGE, e.g.:
>> >
>> >    avg(t) OVER (...
>> >        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS smooth_five_points
>> >
>> > vs.
>> >
>> >    avg(t) OVER (...
>> >        RANGE BETWEEN
>> >        INTERVAL '2 day' PRECEDING AND
>> >        INTERVAL '2 day' FOLLOWING) AS five_day_average
>>
>> I've not finished reading spec completely, but in the first frame
>> starts at exactly 2 rows before current row and ends at exactly 2
>> rows after current row. The latter is a bit more complicated but it
>> means the frame starts at the beginning of peers whose value in
>> ORDER BY clause is current row value - 2 days and so on.
>
> That's pretty much it.  The spec may have some other things to say
> about corner cases, NULLs, etc.
OK, I'll check it.

>> > First, it's wonderful to hear you're working on this. :)
>> Thanks. I hope it will be done until 8.5 release.
>
> Will you be at the JPUG 10th anniversary?
Sure, I'll be there for both two days and give lightning talk on
Saturday evening (but in Japanese, sorry!)

>
> Might code be there in time for that?
I hope so...



--
Hitoshi Harada

Re: more support for various frame types of window functions

From

Greg Stark

Date:

09 November 2009, 12:11:06

On Mon, Nov 9, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> One idea is to take a copy of the state datum after each row. Then,
>> instead of initializing the aggregate from scratch, you can "roll back"
>> to an earlier copied state. It doesn't always help, but might be a part
>> of the solution.
>
> That requires that you know how to copy the aggregate's state.  You do
> not.  (In some cases the aggregate function has extra state besides the
> nominal transition datum...)

That's relatively unusual and usually a bad idea, imho. We could add a
flag to indicate whether that's the case and only do the optimization
if it's not set. It would rarely be set.

We already faced this problem with aggregates over windows with range
unbounded preceding to the current row. I lost track of the current
way of handling the state for those when they're repeatedly finalized.

Actually I had assumed the approach would be different. Instead of
reinitializing the aggregate I had pictured keeping a rolling window
of aggregate states. Ie, if you're doing a rolling average of the last
five values you keep five aggregate states which started 1 row ago, 2
rows ago, ... up to 5 rows ago. You finalize the last one and roll the
others down using the state transition function.

My instinct was that that would be more efficient but working through
it in detail it looks to me like it would do precisely the same work
and have difficulties if the window doesn't fit in work_mem. So
perhaps it's a bad idea, but I wanted to mention it anyways.

--
greg

Re: more support for various frame types of window functions

From

Sam Mason

Date:

09 November 2009, 12:47:26

On Mon, Nov 09, 2009 at 10:01:53AM -0500, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > One idea is to take a copy of the state datum after each row. Then,
> > instead of initializing the aggregate from scratch, you can "roll back"
> > to an earlier copied state. It doesn't always help, but might be a part
> > of the solution.
> 
> That requires that you know how to copy the aggregate's state.  You do
> not.  (In some cases the aggregate function has extra state besides the
> nominal transition datum...)

Other ways of doing aggregates/folds involve exploiting a "tree" like
structure (mostly for reasons of parallelism, but these don't apply to
PG).  For example, instead of having the triple (init,accum,final) as we
do at the moment, you could have a "merge" function that would take two
intermediate states and combine them.  For example, COUNT and SUM would
be the add function, MIN and MAX would be least and greatest functions
respectively, and so on.

If this method exists then you can do the fancy stuff as suggested, if
it doesn't exist then fall back to less optimal methods.

--  Sam  http://samason.me.uk/

Re: more support for various frame types of window functions

From

Tom Lane

Date:

09 November 2009, 14:06:02

Greg Stark <gsstark@mit.edu> writes:
> On Mon, Nov 9, 2009 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That requires that you know how to copy the aggregate's state. �You do
>> not. �(In some cases the aggregate function has extra state besides the
>> nominal transition datum...)

> That's relatively unusual and usually a bad idea, imho. We could add a
> flag to indicate whether that's the case and only do the optimization
> if it's not set. It would rarely be set.

We already burned that bridge, unfortunately --- core and contrib both
contain examples of aggregates that do things this way, and I'll bet
there are third-party examples by now.  If you have a large working
state for an aggregate, it's just too tempting to prevent nodeAgg.c
from copying the darn thing all the time.

It's fairly clear that efficient support for this will require extra
help from the aggregate author.  Ideally we would define it so that
the author could implement it either with negative transitions or with
multiple copies of the agg state, whichever seems most efficient.
The cases that are hard are exactly where the agg state is large,
so I doubt it's a good idea to legislate that multiple copies is how
to do it.

In any case, I don't think we can put retroactive restrictions on what
the aggregates do in the normal case, and we can definitely not assume
that existing aggregates will politely tell us what they are doing.
        regards, tom lane