Thread: Documentation, window functions

Documentation, window functions

From
Dennis Björklund
Date:
In
http://www.postgresql.org/docs/9.0/static/tutorial-window.html

it say

"Although avg will produce the same result no matter what order it
processes the partition's rows in, this is not true of all window
functions. When needed, you can control that order using ORDER BY within
OVER."

While it's true that avg() produce the same result no matter what order. A
ORDER BY clause will affect what rows are included in the computation and
thus change the result (the default window frame is
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). So one can not in
general add an ORDER BY to the example in the tutorial and get the same
result as without an ORDER BY.

Maybe we can find some better wording of the above?

/Dennis




Re: Documentation, window functions

From
Robert Haas
Date:
On Wed, Sep 22, 2010 at 6:03 AM, Dennis Björklund <db@zigo.dhs.org> wrote:
> In
>
>  http://www.postgresql.org/docs/9.0/static/tutorial-window.html
>
> it say
>
> "Although avg will produce the same result no matter what order it
> processes the partition's rows in, this is not true of all window
> functions. When needed, you can control that order using ORDER BY within
> OVER."
>
> While it's true that avg() produce the same result no matter what order. A
> ORDER BY clause will affect what rows are included in the computation and
> thus change the result (the default window frame is
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). So one can not in
> general add an ORDER BY to the example in the tutorial and get the same
> result as without an ORDER BY.
>
> Maybe we can find some better wording of the above?

Yeah, that doesn't seem right.

rhaas=# create table foo (a integer);
CREATE TABLE
rhaas=# insert into foo values (1);
INSERT 0 1
rhaas=# insert into foo values (2);
INSERT 0 1
rhaas=# insert into foo values (3);
INSERT 0 1
rhaas=# select a, avg(a) over () from foo;a |        avg
---+--------------------1 | 2.00000000000000002 | 2.00000000000000003 | 2.0000000000000000
(3 rows)

rhaas=# select a, avg(a) over (order by a) from foo;a |          avg
---+------------------------1 | 1.000000000000000000002 |     1.50000000000000003 |     2.0000000000000000
(3 rows)

But I confess that I'm sort of murky on how ORDER affects the window
frame, or how to rephrase this more sensibly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Documentation, window functions

From
Hitoshi Harada
Date:
2010/9/22 Dennis Björklund <db@zigo.dhs.org>:
> In
>
>  http://www.postgresql.org/docs/9.0/static/tutorial-window.html
>
> it say
>
> "Although avg will produce the same result no matter what order it
> processes the partition's rows in, this is not true of all window
> functions. When needed, you can control that order using ORDER BY within
> OVER."
>
> While it's true that avg() produce the same result no matter what order. A
> ORDER BY clause will affect what rows are included in the computation and
> thus change the result (the default window frame is
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). So one can not in
> general add an ORDER BY to the example in the tutorial and get the same
> result as without an ORDER BY.
>
> Maybe we can find some better wording of the above?

Your point is true, but I believe it's still ok because the section is
a tutorial for novices. If you start to explain everything here,
readers don't want to read it to the end.

Regards,


--
Hitoshi Harada


Re: Documentation, window functions

From
Alvaro Herrera
Date:
Excerpts from Hitoshi Harada's message of mié sep 22 12:54:45 -0400 2010:

> > Maybe we can find some better wording of the above?
> 
> Your point is true, but I believe it's still ok because the section is
> a tutorial for novices. If you start to explain everything here,
> readers don't want to read it to the end.

We had this exact question in the spanish list two weeks ago.  I also
suggest that we need to explain this a bit more explicitely.

Actually the spanish question involved a PARTITION BY / ORDER BY clause,
and the difference showed up not because of the existance of ORDER BY
alone, but by whether the ORDER BY had the same columns than PARTITION
BY or more.  Here it is:

http://archives.postgresql.org/message-id/OF7DB68F53.5DA80480-ON8625779F.0058B555-8625779F.005917FA@correogs.com.mx

(The archive is a bit confusing because the results appear as images at
the bottom of the email, instead of being interspersed with the text.)

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Documentation, window functions

From
Dennis Björklund
Date:
> On Wed, Sep 22, 2010 at 6:03 AM, Dennis Björklund <db@zigo.dhs.org> wrote:
> But I confess that I'm sort of murky on how ORDER affects the window
> frame, or how to rephrase this more sensibly.

The rows included in the calculation of the window function are per default

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

where CURRENT ROW include all the rows that are equal to the row you are
at according to the ordering. So if you say order by name then all the
rows up to your name and all rows with the same name are included, not
later rows.

If you don't have any ordering, then all rows are "equal" and all rows are
included in the computation. That's why your example behaved like it did.

At least that's my understanding of how these things work. I've not used
window functions very much myself.

This is fairly difficult stuff and it probably don't belong in a tutorial
but the current wording suggest that you can add any ordering and it won't
affect the result. That is also a bad since it might teach people the
wrong thing.

/Dennis




Re: Documentation, window functions

From
Robert Haas
Date:
On Thu, Sep 23, 2010 at 11:34 PM, Dennis Björklund <db@zigo.dhs.org> wrote:
>> On Wed, Sep 22, 2010 at 6:03 AM, Dennis Björklund <db@zigo.dhs.org> wrote:
>> But I confess that I'm sort of murky on how ORDER affects the window
>> frame, or how to rephrase this more sensibly.
>
> The rows included in the calculation of the window function are per default
>
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>
> where CURRENT ROW include all the rows that are equal to the row you are
> at according to the ordering. So if you say order by name then all the
> rows up to your name and all rows with the same name are included, not
> later rows.
>
> If you don't have any ordering, then all rows are "equal" and all rows are
> included in the computation. That's why your example behaved like it did.
>
> At least that's my understanding of how these things work. I've not used
> window functions very much myself.
>
> This is fairly difficult stuff and it probably don't belong in a tutorial
> but the current wording suggest that you can add any ordering and it won't
> affect the result. That is also a bad since it might teach people the
> wrong thing.

Hmm... it is true that average will produce the same results on any
ordering of the same set of input values, though.  Perhaps the word
"partition" emcompass that, though then again maybe not.

I'd be happy to fix this if I understand what to fix it to.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Documentation, window functions

From
Dennis Björklund
Date:
> Hmm... it is true that average will produce the same results on any
> ordering of the same set of input values, though.  Perhaps the word
> "partition" emcompass that, though then again maybe not.
>
> I'd be happy to fix this if I understand what to fix it to.

I wish I knew how to rephrase it so it is both as simple as now and also
totally clear. We can't complicate it, it's a tutorial.

/Dennis



Re: Documentation, window functions

From
"Kevin Grittner"
Date:
Robert Haas  wrote:
> Hmm... it is true that average will produce the same results on any
> ordering of the same set of input values, though.
Not exactly.  For floating point approximations you get a more
accurate sum (and therefore a more accurate average) if you add the
values in ascending order of absolute value.  Depending on the data,
it can make quite a difference.
-Kevin



Re: Documentation, window functions

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Thu, Sep 23, 2010 at 11:34 PM, Dennis Bj?rklund <db@zigo.dhs.org> wrote:
> >> On Wed, Sep 22, 2010 at 6:03 AM, Dennis Bj?rklund <db@zigo.dhs.org> wrote:
> >> But I confess that I'm sort of murky on how ORDER affects the window
> >> frame, or how to rephrase this more sensibly.
> >
> > The rows included in the calculation of the window function are per default
> >
> > RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> >
> > where CURRENT ROW include all the rows that are equal to the row you are
> > at according to the ordering. So if you say order by name then all the
> > rows up to your name and all rows with the same name are included, not
> > later rows.
> >
> > If you don't have any ordering, then all rows are "equal" and all rows are
> > included in the computation. That's why your example behaved like it did.
> >
> > At least that's my understanding of how these things work. I've not used
> > window functions very much myself.
> >
> > This is fairly difficult stuff and it probably don't belong in a tutorial
> > but the current wording suggest that you can add any ordering and it won't
> > affect the result. That is also a bad since it might teach people the
> > wrong thing.
>
> Hmm... it is true that average will produce the same results on any
> ordering of the same set of input values, though.  Perhaps the word
> "partition" emcompass that, though then again maybe not.
>
> I'd be happy to fix this if I understand what to fix it to.

I clarified the window function ORDER BY wording to avoid mentioning
avg().  Applied patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 11859b4..218988e 100644
*** a/doc/src/sgml/advanced.sgml
--- b/doc/src/sgml/advanced.sgml
*************** SELECT depname, empno, salary, avg(salar
*** 383,392 ****
     </para>

     <para>
!     Although <function>avg</> will produce the same result no matter
!     what order it processes the partition's rows in, this is not true of all
!     window functions.  When needed, you can control that order using
!     <literal>ORDER BY</> within <literal>OVER</>.  Here is an example:

  <programlisting>
  SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;
--- 383,392 ----
     </para>

     <para>
!     You can also control the order in which rows are processed by
!     window functions using <literal>ORDER BY</> within <literal>OVER</>.
!     (The window <literal>ORDER BY</> does not even have to match the
!     order in which the rows are output.)  Here is an example:

  <programlisting>
  SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;