Thread: Some semantic details of the window-function spec

Some semantic details of the window-function spec

From

Tom Lane

Date:

22 December 2008, 15:52:11

After a couple of hours reading the SQL:2008 spec, I've come to some
conclusions about the semantics that are demanded for window functions.
Anyone want to to check my work?

* If window functions are used together with aggregates or grouping,
the grouping and regular aggregation happens first, and then windowing
is done on the output rows (group rows).  See section 7.12 rules 11,12.

* Therefore, the argument of a window function can contain aggregate
functions, and cannot reference ungrouped columns (both unlike regular
aggregates).  Conversely, an ordinary aggregate's argument can't
contain a window function.

* Also, window specifications can contain aggregate functions, and
cannot reference ungrouped columns; this is because they represent
another layer of grouping/ordering on top of the GROUP BY if any.
(An explicit ORDER BY, if any, happens last of all.)

* It is not very clear whether a window function's argument can contain
another window function.  7.11 forbids window specifications from
containing window functions, which is sensible, but I can't find any
such statement about the arguments.  The present patch forbids nesting
window functions, and I'm fine with imposing that as an implementation
restriction even if it's not in the spec; but can anyone find it in the
spec?

* Unlike aggregates, there doesn't seem to be any concept of a window
function being attached to an outer-level query --- in fact 6.10 rule
4 says that a window function's argument can't contain outer references
at all.  That seems excessively strong, but it does seem that there is
no point in a "winlevelsup" field in WindowFunc, nor in implementing
any infrastructure for outer-level window functions.

* The last part of section 4.14 states that two different window
functions must be evaluated against the same sorted row ordering if
they have syntactically identical partition and ordering clauses,
even if the windows are otherwise distinct (in particular there
could be different framing clauses).  Since we don't currently implement
framing clauses the latter is not too interesting, but it's still true
that the patch as I currently have it doesn't fully meet that
requirement: if you intentionally specify separate but equivalent named
window definitions, it won't be smart enough to fold them together,
and you could end up with extra sorts happening and possibly a different
ordering of sort-peer rows.  How worried are we about that?
        regards, tom lane

Re: Some semantic details of the window-function spec

From

"Hitoshi Harada"

Date:

22 December 2008, 18:39:13

2008/12/23 Tom Lane <tgl@sss.pgh.pa.us>:
> * Unlike aggregates, there doesn't seem to be any concept of a window
> function being attached to an outer-level query --- in fact 6.10 rule
> 4 says that a window function's argument can't contain outer references
> at all.  That seems excessively strong, but it does seem that there is
> no point in a "winlevelsup" field in WindowFunc, nor in implementing
> any infrastructure for outer-level window functions.

I am so ignorant that I don't know what exactly agglevelsup
represents. Just copied it from agg to window functions... Could
someone show me easy example?

>
> * The last part of section 4.14 states that two different window
> functions must be evaluated against the same sorted row ordering if
> they have syntactically identical partition and ordering clauses,
> even if the windows are otherwise distinct (in particular there
> could be different framing clauses).  Since we don't currently implement
> framing clauses the latter is not too interesting, but it's still true
> that the patch as I currently have it doesn't fully meet that
> requirement: if you intentionally specify separate but equivalent named
> window definitions, it won't be smart enough to fold them together,
> and you could end up with extra sorts happening and possibly a different
> ordering of sort-peer rows.  How worried are we about that?

Is it? I intended all equivalent windows are folded into one as far as
equal(w1->partitionClause, w2->partitionClause) &&
equal(w1->orderClause, w2->orderClause) is true. And I believe
everytime it's true, we can process them in the same window ordering.
One thing, in the former discussion someone pointed me that we might
have to care about that volatile functions are contained in the
window. I currently don't have any idea about that.


Regards,


-- 
Hitoshi Harada

Re: Some semantic details of the window-function spec

From

Tom Lane

Date:

23 December 2008, 04:33:12

"Hitoshi Harada" <umi.tanuki@gmail.com> writes:
> 2008/12/23 Tom Lane <tgl@sss.pgh.pa.us>:
>> * Unlike aggregates, there doesn't seem to be any concept of a window
>> function being attached to an outer-level query --- in fact 6.10 rule
>> 4 says that a window function's argument can't contain outer references
>> at all.  That seems excessively strong, but it does seem that there is
>> no point in a "winlevelsup" field in WindowFunc, nor in implementing
>> any infrastructure for outer-level window functions.

> I am so ignorant that I don't know what exactly agglevelsup
> represents. Just copied it from agg to window functions... Could
> someone show me easy example?

In something like
select ..., (select avg(a.x) from b), ... from a;

the avg() invocation is actually an aggregate of the outer query over a.
It's a constant so far as any one invocation of the sub-select on b
is concerned.  The SQL:2008 spec is pretty opaque about this (as it
is on a whole lot of matters :-() but if you read older versions like
SQL92 it's spelled out a bit more clearly.  I don't see any wording
suggesting that window functions are supposed to work this way, however.

>> * The last part of section 4.14 states that two different window
>> functions must be evaluated against the same sorted row ordering if
>> they have syntactically identical partition and ordering clauses,
>> even if the windows are otherwise distinct (in particular there
>> could be different framing clauses).  Since we don't currently implement
>> framing clauses the latter is not too interesting, but it's still true
>> that the patch as I currently have it doesn't fully meet that
>> requirement: if you intentionally specify separate but equivalent named
>> window definitions, it won't be smart enough to fold them together,
>> and you could end up with extra sorts happening and possibly a different
>> ordering of sort-peer rows.  How worried are we about that?

> Is it? I intended all equivalent windows are folded into one as far as
> equal(w1->partitionClause, w2->partitionClause) &&
> equal(w1->orderClause, w2->orderClause) is true.

Well, I rewrote that code pretty heavily because it didn't work per spec
as far as references to existing windows went.  The problem that remains
is that in something like
WINDOW  w1 as (partition by x),    w2 as (partition by x),    w3 as (w1 order by y),    w4 as (w2 order by y)

w3 and w4 are equivalent but it's pretty hard to recognize that.  And
even if we did recognize it, we couldn't simply fold them together into
a single windowClause entry without changing the way that the query
looks on reverse-listing.  (The patch as submitted doesn't even *have*
reverse-listing capability for WINDOW clauses, but I plan to fix that
tomorrow ...)

This is doubtless fixable with another level of indirection, I'm just
wondering how much trouble it's worth.  It seems like the only case
that will really arise in practice is duplicate anonymous windows
(identical OVER clauses), and the code does fold those together.
        regards, tom lane

Re: Some semantic details of the window-function spec

From

"Hitoshi Harada"

Date:

23 December 2008, 10:05:23

2008/12/23 Tom Lane <tgl@sss.pgh.pa.us>:
> "Hitoshi Harada" <umi.tanuki@gmail.com> writes:
>> 2008/12/23 Tom Lane <tgl@sss.pgh.pa.us>:
>>> * Unlike aggregates, there doesn't seem to be any concept of a window
>>> function being attached to an outer-level query --- in fact 6.10 rule
>>> 4 says that a window function's argument can't contain outer references
>>> at all.  That seems excessively strong, but it does seem that there is
>>> no point in a "winlevelsup" field in WindowFunc, nor in implementing
>>> any infrastructure for outer-level window functions.
>
>> I am so ignorant that I don't know what exactly agglevelsup
>> represents. Just copied it from agg to window functions... Could
>> someone show me easy example?
>
> In something like
>
>        select ..., (select avg(a.x) from b), ... from a;
>
> the avg() invocation is actually an aggregate of the outer query over a.
> It's a constant so far as any one invocation of the sub-select on b
> is concerned.  The SQL:2008 spec is pretty opaque about this (as it
> is on a whole lot of matters :-() but if you read older versions like
> SQL92 it's spelled out a bit more clearly.  I don't see any wording
> suggesting that window functions are supposed to work this way, however.

Still I don't see any use case that is useful with the agglevelsup.
However, contrast to aggregate that can reduce rows into one, window
functions are not expected that you receive only one row, though it is
possible. If this is not documented in the spec, the final decision is
per the community, but in my humble opinion it is not worth.

>>> * The last part of section 4.14 states that two different window
>>> functions must be evaluated against the same sorted row ordering if
>>> they have syntactically identical partition and ordering clauses,
>>> even if the windows are otherwise distinct (in particular there
>>> could be different framing clauses).  Since we don't currently implement
>>> framing clauses the latter is not too interesting, but it's still true
>>> that the patch as I currently have it doesn't fully meet that
>>> requirement: if you intentionally specify separate but equivalent named
>>> window definitions, it won't be smart enough to fold them together,
>>> and you could end up with extra sorts happening and possibly a different
>>> ordering of sort-peer rows.  How worried are we about that?
>
>> Is it? I intended all equivalent windows are folded into one as far as
>> equal(w1->partitionClause, w2->partitionClause) &&
>> equal(w1->orderClause, w2->orderClause) is true.
>
> Well, I rewrote that code pretty heavily because it didn't work per spec
> as far as references to existing windows went.  The problem that remains
> is that in something like
>
>        WINDOW  w1 as (partition by x),
>                w2 as (partition by x),
>                w3 as (w1 order by y),
>                w4 as (w2 order by y)
>
> w3 and w4 are equivalent but it's pretty hard to recognize that.  And
> even if we did recognize it, we couldn't simply fold them together into
> a single windowClause entry without changing the way that the query
> looks on reverse-listing.  (The patch as submitted doesn't even *have*
> reverse-listing capability for WINDOW clauses, but I plan to fix that
> tomorrow ...)
>
> This is doubtless fixable with another level of indirection, I'm just
> wondering how much trouble it's worth.  It seems like the only case
> that will really arise in practice is duplicate anonymous windows
> (identical OVER clauses), and the code does fold those together.

Yes, fixable. But I feel that it's not worth as well. To me the spec
has been saying the order of window function invocations are not
deterministic and they may be separated or not depending on the
implementation, though of course it is not what the spec exactly says.
But it doesn't say "must" or "should" either. Simply, "Two <window
function>s are computed using the same (possibly non-deterministic)
window ordering". In the talk about executor design, we have sometimes
discussed about separating invocations even they are in the identical
window. So I suppose we'd better leave some chances to separate
invocations for some future day.

And the order of each invocations is relevant with nested window
functions talk you gave before. Since window functions, even though
they are contained in the identical window, may be invoked in
different window or different order, nesting should not be supported
obviously.

Also I have documented about order of window function invocation
somewhere in sgml doc, saying "it is not predictable".

If I am asked whether it is said per spec, I cannot find clearly in
the spec. But I feel reasonable.

Regards,

-- 
Hitoshi Harada