Thread: Some semantic details of the window-function spec
After a couple of hours reading the SQL:2008 spec, I've come to some conclusions about the semantics that are demanded for window functions. Anyone want to to check my work? * If window functions are used together with aggregates or grouping, the grouping and regular aggregation happens first, and then windowing is done on the output rows (group rows). See section 7.12 rules 11,12. * Therefore, the argument of a window function can contain aggregate functions, and cannot reference ungrouped columns (both unlike regular aggregates). Conversely, an ordinary aggregate's argument can't contain a window function. * Also, window specifications can contain aggregate functions, and cannot reference ungrouped columns; this is because they represent another layer of grouping/ordering on top of the GROUP BY if any. (An explicit ORDER BY, if any, happens last of all.) * It is not very clear whether a window function's argument can contain another window function. 7.11 forbids window specifications from containing window functions, which is sensible, but I can't find any such statement about the arguments. The present patch forbids nesting window functions, and I'm fine with imposing that as an implementation restriction even if it's not in the spec; but can anyone find it in the spec? * Unlike aggregates, there doesn't seem to be any concept of a window function being attached to an outer-level query --- in fact 6.10 rule 4 says that a window function's argument can't contain outer references at all. That seems excessively strong, but it does seem that there is no point in a "winlevelsup" field in WindowFunc, nor in implementing any infrastructure for outer-level window functions. * The last part of section 4.14 states that two different window functions must be evaluated against the same sorted row ordering if they have syntactically identical partition and ordering clauses, even if the windows are otherwise distinct (in particular there could be different framing clauses). Since we don't currently implement framing clauses the latter is not too interesting, but it's still true that the patch as I currently have it doesn't fully meet that requirement: if you intentionally specify separate but equivalent named window definitions, it won't be smart enough to fold them together, and you could end up with extra sorts happening and possibly a different ordering of sort-peer rows. How worried are we about that? regards, tom lane
2008/12/23 Tom Lane <tgl@sss.pgh.pa.us>: > * Unlike aggregates, there doesn't seem to be any concept of a window > function being attached to an outer-level query --- in fact 6.10 rule > 4 says that a window function's argument can't contain outer references > at all. That seems excessively strong, but it does seem that there is > no point in a "winlevelsup" field in WindowFunc, nor in implementing > any infrastructure for outer-level window functions. I am so ignorant that I don't know what exactly agglevelsup represents. Just copied it from agg to window functions... Could someone show me easy example? > > * The last part of section 4.14 states that two different window > functions must be evaluated against the same sorted row ordering if > they have syntactically identical partition and ordering clauses, > even if the windows are otherwise distinct (in particular there > could be different framing clauses). Since we don't currently implement > framing clauses the latter is not too interesting, but it's still true > that the patch as I currently have it doesn't fully meet that > requirement: if you intentionally specify separate but equivalent named > window definitions, it won't be smart enough to fold them together, > and you could end up with extra sorts happening and possibly a different > ordering of sort-peer rows. How worried are we about that? Is it? I intended all equivalent windows are folded into one as far as equal(w1->partitionClause, w2->partitionClause) && equal(w1->orderClause, w2->orderClause) is true. And I believe everytime it's true, we can process them in the same window ordering. One thing, in the former discussion someone pointed me that we might have to care about that volatile functions are contained in the window. I currently don't have any idea about that. Regards, -- Hitoshi Harada
"Hitoshi Harada" <umi.tanuki@gmail.com> writes: > 2008/12/23 Tom Lane <tgl@sss.pgh.pa.us>: >> * Unlike aggregates, there doesn't seem to be any concept of a window >> function being attached to an outer-level query --- in fact 6.10 rule >> 4 says that a window function's argument can't contain outer references >> at all. That seems excessively strong, but it does seem that there is >> no point in a "winlevelsup" field in WindowFunc, nor in implementing >> any infrastructure for outer-level window functions. > I am so ignorant that I don't know what exactly agglevelsup > represents. Just copied it from agg to window functions... Could > someone show me easy example? In something like select ..., (select avg(a.x) from b), ... from a; the avg() invocation is actually an aggregate of the outer query over a. It's a constant so far as any one invocation of the sub-select on b is concerned. The SQL:2008 spec is pretty opaque about this (as it is on a whole lot of matters :-() but if you read older versions like SQL92 it's spelled out a bit more clearly. I don't see any wording suggesting that window functions are supposed to work this way, however. >> * The last part of section 4.14 states that two different window >> functions must be evaluated against the same sorted row ordering if >> they have syntactically identical partition and ordering clauses, >> even if the windows are otherwise distinct (in particular there >> could be different framing clauses). Since we don't currently implement >> framing clauses the latter is not too interesting, but it's still true >> that the patch as I currently have it doesn't fully meet that >> requirement: if you intentionally specify separate but equivalent named >> window definitions, it won't be smart enough to fold them together, >> and you could end up with extra sorts happening and possibly a different >> ordering of sort-peer rows. How worried are we about that? > Is it? I intended all equivalent windows are folded into one as far as > equal(w1->partitionClause, w2->partitionClause) && > equal(w1->orderClause, w2->orderClause) is true. Well, I rewrote that code pretty heavily because it didn't work per spec as far as references to existing windows went. The problem that remains is that in something like WINDOW w1 as (partition by x), w2 as (partition by x), w3 as (w1 order by y), w4 as (w2 order by y) w3 and w4 are equivalent but it's pretty hard to recognize that. And even if we did recognize it, we couldn't simply fold them together into a single windowClause entry without changing the way that the query looks on reverse-listing. (The patch as submitted doesn't even *have* reverse-listing capability for WINDOW clauses, but I plan to fix that tomorrow ...) This is doubtless fixable with another level of indirection, I'm just wondering how much trouble it's worth. It seems like the only case that will really arise in practice is duplicate anonymous windows (identical OVER clauses), and the code does fold those together. regards, tom lane
2008/12/23 Tom Lane <tgl@sss.pgh.pa.us>: > "Hitoshi Harada" <umi.tanuki@gmail.com> writes: >> 2008/12/23 Tom Lane <tgl@sss.pgh.pa.us>: >>> * Unlike aggregates, there doesn't seem to be any concept of a window >>> function being attached to an outer-level query --- in fact 6.10 rule >>> 4 says that a window function's argument can't contain outer references >>> at all. That seems excessively strong, but it does seem that there is >>> no point in a "winlevelsup" field in WindowFunc, nor in implementing >>> any infrastructure for outer-level window functions. > >> I am so ignorant that I don't know what exactly agglevelsup >> represents. Just copied it from agg to window functions... Could >> someone show me easy example? > > In something like > > select ..., (select avg(a.x) from b), ... from a; > > the avg() invocation is actually an aggregate of the outer query over a. > It's a constant so far as any one invocation of the sub-select on b > is concerned. The SQL:2008 spec is pretty opaque about this (as it > is on a whole lot of matters :-() but if you read older versions like > SQL92 it's spelled out a bit more clearly. I don't see any wording > suggesting that window functions are supposed to work this way, however. Still I don't see any use case that is useful with the agglevelsup. However, contrast to aggregate that can reduce rows into one, window functions are not expected that you receive only one row, though it is possible. If this is not documented in the spec, the final decision is per the community, but in my humble opinion it is not worth. >>> * The last part of section 4.14 states that two different window >>> functions must be evaluated against the same sorted row ordering if >>> they have syntactically identical partition and ordering clauses, >>> even if the windows are otherwise distinct (in particular there >>> could be different framing clauses). Since we don't currently implement >>> framing clauses the latter is not too interesting, but it's still true >>> that the patch as I currently have it doesn't fully meet that >>> requirement: if you intentionally specify separate but equivalent named >>> window definitions, it won't be smart enough to fold them together, >>> and you could end up with extra sorts happening and possibly a different >>> ordering of sort-peer rows. How worried are we about that? > >> Is it? I intended all equivalent windows are folded into one as far as >> equal(w1->partitionClause, w2->partitionClause) && >> equal(w1->orderClause, w2->orderClause) is true. > > Well, I rewrote that code pretty heavily because it didn't work per spec > as far as references to existing windows went. The problem that remains > is that in something like > > WINDOW w1 as (partition by x), > w2 as (partition by x), > w3 as (w1 order by y), > w4 as (w2 order by y) > > w3 and w4 are equivalent but it's pretty hard to recognize that. And > even if we did recognize it, we couldn't simply fold them together into > a single windowClause entry without changing the way that the query > looks on reverse-listing. (The patch as submitted doesn't even *have* > reverse-listing capability for WINDOW clauses, but I plan to fix that > tomorrow ...) > > This is doubtless fixable with another level of indirection, I'm just > wondering how much trouble it's worth. It seems like the only case > that will really arise in practice is duplicate anonymous windows > (identical OVER clauses), and the code does fold those together. Yes, fixable. But I feel that it's not worth as well. To me the spec has been saying the order of window function invocations are not deterministic and they may be separated or not depending on the implementation, though of course it is not what the spec exactly says. But it doesn't say "must" or "should" either. Simply, "Two <window function>s are computed using the same (possibly non-deterministic) window ordering". In the talk about executor design, we have sometimes discussed about separating invocations even they are in the identical window. So I suppose we'd better leave some chances to separate invocations for some future day. And the order of each invocations is relevant with nested window functions talk you gave before. Since window functions, even though they are contained in the identical window, may be invoked in different window or different order, nesting should not be supported obviously. Also I have documented about order of window function invocation somewhere in sgml doc, saying "it is not predictable". If I am asked whether it is said per spec, I cannot find clearly in the spec. But I feel reasonable. Regards, -- Hitoshi Harada