Thread: Frames vs partitions: is SQL2008 completely insane?
According to SQL2008 section 7.11 <window clause>, general rule 5, the default definition of window framing in a window that has an ordering clause but no framing (RANGE/ROWS) clause is that the window frame for a given row R runs from the first row of its partition through the last peer of R. Section 6.10's general rules define the results of LEAD, LAG, FIRST_VALUE, LAST_VALUE, NTH_VALUE in terms of the rows available in the window frame of the current window, not its partition. Meanwhile, section 6.10 <window function> syntax rule 6 says that LEAD/LAG must use a window that has an ordering clause and no framing clause. This means that without an explicit framing clause, none of these functions can "look beyond" the last peer of the current row; and what's worse, LEAD/LAG seem to be explicitly forbidden from looking further than that even if we had an implementation of framing clauses. This seems to be less than sane. I would certainly expect that LEAD(x) gives you the next value of x regardless of peer-row status, since LAG(x) gives you the prior value of x regardless of peer row status. It is also simply bizarre for FIRST_VALUE to give you the partition's first row when LAST_VALUE doesn't give you the partition's last row. Are there any errata for SQL2008 yet? Can anyone check the actual behavior of DB2 or other DBMS's that claim to implement these functions? I notice that the current patch code seems to implement first/last/nth_value using the frame, but lead/lag using the partition, which doesn't conform to spec AFAICS ... but lead/lag on the frame doesn't actually appear to be a useful definition so I'd rather go with that than with what the letter of the spec seems to say. Lastly, for a simple aggregate used with an OVER clause, the current patch seems to define the aggregate as being taken over the frame rather than the partition, but I cannot find anything in SQL2008 that lends any support to *either* definition. Comments? This all seems rather badly broken. regards, tom lane
I wrote: > Lastly, for a simple aggregate used with an OVER clause, the current > patch seems to define the aggregate as being taken over the frame > rather than the partition, but I cannot find anything in SQL2008 that > lends any support to *either* definition. Never mind that --- I found it in 10.9 syntax rule 4.b. But what this seems to boil down to is that LEAD() and LAST_VALUE() are completely useless unless you're allowed to specify a nondefault framing clause ... and don't mind ignoring the clearly-insane restriction of 6.10 syntax rule 6.b. The minimum extra functionality needed to make these functions useful would seem to be to allow UNBOUNDED FOLLOWING ... regards, tom lane
2008/12/27 Tom Lane <tgl@sss.pgh.pa.us>: > I notice that the current patch code seems to implement > first/last/nth_value using the frame, but lead/lag using the partition, > which doesn't conform to spec AFAICS ... but lead/lag on the frame > doesn't actually appear to be a useful definition so I'd rather go > with that than with what the letter of the spec seems to say. In 4.15, it says: The lead and lag functions each take three arguments, a <value expression> VE, an <exact numeric literal> OFFSET, and a <value expression> DEFAULT. For each row R within the *window partition P of R* defined by a window structure descriptor, the lag function returns the value of VE evaluated on a row that is OFFSET number of rows before R within P, for lead/lag, and returns the value of VE evaluated on the n-th row from the first (if FROM FIRST is specified or implied) or the last (if FROM LAST is specified) row of the *window frame* of R defined by a window structure descriptor for nth_value, added * by me. I understand lead/lag can affect all rows in the partition whereas first/last/nth_value does only rows in the frame. I guess that's why 6.10 rule 6.b forbids frame caluse in lead/lag but actually we can ignore frame though the frame is specified in the window attached with lead/lag, and it is better if you call aggregate with frame and lead/lag on the same window spec, which allows us to optimize it by calling them on the same node. It violates the spec but we'd better to extend the standard like the offset argument of lead/lag. Regards, -- Hitoshi Harada
"Hitoshi Harada" <umi.tanuki@gmail.com> writes: > 2008/12/27 Tom Lane <tgl@sss.pgh.pa.us>: >> I notice that the current patch code seems to implement >> first/last/nth_value using the frame, but lead/lag using the partition, >> which doesn't conform to spec AFAICS ... > In 4.15, it says: > The lead and lag functions each take three arguments, a <value > expression> VE, an <exact numeric literal> > OFFSET, and a <value expression> DEFAULT. For each row R within the > *window partition P of R* defined by > a window structure descriptor, the lag function returns the value of > VE evaluated on a row that is OFFSET > number of rows before R within P, Well, that's interesting, but I think the controlling definition is in 6.10 general rule 1b, which very clearly states that the frame is to be used for lead/lag (and the adjacent rules say the same for all the other standard window functions). The wording in 4.15 does seem like evidence that the spec authors may have misspoke in 6.10, but we're never going to settle it from the text of the spec. Can anyone check what DB2 and Oracle do here? In any case, both sections agree that last_value works on the frame, which makes it effectively useless with the default frame definition. So I'm still thinking that we need at least a subset of frame support. I'm tempted to propose that we just handle the CURRENT ROW and UNBOUNDED options for each end of the frame. regards, tom lane
2008/12/28 Tom Lane <tgl@sss.pgh.pa.us>: > "Hitoshi Harada" <umi.tanuki@gmail.com> writes: >> 2008/12/27 Tom Lane <tgl@sss.pgh.pa.us>: >>> I notice that the current patch code seems to implement >>> first/last/nth_value using the frame, but lead/lag using the partition, >>> which doesn't conform to spec AFAICS ... > >> In 4.15, it says: > >> The lead and lag functions each take three arguments, a <value >> expression> VE, an <exact numeric literal> >> OFFSET, and a <value expression> DEFAULT. For each row R within the >> *window partition P of R* defined by >> a window structure descriptor, the lag function returns the value of >> VE evaluated on a row that is OFFSET >> number of rows before R within P, > > Well, that's interesting, but I think the controlling definition is in > 6.10 general rule 1b, which very clearly states that the frame is to be > used for lead/lag (and the adjacent rules say the same for all the other > standard window functions). > > The wording in 4.15 does seem like evidence that the spec authors may > have misspoke in 6.10, but we're never going to settle it from the text > of the spec. Can anyone check what DB2 and Oracle do here? I tested on Oracle 10.2.0, and the results are: select depname, empno, salary, lead(salary, 1) over (order by salary), lag(salary, 1) over (order by salary), first_value(salary) over (order by salary), last_value(salary) over (order by salary) from empsalary; DEPNAME EMPNO SALARY LEAD(SALARY,1)OVER(ORDERBYSALARY) LAG(SALARY,1)OVER(ORDERBYSALARY) FIRST_VALUE(SALARY)OVER(ORDERBYSALARY) LAST_VALUE(SALARY)OVER(ORDERBYSALARY) personnel 5 3500 3900 3500 3500 personnel 2 3900 4200 3500 3500 3900 develop 7 4200 4500 3900 3500 4200 develop 9 4500 4800 4200 3500 4500 sales 4 4800 4800 4500 3500 4800 sales 3 4800 5000 4800 3500 4800 sales 1 5000 5200 4800 3500 5000 develop 10 5200 5200 5000 3500 5200 develop 11 5200 6000 5200 3500 5200 develop 8 6000 5200 3500 6000 which means the section 4.15 is true. Could anyone try DB2? > In any case, both sections agree that last_value works on the frame, > which makes it effectively useless with the default frame definition. > So I'm still thinking that we need at least a subset of frame support. > I'm tempted to propose that we just handle the CURRENT ROW and UNBOUNDED > options for each end of the frame. If we can afford it, I and many users are so glad with it. In my opinion it is not so painful if we had done concrete fundamentals of window functions, but isn't it up to our time for the release? Regards, -- Hitoshi Harada
Hitoshi Harada wrote: > 2008/12/28 Tom Lane <tgl@sss.pgh.pa.us>: >> "Hitoshi Harada" <umi.tanuki@gmail.com> writes: >>> 2008/12/27 Tom Lane <tgl@sss.pgh.pa.us>: >>>> which doesn't conform to spec AFAICS ... >>> ....4.15...says: >> interesting...6.10 general rule 1b, which very clearly states ... >> ... 4.15 does seem like evidence that the spec authors may >> have misspoke in 6.10 > Oracle... results are: ... which means the section 4.15 is true ISTM ISO should hire you guys (or the postgres project as a whole) to proof-read their specs before they publish them.
Hitoshi Harada wrote: > I tested on Oracle 10.2.0, and the results are: > > select depname, empno, salary, > lead(salary, 1) over (order by salary), > lag(salary, 1) over (order by salary), > first_value(salary) over (order by salary), > last_value(salary) over (order by salary) > from empsalary; > > DEPNAME EMPNO SALARY LEAD(SALARY,1)OVER(ORDERBYSALARY) > LAG(SALARY,1)OVER(ORDERBYSALARY) > FIRST_VALUE(SALARY)OVER(ORDERBYSALARY) > LAST_VALUE(SALARY)OVER(ORDERBYSALARY) > personnel 5 3500 3900 3500 3500 > personnel 2 3900 4200 3500 3500 3900 > develop 7 4200 4500 3900 3500 4200 > develop 9 4500 4800 4200 3500 4500 > sales 4 4800 4800 4500 3500 4800 > sales 3 4800 5000 4800 3500 4800 > sales 1 5000 5200 4800 3500 5000 > develop 10 5200 5200 5000 3500 5200 > develop 11 5200 6000 5200 3500 5200 > develop 8 6000 5200 3500 6000 > > which means the section 4.15 is true. Could anyone try DB2? DB2 9.5 results: Using the empsalary table from the regression test in the patch: select depname, empno, salary, lead(salary, 1) over (order by salary), lag(salary, 1) over (order by salary), first_value(salary) over (order by salary), last_value(salary) over (order by salary) from empsalary; personnel 5 3500 3900 3500 3500 personnel 2 3900 4200 3500 3500 3900 develop 7 4200 4500 3900 3500 4200 develop 9 4500 4800 4200 3500 4500 sales 4 4800 4800 4500 3500 4800 sales 3 4800 5000 4800 3500 4800 sales 1 5000 5200 4800 3500 5000 develop 10 5200 5200 5000 3500 5200 develop 11 5200 6000 5200 3500 5200 develop 8 6000 5200 3500 6000 Which matches with your Oracle results. So either they both got it wrong by one copying the other... <sarcasm> Of course we all know it couldn't be Oracle copying IBM, that would never happen... </sarcasm> David
Hitoshi Harada wrote: > I tested on Oracle 10.2.0, and the results are: > > select depname, empno, salary, > lead(salary, 1) over (order by salary), > lag(salary, 1) over (order by salary), > first_value(salary) over (order by salary), > last_value(salary) over (order by salary) > from empsalary; > > DEPNAME EMPNO SALARY LEAD(SALARY,1)OVER(ORDERBYSALARY) > LAG(SALARY,1)OVER(ORDERBYSALARY) > FIRST_VALUE(SALARY)OVER(ORDERBYSALARY) > LAST_VALUE(SALARY)OVER(ORDERBYSALARY) > personnel 5 3500 3900 3500 3500 > personnel 2 3900 4200 3500 3500 3900 > develop 7 4200 4500 3900 3500 4200 > develop 9 4500 4800 4200 3500 4500 > sales 4 4800 4800 4500 3500 4800 > sales 3 4800 5000 4800 3500 4800 > sales 1 5000 5200 4800 3500 5000 > develop 10 5200 5200 5000 3500 5200 > develop 11 5200 6000 5200 3500 5200 > develop 8 6000 5200 3500 6000 > > which means the section 4.15 is true. Could anyone try DB2? DB2 9.5 results: Using the empsalary table from the regression test in the patch: select depname, empno, salary, lead(salary, 1) over (order by salary), lag(salary, 1) over (order by salary), first_value(salary) over (order by salary), last_value(salary) over (order by salary) from empsalary; personnel 5 3500 3900 3500 3500 personnel 2 3900 4200 3500 3500 3900 develop 7 4200 4500 3900 3500 4200 develop 9 4500 4800 4200 3500 4500 sales 4 4800 4800 4500 3500 4800 sales 3 4800 5000 4800 3500 4800 sales 1 5000 5200 4800 3500 5000 develop 10 5200 5200 5000 3500 5200 develop 11 5200 6000 5200 3500 5200 develop 8 6000 5200 3500 6000 Which matches with your Oracle results. So either they both got it wrong by one copying the other... <sarcasm> Of course we all know it couldn't be Oracle copying IBM, that would never happen... </sarcasm> David
"David Rowley" <dgrowley@gmail.com> writes: > Hitoshi Harada wrote: >> I tested on Oracle 10.2.0, and the results are: >> ... >> which means the section 4.15 is true. Could anyone try DB2? > DB2 9.5 results [ are the same ] OK, good, that means the reference to the frame in 6.10 rule 1b is just a copy-and-pasteo. (I wonder if it got fixed in the final spec?) I guess this makes sense because we also see that 6.10 syntax rule 6b forbids a framing spec on lead/lag, which makes sense if these functions ignore the frame and no sense otherwise. I also realized after more thought that the way to get frame = partition with a default frame spec is to omit any ORDER BY in the window spec. So my concern about LAST_VALUE being useless without framing ability is unfounded, and I withdraw the complaint that we need to put in some minimal framing features. But we'll have to be careful to document all this properly. Also, it does seem that it might be worthwhile to try to ensure that combinations of windows that have the same PARTITION list and empty vs nonempty ORDER BY get optimized well. Right now it's dependent on ordering of the WindowClause whether you pay an extra sort for that case or not. regards, tom lane
On Saturday 27 December 2008 20:32:10 Ron Mayer wrote: > ISTM ISO should hire you guys (or the postgres project as a whole) > to proof-read their specs before they publish them. The way it really works though, effectively, is that vendors hire ISO to publish their specs. Having a few inconsistencies in 2000 pages of language specification with 20 years of legacy around it isn't so bad IMO, considering that there are really only a handful of guys working on this with any intensity. If we cared enough, we could submit these sorts of issues to the committee for clarification or correction. If anyone is convinced enough about this particular case, I can try to relay it and see what happens.
Tom Lane wrote: > According to SQL2008 section 7.11 <window clause>, general rule 5, the > default definition of window framing in a window that has an ordering > clause but no framing (RANGE/ROWS) clause is that the window frame for > a given row R runs from the first row of its partition through the last > peer of R. > > Section 6.10's general rules define the results of LEAD, LAG, > FIRST_VALUE, LAST_VALUE, NTH_VALUE in terms of the rows available in the > window frame of the current window, not its partition. > > Meanwhile, section 6.10 <window function> syntax rule 6 says that > LEAD/LAG must use a window that has an ordering clause and no > framing clause. > > This means that without an explicit framing clause, none of these > functions can "look beyond" the last peer of the current row; and > what's worse, LEAD/LAG seem to be explicitly forbidden from looking > further than that even if we had an implementation of framing clauses. > > This seems to be less than sane. I would certainly expect that LEAD(x) > gives you the next value of x regardless of peer-row status, since > LAG(x) gives you the prior value of x regardless of peer row status. > It is also simply bizarre for FIRST_VALUE to give you the partition's > first row when LAST_VALUE doesn't give you the partition's last row. > > Are there any errata for SQL2008 yet? Can anyone check the actual > behavior of DB2 or other DBMS's that claim to implement these functions? > > I notice that the current patch code seems to implement > first/last/nth_value using the frame, but lead/lag using the partition, > which doesn't conform to spec AFAICS ... but lead/lag on the frame > doesn't actually appear to be a useful definition so I'd rather go > with that than with what the letter of the spec seems to say. > > Lastly, for a simple aggregate used with an OVER clause, the current > patch seems to define the aggregate as being taken over the frame > rather than the partition, but I cannot find anything in SQL2008 that > lends any support to *either* definition. > > Comments? This all seems rather badly broken. Was this dealt with? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Was this dealt with? Yes. regards, tom lane