Thread: Aggregates containing outer references don't work per spec
Some of the Red Hat guys have been trying to work through the NIST SQL compliance tests. So far they've found several things we already knew about, and one we didn't: -- TEST:0434 GROUP BY with HAVING EXISTS-correlated set function! SELECT PNUM, SUM(HOURS) FROM WORKS GROUP BY PNUM HAVING EXISTS (SELECT PNAME FROM PROJ WHERE PROJ.PNUM = WORKS.PNUM AND SUM(WORKS.HOURS) > PROJ.BUDGET / 200); This query is legal according to the test, but Postgres fails with ERROR: Aggregates not allowed in WHERE clause The SUM() should be allowed in the sub-SELECT because, according to the spec, it is actually an aggregate of the outer query --- and so the whole expression "SUM(WORKS.HOURS)" is effectively an outer reference for the sub-SELECT. Now I finally understand why the spec has all that strange verbiage about outer references in set-function arguments. This is the case they're talking about. (I don't much like their restriction to a single outer reference ... seems like it would be appropriate to allow multiple references as long as they're all from the same outer query level.) Fixing this looks a tad nasty. The planner can convert simple column outer references into Params for a subquery, but there is no infrastructure to handle making larger expressions into Params. Also, I don't want the planner repeating the work that the parser is going to have to do to validate correctness of the query --- the parser will need to understand that the aggregate is an outer reference as a whole, and the planner shouldn't have to rediscover that for itself. In any case, it seems that an outer-reference aggregate is a rather different animal from an aggregate of the current query, and ought to be so labeled in the parse tree. I'm thinking of adding an "agglevelsup" field in Aggref nodes that has semantics similar to "varlevelsup" in Var nodes --- if it's zero then the aggregate is a regular aggregate of the current level, if it's more than zero then the aggregate belongs to an outer query that many levels up. The parser would need to set this field based on what the aggregate's argument contains. It'd also have to check that the argument does not contain variables of more than one query level. Comments? regards, tom lane
Jan Wieck <JanWieck@Yahoo.com> writes: > Would > SELECT PNUM, SUM(HOURS) FROM WORKS > GROUP BY PNUM > HAVING EXISTS (SELECT PNAME FROM PROJ > WHERE PROJ.PNUM = WORKS.PNUM AND > AVG(WORKS.HOURS) > PROJ.MAGIC / 200); > ^^^ > be legal according to that spec too? Yes. The fact that the same aggregate appears in the topmost target list may be confusing the issue here --- that is *not* relevant to the semantics of the aggregate in the subquery. > Then the parser would not only have > to identify the uplevel of the aggregate, it'd also have to add a junk > aggregate TLE to the outer TL. Nah, we don't use TLEs for aggregates. AFAICT the executor will work perfectly correctly with this example, if we can arrange to migrate the whole SUM(WORKS.HOURS) expression out of the subquery and put it as one of the Params passed to the subquery. The failure is just in the parser (too stupid to check the query correctly) and the planner (too stupid to migrate the whole aggregate expression rather than just the WORKS.HOURS variable reference). regards, tom lane
Tom Lane wrote: > Some of the Red Hat guys have been trying to work through the NIST SQL > compliance tests. So far they've found several things we already knew > about, and one we didn't: > > -- TEST:0434 GROUP BY with HAVING EXISTS-correlated set function! > SELECT PNUM, SUM(HOURS) FROM WORKS > GROUP BY PNUM > HAVING EXISTS (SELECT PNAME FROM PROJ > WHERE PROJ.PNUM = WORKS.PNUM AND > SUM(WORKS.HOURS) > PROJ.BUDGET / 200); > > This query is legal according to the test, but Postgres fails with > ERROR: Aggregates not allowed in WHERE clause > > The SUM() should be allowed in the sub-SELECT because, according to the > spec, it is actually an aggregate of the outer query --- and so the > whole expression "SUM(WORKS.HOURS)" is effectively an outer reference > for the sub-SELECT. > [...] > > Comments? Would SELECT PNUM, SUM(HOURS) FROM WORKS GROUP BY PNUM HAVING EXISTS (SELECT PNAME FROM PROJ WHERE PROJ.PNUM = WORKS.PNUM AND AVG(WORKS.HOURS) > PROJ.MAGIC / 200); ^^^ be legal according to that spec too? Then the parser would not only have to identify the uplevel of the aggregate, it'd also have to add a junk aggregate TLE to the outer TL. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Wow, a subselect in HAVING. Where do they think up these things. :-) Yes, it would be nice to support it, though I am not surprised no one has asked for this feature yet. :-) --------------------------------------------------------------------------- Tom Lane wrote: > Some of the Red Hat guys have been trying to work through the NIST SQL > compliance tests. So far they've found several things we already knew > about, and one we didn't: > > -- TEST:0434 GROUP BY with HAVING EXISTS-correlated set function! > SELECT PNUM, SUM(HOURS) FROM WORKS > GROUP BY PNUM > HAVING EXISTS (SELECT PNAME FROM PROJ > WHERE PROJ.PNUM = WORKS.PNUM AND > SUM(WORKS.HOURS) > PROJ.BUDGET / 200); > > This query is legal according to the test, but Postgres fails with > ERROR: Aggregates not allowed in WHERE clause > > The SUM() should be allowed in the sub-SELECT because, according to the > spec, it is actually an aggregate of the outer query --- and so the > whole expression "SUM(WORKS.HOURS)" is effectively an outer reference > for the sub-SELECT. > > Now I finally understand why the spec has all that strange verbiage > about outer references in set-function arguments. This is the case > they're talking about. (I don't much like their restriction to a single > outer reference ... seems like it would be appropriate to allow multiple > references as long as they're all from the same outer query level.) > > Fixing this looks a tad nasty. The planner can convert simple column > outer references into Params for a subquery, but there is no > infrastructure to handle making larger expressions into Params. Also, > I don't want the planner repeating the work that the parser is going to > have to do to validate correctness of the query --- the parser will need > to understand that the aggregate is an outer reference as a whole, and > the planner shouldn't have to rediscover that for itself. In any case, > it seems that an outer-reference aggregate is a rather different animal > from an aggregate of the current query, and ought to be so labeled in > the parse tree. > > I'm thinking of adding an "agglevelsup" field in Aggref nodes that has > semantics similar to "varlevelsup" in Var nodes --- if it's zero then > the aggregate is a regular aggregate of the current level, if it's more > than zero then the aggregate belongs to an outer query that many levels > up. The parser would need to set this field based on what the > aggregate's argument contains. It'd also have to check that the > argument does not contain variables of more than one query level. > > Comments? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
I wrote: > Now I finally understand why the spec has all that strange verbiage > about outer references in set-function arguments. This is the case > they're talking about. (I don't much like their restriction to a single > outer reference ... seems like it would be appropriate to allow multiple > references as long as they're all from the same outer query level.) I've been thinking about that some more. Currently, if a subquery contains an aggregate expression likeSUM(localvar + outervar) (where localvar is a variable of the subquery and outervar is a variable of the parent query), we accept this, and execute it in what seems to me a perfectly reasonable way: the aggregate is evaluated over all the appropriate rows of the subquery, taking the outer variable as a constant for any one evaluation of the subquery. The spec appears to forbid this case, but I really don't see why. Obviously the aggregate argument could be more complex, with outer references from several different levels of outer query, but that doesn't change anything --- all the outer-reference variables are constants from the perspective of the subquery, whether they come from one level up or many levels up. What we now realize is that the spec says SUM(outervar) ought to be considered an aggregate evaluated at the outervar's query level, and then imported *as a whole* as an effective constant for the subquery. I claim that there should be nothing wrong with interpretingSUM(outervar1 + outervar2) as an aggregate of the closest outer variable's level, with further-up variables taken as constants with respect to that level. This isn't really a different case, it's the same as "SUM(localvar + outervar)" from the perspective of that closest outer level. We are just allowing the expression to be referenced from within sub-subqueries. The constraint that the aggregate must appear in the SELECT targetlist or HAVING clause applies to the query level that the aggregate belongs to, but not to lower levels. In short, I see no reason why the spec should restrict the aggregate's argument to contain only a single outer reference, or even references from just a single outer query level. The behavior is perfectly well defined without that constraint. We can accommodate both our historical behavior and the spec-mandated cases if we interpret multilevel cases per this sketch. Anyone see a flaw in this reasoning? regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > When we considered outervar1 as a constant, we could do the aggregate in > the subquery using computations, but when SUM(outervar1) is computed in > an above query, combining that with anything that is part of different > query level makes no sense to me because those variables might not even > exist at the level that aggregate is being computed. Sure they will. They can only be outer (up-level) references, else the parser would not have resolved them in the first place. If you accept that SUM(localvar + outervar) is a sensible computation, then I think you must accept that SUM(outervar1 + outervar2) makes sense too. It's actually the exact same computation, it's just being referenced from within a sub-query. regards, tom lane
I think the issue is this part: > What we now realize is that the spec says SUM(outervar) ought to be > considered an aggregate evaluated at the outervar's query level, and > then imported *as a whole* as an effective constant for the subquery. > > I claim that there should be nothing wrong with interpreting > SUM(outervar1 + outervar2) > as an aggregate of the closest outer variable's level, with further-up When we considered outervar1 as a constant, we could do the aggregate in the subquery using computations, but when SUM(outervar1) is computed in an above query, combining that with anything that is part of different query level makes no sense to me because those variables might not even exist at the level that aggregate is being computed. So, for example, this: SUM(outervar1) + SUM(outervar2) does make sense to me because those are computable, but if outervar1 and outervar2 are in different query levels: SUM(outervar1 + outervar2) doesn't make sense. I think most of the aggregates support such spliting, so maybe they figured people should split apart aggregates that were unclear. --------------------------------------------------------------------------- Tom Lane wrote: > I wrote: > > Now I finally understand why the spec has all that strange verbiage > > about outer references in set-function arguments. This is the case > > they're talking about. (I don't much like their restriction to a single > > outer reference ... seems like it would be appropriate to allow multiple > > references as long as they're all from the same outer query level.) > > I've been thinking about that some more. Currently, if a subquery > contains an aggregate expression like > SUM(localvar + outervar) > (where localvar is a variable of the subquery and outervar is a variable > of the parent query), we accept this, and execute it in what seems to me > a perfectly reasonable way: the aggregate is evaluated over all the > appropriate rows of the subquery, taking the outer variable as a > constant for any one evaluation of the subquery. > > The spec appears to forbid this case, but I really don't see why. > > Obviously the aggregate argument could be more complex, with outer > references from several different levels of outer query, but that > doesn't change anything --- all the outer-reference variables are > constants from the perspective of the subquery, whether they come > from one level up or many levels up. > > What we now realize is that the spec says SUM(outervar) ought to be > considered an aggregate evaluated at the outervar's query level, and > then imported *as a whole* as an effective constant for the subquery. > > I claim that there should be nothing wrong with interpreting > SUM(outervar1 + outervar2) > as an aggregate of the closest outer variable's level, with further-up > variables taken as constants with respect to that level. This isn't > really a different case, it's the same as "SUM(localvar + outervar)" > from the perspective of that closest outer level. We are just allowing > the expression to be referenced from within sub-subqueries. The > constraint that the aggregate must appear in the SELECT targetlist or > HAVING clause applies to the query level that the aggregate belongs to, > but not to lower levels. > > In short, I see no reason why the spec should restrict the aggregate's > argument to contain only a single outer reference, or even references > from just a single outer query level. The behavior is perfectly well > defined without that constraint. We can accommodate both our historical > behavior and the spec-mandated cases if we interpret multilevel cases > per this sketch. > > Anyone see a flaw in this reasoning? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Jan Wieck <JanWieck@Yahoo.com> writes: > What is SUM(up1levelvar + up2levelsvar) considered to be? Would that be > the same as SUM(localvar + outervar) one level up? Exactly. The spec says that SUM(up1levelvar) is the same as SUM(localvar) one level up, so this seems a natural generalization. It lets us keep supporting our old behavior in cases where that behavior is sensible. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: >> When we considered outervar1 as a constant, we could do the aggregate in >> the subquery using computations, but when SUM(outervar1) is computed in >> an above query, combining that with anything that is part of different >> query level makes no sense to me because those variables might not even >> exist at the level that aggregate is being computed. > > Sure they will. They can only be outer (up-level) references, else the > parser would not have resolved them in the first place. > > If you accept that SUM(localvar + outervar) is a sensible computation, > then I think you must accept that SUM(outervar1 + outervar2) makes sense > too. It's actually the exact same computation, it's just being > referenced from within a sub-query. What is SUM(up1levelvar + up2levelsvar) considered to be? Would that be the same as SUM(localvar + outervar) one level up? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Is there any ETA for cvs? Dave -- Dave Cramer <dave@fastcrypt.com> fastcrypt
?? cvs itself has been working for ages now ... anoncvs isn't ... or is that what you are using? On Thu, 5 Jun 2003, Dave Cramer wrote: > Is there any ETA for cvs? > > Dave > -- > Dave Cramer <dave@fastcrypt.com> > fastcrypt > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Sun, Jun 08, 2003 at 10:33:03AM -0300, The Hermit Hacker wrote: > > ?? cvs itself has been working for ages now ... anoncvs isn't ... or is > that what you are using? I think that'd be it... I'm waiting for CVSup too. ("Cannot connect to cvsup.postgresql.org: Connection refused") -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Thou shalt study thy libraries and strive not to reinvent them without cause, that thy code may be short and readable and thy days pleasant and productive. (7th Commandment for C Programmers)
On Sun, Jun 08, 2003 at 10:33:03AM -0300, The Hermit Hacker wrote: > > ?? cvs itself has been working for ages now ... anoncvs isn't ... or is > that what you are using? What else would we be using? Kurt
It's what I tried to use over the weekend, so I could start on a doc patch ... restoring it would probably help a number of people. andrew Marc wrote: > > ?? cvs itself has been working for ages now ... anoncvs isn't ... or > is that what you are using? > > On Thu, 5 Jun 2003, Dave Cramer wrote: > >> Is there any ETA for cvs? >> >> Dave
Actually I was only looking for cvs, by the time this message made it to the list, it was working. Dave On Mon, 2003-06-09 at 00:21, Alvaro Herrera wrote: > On Sun, Jun 08, 2003 at 10:33:03AM -0300, The Hermit Hacker wrote: > > > > ?? cvs itself has been working for ages now ... anoncvs isn't ... or is > > that what you are using? > > I think that'd be it... I'm waiting for CVSup too. > > ("Cannot connect to cvsup.postgresql.org: Connection refused") -- Dave Cramer <dave@fastcrypt.com> fastcrypt