Some of the Red Hat guys have been trying to work through the NIST SQL
compliance tests. So far they've found several things we already knew
about, and one we didn't:
-- TEST:0434 GROUP BY with HAVING EXISTS-correlated set function! SELECT PNUM, SUM(HOURS) FROM WORKS GROUP BY
PNUM HAVING EXISTS (SELECT PNAME FROM PROJ WHERE PROJ.PNUM = WORKS.PNUM AND
SUM(WORKS.HOURS) > PROJ.BUDGET / 200);
This query is legal according to the test, but Postgres fails with
ERROR: Aggregates not allowed in WHERE clause
The SUM() should be allowed in the sub-SELECT because, according to the
spec, it is actually an aggregate of the outer query --- and so the
whole expression "SUM(WORKS.HOURS)" is effectively an outer reference
for the sub-SELECT.
Now I finally understand why the spec has all that strange verbiage
about outer references in set-function arguments. This is the case
they're talking about. (I don't much like their restriction to a single
outer reference ... seems like it would be appropriate to allow multiple
references as long as they're all from the same outer query level.)
Fixing this looks a tad nasty. The planner can convert simple column
outer references into Params for a subquery, but there is no
infrastructure to handle making larger expressions into Params. Also,
I don't want the planner repeating the work that the parser is going to
have to do to validate correctness of the query --- the parser will need
to understand that the aggregate is an outer reference as a whole, and
the planner shouldn't have to rediscover that for itself. In any case,
it seems that an outer-reference aggregate is a rather different animal
from an aggregate of the current query, and ought to be so labeled in
the parse tree.
I'm thinking of adding an "agglevelsup" field in Aggref nodes that has
semantics similar to "varlevelsup" in Var nodes --- if it's zero then
the aggregate is a regular aggregate of the current level, if it's more
than zero then the aggregate belongs to an outer query that many levels
up. The parser would need to set this field based on what the
aggregate's argument contains. It'd also have to check that the
argument does not contain variables of more than one query level.
Comments?
regards, tom lane