Aggregates containing outer references don't work per spec - Mailing list pgsql-hackers

From Tom Lane
Subject Aggregates containing outer references don't work per spec
Date
Msg-id 27411.1054762426@sss.pgh.pa.us
Whole thread Raw
Responses Re: Aggregates containing outer references don't work per
Re: Aggregates containing outer references don't work per spec
List pgsql-hackers
Some of the Red Hat guys have been trying to work through the NIST SQL
compliance tests.  So far they've found several things we already knew
about, and one we didn't:

-- TEST:0434 GROUP BY with HAVING EXISTS-correlated set function!  SELECT PNUM, SUM(HOURS) FROM WORKS         GROUP BY
PNUM        HAVING EXISTS (SELECT PNAME FROM PROJ                        WHERE PROJ.PNUM = WORKS.PNUM AND
              SUM(WORKS.HOURS) > PROJ.BUDGET / 200);
 

This query is legal according to the test, but Postgres fails with
ERROR:  Aggregates not allowed in WHERE clause

The SUM() should be allowed in the sub-SELECT because, according to the
spec, it is actually an aggregate of the outer query --- and so the
whole expression "SUM(WORKS.HOURS)" is effectively an outer reference
for the sub-SELECT.

Now I finally understand why the spec has all that strange verbiage
about outer references in set-function arguments. This is the case
they're talking about.  (I don't much like their restriction to a single
outer reference ... seems like it would be appropriate to allow multiple
references as long as they're all from the same outer query level.)

Fixing this looks a tad nasty.  The planner can convert simple column
outer references into Params for a subquery, but there is no
infrastructure to handle making larger expressions into Params.  Also,
I don't want the planner repeating the work that the parser is going to
have to do to validate correctness of the query --- the parser will need
to understand that the aggregate is an outer reference as a whole, and
the planner shouldn't have to rediscover that for itself.  In any case,
it seems that an outer-reference aggregate is a rather different animal
from an aggregate of the current query, and ought to be so labeled in
the parse tree.

I'm thinking of adding an "agglevelsup" field in Aggref nodes that has
semantics similar to "varlevelsup" in Var nodes --- if it's zero then
the aggregate is a regular aggregate of the current level, if it's more
than zero then the aggregate belongs to an outer query that many levels
up.  The parser would need to set this field based on what the
aggregate's argument contains.  It'd also have to check that the
argument does not contain variables of more than one query level.

Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: TODO list
Next
From: Tom Lane
Date:
Subject: Re: Problem trying to implement version 3.0 of the PostgreSQL protocol