Thread: Aggregates containing outer references don't work per spec

Aggregates containing outer references don't work per spec

From
Tom Lane
Date:
Some of the Red Hat guys have been trying to work through the NIST SQL
compliance tests.  So far they've found several things we already knew
about, and one we didn't:

-- TEST:0434 GROUP BY with HAVING EXISTS-correlated set function!  SELECT PNUM, SUM(HOURS) FROM WORKS         GROUP BY
PNUM        HAVING EXISTS (SELECT PNAME FROM PROJ                        WHERE PROJ.PNUM = WORKS.PNUM AND
              SUM(WORKS.HOURS) > PROJ.BUDGET / 200);
 

This query is legal according to the test, but Postgres fails with
ERROR:  Aggregates not allowed in WHERE clause

The SUM() should be allowed in the sub-SELECT because, according to the
spec, it is actually an aggregate of the outer query --- and so the
whole expression "SUM(WORKS.HOURS)" is effectively an outer reference
for the sub-SELECT.

Now I finally understand why the spec has all that strange verbiage
about outer references in set-function arguments. This is the case
they're talking about.  (I don't much like their restriction to a single
outer reference ... seems like it would be appropriate to allow multiple
references as long as they're all from the same outer query level.)

Fixing this looks a tad nasty.  The planner can convert simple column
outer references into Params for a subquery, but there is no
infrastructure to handle making larger expressions into Params.  Also,
I don't want the planner repeating the work that the parser is going to
have to do to validate correctness of the query --- the parser will need
to understand that the aggregate is an outer reference as a whole, and
the planner shouldn't have to rediscover that for itself.  In any case,
it seems that an outer-reference aggregate is a rather different animal
from an aggregate of the current query, and ought to be so labeled in
the parse tree.

I'm thinking of adding an "agglevelsup" field in Aggref nodes that has
semantics similar to "varlevelsup" in Var nodes --- if it's zero then
the aggregate is a regular aggregate of the current level, if it's more
than zero then the aggregate belongs to an outer query that many levels
up.  The parser would need to set this field based on what the
aggregate's argument contains.  It'd also have to check that the
argument does not contain variables of more than one query level.

Comments?
        regards, tom lane


Re: Aggregates containing outer references don't work per spec

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> Would

>     SELECT PNUM, SUM(HOURS) FROM WORKS
>            GROUP BY PNUM
>            HAVING EXISTS (SELECT PNAME FROM PROJ
>                           WHERE PROJ.PNUM = WORKS.PNUM AND
>                                 AVG(WORKS.HOURS) > PROJ.MAGIC / 200);
>                                 ^^^

> be legal according to that spec too?

Yes.  The fact that the same aggregate appears in the topmost target
list may be confusing the issue here --- that is *not* relevant to the
semantics of the aggregate in the subquery.

> Then the parser would not only have 
> to identify the uplevel of the aggregate, it'd also have to add a junk 
> aggregate TLE to the outer TL.

Nah, we don't use TLEs for aggregates.  AFAICT the executor will work
perfectly correctly with this example, if we can arrange to migrate the
whole SUM(WORKS.HOURS) expression out of the subquery and put it as one
of the Params passed to the subquery.  The failure is just in the parser
(too stupid to check the query correctly) and the planner (too stupid to
migrate the whole aggregate expression rather than just the WORKS.HOURS
variable reference).
        regards, tom lane


Re: Aggregates containing outer references don't work per

From
Jan Wieck
Date:
Tom Lane wrote:
> Some of the Red Hat guys have been trying to work through the NIST SQL
> compliance tests.  So far they've found several things we already knew
> about, and one we didn't:
> 
> -- TEST:0434 GROUP BY with HAVING EXISTS-correlated set function!
>    SELECT PNUM, SUM(HOURS) FROM WORKS
>           GROUP BY PNUM
>           HAVING EXISTS (SELECT PNAME FROM PROJ
>                          WHERE PROJ.PNUM = WORKS.PNUM AND
>                                SUM(WORKS.HOURS) > PROJ.BUDGET / 200);
> 
> This query is legal according to the test, but Postgres fails with
> ERROR:  Aggregates not allowed in WHERE clause
> 
> The SUM() should be allowed in the sub-SELECT because, according to the
> spec, it is actually an aggregate of the outer query --- and so the
> whole expression "SUM(WORKS.HOURS)" is effectively an outer reference
> for the sub-SELECT.
> [...]
> 
> Comments?

Would
   SELECT PNUM, SUM(HOURS) FROM WORKS          GROUP BY PNUM          HAVING EXISTS (SELECT PNAME FROM PROJ
           WHERE PROJ.PNUM = WORKS.PNUM AND                               AVG(WORKS.HOURS) > PROJ.MAGIC / 200);
                     ^^^
 

be legal according to that spec too? Then the parser would not only have 
to identify the uplevel of the aggregate, it'd also have to add a junk 
aggregate TLE to the outer TL.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: Aggregates containing outer references don't work per

From
Bruce Momjian
Date:
Wow, a subselect in HAVING.  Where do they think up these things.  :-)

Yes, it would be nice to support it, though I am not surprised no one
has asked for this feature yet.  :-)

---------------------------------------------------------------------------

Tom Lane wrote:
> Some of the Red Hat guys have been trying to work through the NIST SQL
> compliance tests.  So far they've found several things we already knew
> about, and one we didn't:
> 
> -- TEST:0434 GROUP BY with HAVING EXISTS-correlated set function!
>    SELECT PNUM, SUM(HOURS) FROM WORKS
>           GROUP BY PNUM
>           HAVING EXISTS (SELECT PNAME FROM PROJ
>                          WHERE PROJ.PNUM = WORKS.PNUM AND
>                                SUM(WORKS.HOURS) > PROJ.BUDGET / 200);
> 
> This query is legal according to the test, but Postgres fails with
> ERROR:  Aggregates not allowed in WHERE clause
> 
> The SUM() should be allowed in the sub-SELECT because, according to the
> spec, it is actually an aggregate of the outer query --- and so the
> whole expression "SUM(WORKS.HOURS)" is effectively an outer reference
> for the sub-SELECT.
> 
> Now I finally understand why the spec has all that strange verbiage
> about outer references in set-function arguments. This is the case
> they're talking about.  (I don't much like their restriction to a single
> outer reference ... seems like it would be appropriate to allow multiple
> references as long as they're all from the same outer query level.)
> 
> Fixing this looks a tad nasty.  The planner can convert simple column
> outer references into Params for a subquery, but there is no
> infrastructure to handle making larger expressions into Params.  Also,
> I don't want the planner repeating the work that the parser is going to
> have to do to validate correctness of the query --- the parser will need
> to understand that the aggregate is an outer reference as a whole, and
> the planner shouldn't have to rediscover that for itself.  In any case,
> it seems that an outer-reference aggregate is a rather different animal
> from an aggregate of the current query, and ought to be so labeled in
> the parse tree.
> 
> I'm thinking of adding an "agglevelsup" field in Aggref nodes that has
> semantics similar to "varlevelsup" in Var nodes --- if it's zero then
> the aggregate is a regular aggregate of the current level, if it's more
> than zero then the aggregate belongs to an outer query that many levels
> up.  The parser would need to set this field based on what the
> aggregate's argument contains.  It'd also have to check that the
> argument does not contain variables of more than one query level.
> 
> Comments?
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Aggregates containing outer references don't work per spec

From
Tom Lane
Date:
I wrote:
> Now I finally understand why the spec has all that strange verbiage
> about outer references in set-function arguments. This is the case
> they're talking about.  (I don't much like their restriction to a single
> outer reference ... seems like it would be appropriate to allow multiple
> references as long as they're all from the same outer query level.)

I've been thinking about that some more.  Currently, if a subquery
contains an aggregate expression likeSUM(localvar + outervar)
(where localvar is a variable of the subquery and outervar is a variable
of the parent query), we accept this, and execute it in what seems to me
a perfectly reasonable way: the aggregate is evaluated over all the
appropriate rows of the subquery, taking the outer variable as a
constant for any one evaluation of the subquery.

The spec appears to forbid this case, but I really don't see why.

Obviously the aggregate argument could be more complex, with outer
references from several different levels of outer query, but that
doesn't change anything --- all the outer-reference variables are
constants from the perspective of the subquery, whether they come
from one level up or many levels up.

What we now realize is that the spec says SUM(outervar) ought to be
considered an aggregate evaluated at the outervar's query level, and
then imported *as a whole* as an effective constant for the subquery.

I claim that there should be nothing wrong with interpretingSUM(outervar1 + outervar2)
as an aggregate of the closest outer variable's level, with further-up
variables taken as constants with respect to that level.  This isn't
really a different case, it's the same as "SUM(localvar + outervar)"
from the perspective of that closest outer level.  We are just allowing
the expression to be referenced from within sub-subqueries.  The
constraint that the aggregate must appear in the SELECT targetlist or
HAVING clause applies to the query level that the aggregate belongs to,
but not to lower levels.

In short, I see no reason why the spec should restrict the aggregate's
argument to contain only a single outer reference, or even references
from just a single outer query level.  The behavior is perfectly well
defined without that constraint.  We can accommodate both our historical
behavior and the spec-mandated cases if we interpret multilevel cases
per this sketch.

Anyone see a flaw in this reasoning?
        regards, tom lane


Re: Aggregates containing outer references don't work per spec

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> When we considered outervar1 as a constant, we could do the aggregate in
> the subquery using computations, but when SUM(outervar1) is computed in
> an above query, combining that with anything that is part of different
> query level makes no sense to me because those variables might not even
> exist at the level that aggregate is being computed.

Sure they will.  They can only be outer (up-level) references, else the
parser would not have resolved them in the first place.

If you accept that SUM(localvar + outervar) is a sensible computation,
then I think you must accept that SUM(outervar1 + outervar2) makes sense
too.  It's actually the exact same computation, it's just being
referenced from within a sub-query.
        regards, tom lane


Re: Aggregates containing outer references don't work per

From
Bruce Momjian
Date:
I think the issue is this part:

> What we now realize is that the spec says SUM(outervar) ought to be
> considered an aggregate evaluated at the outervar's query level, and
> then imported *as a whole* as an effective constant for the subquery.
> 
> I claim that there should be nothing wrong with interpreting
>     SUM(outervar1 + outervar2)
> as an aggregate of the closest outer variable's level, with further-up
When we considered outervar1 as a constant, we could do the aggregate in
the subquery using computations, but when SUM(outervar1) is computed in
an above query, combining that with anything that is part of different
query level makes no sense to me because those variables might not even
exist at the level that aggregate is being computed.

So, for example, this:
    SUM(outervar1) + SUM(outervar2)

does make sense to me because those are computable, but if outervar1 and
outervar2 are in different query levels:
SUM(outervar1 + outervar2)

doesn't make sense.  I think most of the aggregates support such
spliting, so maybe they figured people should split apart aggregates
that were unclear.

---------------------------------------------------------------------------

Tom Lane wrote:
> I wrote:
> > Now I finally understand why the spec has all that strange verbiage
> > about outer references in set-function arguments. This is the case
> > they're talking about.  (I don't much like their restriction to a single
> > outer reference ... seems like it would be appropriate to allow multiple
> > references as long as they're all from the same outer query level.)
> 
> I've been thinking about that some more.  Currently, if a subquery
> contains an aggregate expression like
>     SUM(localvar + outervar)
> (where localvar is a variable of the subquery and outervar is a variable
> of the parent query), we accept this, and execute it in what seems to me
> a perfectly reasonable way: the aggregate is evaluated over all the
> appropriate rows of the subquery, taking the outer variable as a
> constant for any one evaluation of the subquery.
> 
> The spec appears to forbid this case, but I really don't see why.
> 
> Obviously the aggregate argument could be more complex, with outer
> references from several different levels of outer query, but that
> doesn't change anything --- all the outer-reference variables are
> constants from the perspective of the subquery, whether they come
> from one level up or many levels up.
> 
> What we now realize is that the spec says SUM(outervar) ought to be
> considered an aggregate evaluated at the outervar's query level, and
> then imported *as a whole* as an effective constant for the subquery.
> 
> I claim that there should be nothing wrong with interpreting
>     SUM(outervar1 + outervar2)
> as an aggregate of the closest outer variable's level, with further-up
> variables taken as constants with respect to that level.  This isn't
> really a different case, it's the same as "SUM(localvar + outervar)"
> from the perspective of that closest outer level.  We are just allowing
> the expression to be referenced from within sub-subqueries.  The
> constraint that the aggregate must appear in the SELECT targetlist or
> HAVING clause applies to the query level that the aggregate belongs to,
> but not to lower levels.
> 
> In short, I see no reason why the spec should restrict the aggregate's
> argument to contain only a single outer reference, or even references
> from just a single outer query level.  The behavior is perfectly well
> defined without that constraint.  We can accommodate both our historical
> behavior and the spec-mandated cases if we interpret multilevel cases
> per this sketch.
> 
> Anyone see a flaw in this reasoning?
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Aggregates containing outer references don't work per spec

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> What is SUM(up1levelvar + up2levelsvar) considered to be? Would that be 
> the same as SUM(localvar + outervar) one level up?

Exactly.  The spec says that SUM(up1levelvar) is the same as
SUM(localvar) one level up, so this seems a natural generalization.
It lets us keep supporting our old behavior in cases where that behavior
is sensible.
        regards, tom lane


Re: Aggregates containing outer references don't work per

From
Jan Wieck
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> When we considered outervar1 as a constant, we could do the aggregate in
>> the subquery using computations, but when SUM(outervar1) is computed in
>> an above query, combining that with anything that is part of different
>> query level makes no sense to me because those variables might not even
>> exist at the level that aggregate is being computed.
> 
> Sure they will.  They can only be outer (up-level) references, else the
> parser would not have resolved them in the first place.
> 
> If you accept that SUM(localvar + outervar) is a sensible computation,
> then I think you must accept that SUM(outervar1 + outervar2) makes sense
> too.  It's actually the exact same computation, it's just being
> referenced from within a sub-query.

What is SUM(up1levelvar + up2levelsvar) considered to be? Would that be 
the same as SUM(localvar + outervar) one level up?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



cvs ETA?

From
Dave Cramer
Date:
Is there any ETA for cvs?

Dave
-- 
Dave Cramer <dave@fastcrypt.com>
fastcrypt



Re: cvs ETA?

From
The Hermit Hacker
Date:
??  cvs itself has been working for ages now ... anoncvs isn't ... or is
that what you are using?

On Thu, 5 Jun 2003, Dave Cramer wrote:

> Is there any ETA for cvs?
>
> Dave
> --
> Dave Cramer <dave@fastcrypt.com>
> fastcrypt
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: cvs ETA?

From
Alvaro Herrera
Date:
On Sun, Jun 08, 2003 at 10:33:03AM -0300, The Hermit Hacker wrote:
> 
> ??  cvs itself has been working for ages now ... anoncvs isn't ... or is
> that what you are using?

I think that'd be it... I'm waiting for CVSup too.

("Cannot connect to cvsup.postgresql.org: Connection refused")

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Thou shalt study thy libraries and strive not to reinvent them without
cause, that thy code may be short and readable and thy days pleasant
and productive. (7th Commandment for C Programmers)


Re: cvs ETA?

From
Kurt Roeckx
Date:
On Sun, Jun 08, 2003 at 10:33:03AM -0300, The Hermit Hacker wrote:
> 
> ??  cvs itself has been working for ages now ... anoncvs isn't ... or is
> that what you are using?

What else would we be using?


Kurt



Re: cvs ETA?

From
"Andrew Dunstan"
Date:
It's what I tried to use over the weekend, so I could start on a doc patch
... restoring it would probably help a number of people.

andrew

Marc wrote:
>
> ??  cvs itself has been working for ages now ... anoncvs isn't ... or
> is that what you are using?
>
> On Thu, 5 Jun 2003, Dave Cramer wrote:
>
>> Is there any ETA for cvs?
>>
>> Dave





Re: cvs ETA?

From
Dave Cramer
Date:
Actually I was only looking for cvs, by the time this message made it to
the list, it was working.

Dave
On Mon, 2003-06-09 at 00:21, Alvaro Herrera wrote:
> On Sun, Jun 08, 2003 at 10:33:03AM -0300, The Hermit Hacker wrote:
> > 
> > ??  cvs itself has been working for ages now ... anoncvs isn't ... or is
> > that what you are using?
> 
> I think that'd be it... I'm waiting for CVSup too.
> 
> ("Cannot connect to cvsup.postgresql.org: Connection refused")
-- 
Dave Cramer <dave@fastcrypt.com>
fastcrypt