Thread: Weirdness (bug?) with aggregates and subqueries

Weirdness (bug?) with aggregates and subqueries

From
Laurenz Albe
Date:
I found this in a blog (https://buttondown.email/jaffray/archive/sql-scoping-is-surprisingly-subtle-and-semantic/):

  CREATE TABLE aa (a INT);
  INSERT INTO aa VALUES (1), (2), (3);
  CREATE TABLE xx (x INT);
  INSERT INTO xx VALUES (10), (20), (30);

  SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;

   sum
  ═════
     6
  (1 row)

Huh?  Shouldn't that return three rows, just like

  SELECT (SELECT sum(14) FROM xx LIMIT 1) FROM aa;

   sum
  ═════
    42
    42
    42
  (3 rows)

Looking at the plan of the weird query, the aggregate seems to be in the wrong place:

  EXPLAIN (COSTS OFF) SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;

            QUERY PLAN
  ══════════════════════════════
   Aggregate
     ->  Seq Scan on aa
     SubPlan 1
       ->  Limit
             ->  Seq Scan on xx
  (5 rows)


And this gives an error:

  SELECT a, (SELECT sum(a) FROM xx LIMIT 1) FROM aa;
  ERROR:  column "aa.a" must appear in the GROUP BY clause or be used in an aggregate function
  LINE 1: SELECT a, (SELECT sum(a) FROM xx LIMIT 1) FROM aa;
                 ^

I think that the optimizer is going astray here...

But perhaps I am missing something obvious.

Yours,
Laurenz Albe



Re: Weirdness (bug?) with aggregates and subqueries

From
Tom Lane
Date:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> I found this in a blog (https://buttondown.email/jaffray/archive/sql-scoping-is-surprisingly-subtle-and-semantic/):
>   CREATE TABLE aa (a INT);
>   INSERT INTO aa VALUES (1), (2), (3);
>   CREATE TABLE xx (x INT);
>   INSERT INTO xx VALUES (10), (20), (30);

>   SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;

>    sum
>   ═════
>      6
>   (1 row)

> Huh?  Shouldn't that return three rows, just like

No.  The aggregate function is semantically of the closest query level
that contributes a Var to its argument, so it's evaluated at the "FROM
aa" level, causing that level to become an aggregated query that
returns just one row.  Then it acts like an outer reference as far
as the sub-select is concerned.  This is documented at the end of
Section 4.2.7 in our manual,

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

Thank the SQL spec for that weirdness.

            regards, tom lane



Re: Weirdness (bug?) with aggregates and subqueries

From
Laurenz Albe
Date:
On Wed, 2023-11-08 at 16:36 -0500, Tom Lane wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > I found this in a blog (https://buttondown.email/jaffray/archive/sql-scoping-is-surprisingly-subtle-and-semantic/):
> >   CREATE TABLE aa (a INT);
> >   INSERT INTO aa VALUES (1), (2), (3);
> >   CREATE TABLE xx (x INT);
> >   INSERT INTO xx VALUES (10), (20), (30);
>
> >   SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;
>
> >    sum
> >   ═════
> >      6
> >   (1 row)
>
> > Huh?  Shouldn't that return three rows, just like
>
> No.  The aggregate function is semantically of the closest query level
> that contributes a Var to its argument, so it's evaluated at the "FROM
> aa" level, causing that level to become an aggregated query that
> returns just one row.  Then it acts like an outer reference as far
> as the sub-select is concerned.  This is documented at the end of
> Section 4.2.7 in our manual,
>
> https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES
>
> Thank the SQL spec for that weirdness.

Thanks for the explanation.  Seems like another instance of the standard
committee smoking the wrong stuff.

Yours,
Laurenz Albe