Thread: Weirdness (bug?) with aggregates and subqueries
I found this in a blog (https://buttondown.email/jaffray/archive/sql-scoping-is-surprisingly-subtle-and-semantic/): CREATE TABLE aa (a INT); INSERT INTO aa VALUES (1), (2), (3); CREATE TABLE xx (x INT); INSERT INTO xx VALUES (10), (20), (30); SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa; sum ═════ 6 (1 row) Huh? Shouldn't that return three rows, just like SELECT (SELECT sum(14) FROM xx LIMIT 1) FROM aa; sum ═════ 42 42 42 (3 rows) Looking at the plan of the weird query, the aggregate seems to be in the wrong place: EXPLAIN (COSTS OFF) SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa; QUERY PLAN ══════════════════════════════ Aggregate -> Seq Scan on aa SubPlan 1 -> Limit -> Seq Scan on xx (5 rows) And this gives an error: SELECT a, (SELECT sum(a) FROM xx LIMIT 1) FROM aa; ERROR: column "aa.a" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT a, (SELECT sum(a) FROM xx LIMIT 1) FROM aa; ^ I think that the optimizer is going astray here... But perhaps I am missing something obvious. Yours, Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes: > I found this in a blog (https://buttondown.email/jaffray/archive/sql-scoping-is-surprisingly-subtle-and-semantic/): > CREATE TABLE aa (a INT); > INSERT INTO aa VALUES (1), (2), (3); > CREATE TABLE xx (x INT); > INSERT INTO xx VALUES (10), (20), (30); > SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa; > sum > ═════ > 6 > (1 row) > Huh? Shouldn't that return three rows, just like No. The aggregate function is semantically of the closest query level that contributes a Var to its argument, so it's evaluated at the "FROM aa" level, causing that level to become an aggregated query that returns just one row. Then it acts like an outer reference as far as the sub-select is concerned. This is documented at the end of Section 4.2.7 in our manual, https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES Thank the SQL spec for that weirdness. regards, tom lane
On Wed, 2023-11-08 at 16:36 -0500, Tom Lane wrote: > Laurenz Albe <laurenz.albe@cybertec.at> writes: > > I found this in a blog (https://buttondown.email/jaffray/archive/sql-scoping-is-surprisingly-subtle-and-semantic/): > > CREATE TABLE aa (a INT); > > INSERT INTO aa VALUES (1), (2), (3); > > CREATE TABLE xx (x INT); > > INSERT INTO xx VALUES (10), (20), (30); > > > SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa; > > > sum > > ═════ > > 6 > > (1 row) > > > Huh? Shouldn't that return three rows, just like > > No. The aggregate function is semantically of the closest query level > that contributes a Var to its argument, so it's evaluated at the "FROM > aa" level, causing that level to become an aggregated query that > returns just one row. Then it acts like an outer reference as far > as the sub-select is concerned. This is documented at the end of > Section 4.2.7 in our manual, > > https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES > > Thank the SQL spec for that weirdness. Thanks for the explanation. Seems like another instance of the standard committee smoking the wrong stuff. Yours, Laurenz Albe