Thread: BUG #18430: syntax error when using aggregate function in where clause of subquery
BUG #18430: syntax error when using aggregate function in where clause of subquery
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18430 Logged by: Eric Atkin Email address: eatkin@certusllc.us PostgreSQL version: 16.2 Operating system: Arch Linux Description: CREATE TABLE driver ( id INTEGER PRIMARY KEY, city TEXT ); CREATE TABLE delivery ( id INTEGER PRIMARY KEY, driver_id INTEGER REFERENCES driver(id) ); SELECT city, (SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id)) AS deliveries FROM driver GROUP BY city ; This produces: ERROR: syntax error at or near "array_agg" LINE 3: ...(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(... I assume this is because aggregate functions are not normally allowed in a where clause. However, my reading of the last paragraph of Section 4.2.7 of the version 16 docs (quoted below) leads me to believe there should be an exception to that constraint when a subquery occurs in an outer query select list and refers to one of its variables. Is it possible there is a bug in the parser causing it to not be aware of this exception? Perhaps a join of delivery would be a simpler way to write this query in this trivial example, but I think the subquery approach is more reasonable in the real world case I've derived it from. In any case, I think postgres should behave as documented even if I can only poorly demonstrate the issue. Thank you, Eric Atkin 4.2.7 Aggregate Expressions ... When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.23), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery it appears in, and acts as a constant over any one evaluation of that subquery. The restriction about appearing only in the result list or HAVING clause applies with respect to the query level that the aggregate belongs to.
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery
From
Laurenz Albe
Date:
On Fri, 2024-04-12 at 17:14 +0000, PG Bug reporting form wrote: > SELECT > city, > (SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id)) > AS deliveries > FROM driver > GROUP BY city > ; > > This produces: > > ERROR: syntax error at or near "array_agg" > LINE 3: ...(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(... This not a bug, but bad syntax. Write SELECT city, (SELECT count(*) FROM delivery WHERE driver_id = ANY (drivers)) AS deliveries FROM (SELECT city, array_agg(driver.id) AS drivers FROM driver GROUP BY city) AS q; Yours, Laurenz Albe
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery
From
Eric Atkin
Date:
"The aggregate expression as a whole is then an outer reference
for the subquery it appears in, and acts as a constant over any one
evaluation of that subquery."
evaluation of that subquery."
It seems the aggregate function call should be evaluated at the outer layer (where it would not be bad syntax) and then is a constant for the inner query where clause where an array would be allowed.
Thank you for your working example. There is often more than one way to write any particular query. My version and yours seem to mean the same thing, but one is arguably more readable. At the very least, is this not a documentation bug?
On Fri, Apr 12, 2024 at 11:45 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2024-04-12 at 17:14 +0000, PG Bug reporting form wrote:
> SELECT
> city,
> (SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id))
> AS deliveries
> FROM driver
> GROUP BY city
> ;
>
> This produces:
>
> ERROR: syntax error at or near "array_agg"
> LINE 3: ...(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(...
This not a bug, but bad syntax.
Write
SELECT city,
(SELECT count(*)
FROM delivery
WHERE driver_id = ANY (drivers)) AS deliveries
FROM (SELECT city,
array_agg(driver.id) AS drivers
FROM driver
GROUP BY city) AS q;
Yours,
Laurenz Albe
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery
From
"David G. Johnston"
Date:
On Fri, Apr 12, 2024 at 10:32 AM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 18430
Logged by: Eric Atkin
Email address: eatkin@certusllc.us
PostgreSQL version: 16.2
Operating system: Arch Linux
Description:
(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id))
This doesn't seem to have anything to do with scoping of inner and outer queries. Where did you come to believe that:
scalar IN array
is a valid operator/expression.
i.e., your query fails for the same reason this one does:
select 1 in array[1,2,3];
David J.
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery
From
Tom Lane
Date:
Eric Atkin <eatkin@certusllc.us> writes: > It seems the aggregate function call should be evaluated at the outer layer > (where it would not be bad syntax) and then is a constant for the inner > query where clause where an array would be allowed. Correct, but the problem is not with the array_agg call, it's with your use of IN. In the first place, IN requires parens around its righthand side. But that only gets us past "syntax error": regression=# SELECT city, (SELECT count(*) FROM delivery WHERE driver_id IN (array_agg(driver.id))) AS deliveries FROM driver GROUP BY city ; ERROR: operator does not exist: integer = integer[] LINE 3: ... (SELECT count(*) FROM delivery WHERE driver_id IN (array_... ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. That's because what's inside the parens is supposed to be either a sub-select or a list of things directly comparable to the LHS. To do what you're after, you need to use the "scalar = ANY(array)" construct: regression=# SELECT city, (SELECT count(*) FROM delivery WHERE driver_id = any (array_agg(driver.id))) AS deliveries FROM driver GROUP BY city ; city | deliveries ------+------------ (0 rows) IN and =ANY are more-or-less equivalent when the RHS is a sub-select, but not for an array RHS. There's nothing particularly wrong with Laurenz's construction, but it's not necessary to split it up like that. regards, tom lane
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery
From
Eric Atkin
Date:
Perfect. I had tried the `= ANY` construct but got the same syntax error and mistakenly assumed it was a problem with the parser rather than my misreading of the `ANY (array)` docs (9.24.3).
Thank you both for your help.
On Fri, Apr 12, 2024 at 1:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Eric Atkin <eatkin@certusllc.us> writes:
> It seems the aggregate function call should be evaluated at the outer layer
> (where it would not be bad syntax) and then is a constant for the inner
> query where clause where an array would be allowed.
Correct, but the problem is not with the array_agg call, it's with
your use of IN. In the first place, IN requires parens around its
righthand side. But that only gets us past "syntax error":
regression=# SELECT
city,
(SELECT count(*) FROM delivery WHERE driver_id IN (array_agg(driver.id)))
AS deliveries
FROM driver
GROUP BY city
;
ERROR: operator does not exist: integer = integer[]
LINE 3: ... (SELECT count(*) FROM delivery WHERE driver_id IN (array_...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
That's because what's inside the parens is supposed to be either a
sub-select or a list of things directly comparable to the LHS.
To do what you're after, you need to use the "scalar = ANY(array)"
construct:
regression=# SELECT
city,
(SELECT count(*) FROM delivery WHERE driver_id = any (array_agg(driver.id)))
AS deliveries
FROM driver
GROUP BY city
;
city | deliveries
------+------------
(0 rows)
IN and =ANY are more-or-less equivalent when the RHS is a sub-select,
but not for an array RHS.
There's nothing particularly wrong with Laurenz's construction,
but it's not necessary to split it up like that.
regards, tom lane