Thread: Why does aggregate query allow select of non-group by or aggregate values?

Why does aggregate query allow select of non-group by or aggregate values?

From

Jack Christensen

Date:

09 December 2011, 18:48:40

CREATE TABLE people(
   id serial PRIMARY KEY,
   name varchar NOT NULL
);

INSERT INTO people(name) VALUES('Adam'), ('Adam'), ('Adam'), ('Bill'),
('Sam'), ('Joe'), ('Joe');

SELECT name, count(*), random()
FROM people
GROUP BY name;


I would expect this query to cause an error because of random(). I ran
into this using an array produced by a subquery as a column in the
select of an aggregate query, but I was able to boil it down to this
contrived example. Shouldn't any expression that is not in the group by
or an aggregate function be rejected?

What am I not understanding?

Thanks.

--
Jack Christensen
jackc@hylesanderson.edu

Re: Why does aggregate query allow select of non-group by or aggregate values?

From

Henry Drexler

Date:

09 December 2011, 18:52:07

On Fri, Dec 9, 2011 at 5:48 PM, Jack Christensen <jackc@hylesanderson.edu> wrote:

CREATE TABLE people(
id serial PRIMARY KEY,
name varchar NOT NULL
);

INSERT INTO people(name) VALUES('Adam'), ('Adam'), ('Adam'), ('Bill'), ('Sam'), ('Joe'), ('Joe');

SELECT name, count(*), random()
FROM people
GROUP BY name;

I would expect this query to cause an error because of random(). I ran into this using an array produced by a subquery as a column in the select of an aggregate query, but I was able to boil it down to this contrived example. Shouldn't any expression that is not in the group by or an aggregate function be rejected?

What am I not understanding?

Thanks.

--
Jack Christensen
jackc@hylesanderson.edu

I don't know the answer, but I would guess that as random() is not known before hand , it has no reason being in the grouped by.

Re: Why does aggregate query allow select of non-group by or aggregate values?

From

"David Johnston"

Date:

09 December 2011, 18:58:18

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jack Christensen
Sent: Friday, December 09, 2011 5:48 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Why does aggregate query allow select of non-group by or
aggregate values?

CREATE TABLE people(
   id serial PRIMARY KEY,
   name varchar NOT NULL
);

INSERT INTO people(name) VALUES('Adam'), ('Adam'), ('Adam'), ('Bill'),
('Sam'), ('Joe'), ('Joe');

SELECT name, count(*), random()
FROM people
GROUP BY name;

I would expect this query to cause an error because of random(). I ran into
this using an array produced by a subquery as a column in the select of an
aggregate query, but I was able to boil it down to this contrived example.
Shouldn't any expression that is not in the group by or an aggregate
function be rejected?

What am I not understanding?

Thanks.

----------------------------------------------------------------------------
-------

Functions are evaluated once for each row that it generated by the
surrounding query.  This is particularly useful if the function in question
takes an aggregate as an input:

SELECT col1,  array_processing_function( ARRAY_AGG( col2 ) )
FROM table
GROUP BY col1;

Without this particular behavior you would need to sub-query.

From a layman's perspective the reason why you cannot use non-aggregates
outside of GROUP BY it that it is ambiguous as to what value to output; with
an uncorrelated function call that is not the case.

David J.

Re: Why does aggregate query allow select of non-group by or aggregate values?

From

Adrian Klaver

Date:

09 December 2011, 18:58:34

On 12/09/2011 02:48 PM, Jack Christensen wrote:
> CREATE TABLE people(
> id serial PRIMARY KEY,
> name varchar NOT NULL
> );
>
> INSERT INTO people(name) VALUES('Adam'), ('Adam'), ('Adam'), ('Bill'),
> ('Sam'), ('Joe'), ('Joe');
>
> SELECT name, count(*), random()
> FROM people
> GROUP BY name;
>
>
> I would expect this query to cause an error because of random(). I ran
> into this using an array produced by a subquery as a column in the
> select of an aggregate query, but I was able to boil it down to this
> contrived example. Shouldn't any expression that is not in the group by
> or an aggregate function be rejected?
>
> What am I not understanding?

http://www.postgresql.org/docs/9.0/interactive/sql-select.html#SQL-GROUPBY
"
Aggregate functions, if any are used, are computed across all rows
making up each group, producing a separate value for each group (whereas
without GROUP BY, an aggregate produces a single value computed across
all the selected rows). When GROUP BY is present, it is not valid for
the SELECT list expressions to refer to ungrouped columns except within
aggregate functions, since there would be more than one possible value
to return for an ungrouped column."

My guess, random() does not refer to a column, so it falls outside the
above criteria.

>
> Thanks.
>

--
Adrian Klaver
adrian.klaver@gmail.com

Re: Why does aggregate query allow select of non-group by or aggregate values?

From

Jack Christensen

Date:

10 December 2011, 16:40:12

On 12/9/2011 4:57 PM, David Johnston wrote:
> Functions are evaluated once for each row that it generated by the
> surrounding query.  This is particularly useful if the function in question
> takes an aggregate as an input:
>
> SELECT col1,  array_processing_function( ARRAY_AGG( col2 ) )
> FROM table
> GROUP BY col1;
>
> Without this particular behavior you would need to sub-query.
>
> > From a layman's perspective the reason why you cannot use non-aggregates
> outside of GROUP BY it that it is ambiguous as to what value to output; with
> an uncorrelated function call that is not the case.
>
> David J.
>
>
>
>
Thanks. This makes sense now. I also went back to the original query
that provoked this question. It had a correlated subquery in the select
statement. I thought that this could yield ambiguous results. But when I
examined it closely, all the correlated fields were included in the
group by of the outer query, and when I tried to use a non-grouped
column from the outer query I correctly got a ERROR: subquery uses
ungrouped column "foo" from outer query

Thanks again.

--
Jack Christensen
jackc@hylesanderson.edu