Re: [HACKERS] SUM() and GROUP BY - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] SUM() and GROUP BY
Date
Msg-id 18256.916249131@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] SUM() and GROUP BY  ("D'Arcy" "J.M." Cain <darcy@druid.net>)
List pgsql-hackers
"D'Arcy" "J.M." Cain <darcy@druid.net> writes:
> Or, if I add a "WHERE tramount IS NOT NULL" clause to force it to add
> non-null items, right?  It should depend on the values in the selected
> set, not the values in the full table.

Sure.  No items is no items, no matter how you selected them.

> What about GROUP BY though.
>
> SELECT client_id, AVG(tramount) FROM acctrans
>      WHERE client_id = 'NOBODY' GROUP BY client_id;
>
> It's not that I want a special value returned, I want no rows returned.
> Currently it returns 1 row even thought 'NOBODY' doesn't exist in the
> database.

Hmm.  I tried:

play=> create table empty (i int4);

CASE 1:

play=> select i from empty;
i
-
(0 rows)

No rows returned, certainly correct.

CASE 2:

play=> select avg(i) from empty;
avg
---

(1 row)

One row containing the null aggregate-function result, also OK.

CASE 3:

play=> select i from empty group by i;
i
-
(0 rows)

No groups, no rows returned, OK.

CASE 4:

play=> select avg(i) from empty group by i;
avg
---

(1 row)

I suppose you could argue that this is consistent with cases 2 and 3,
in a weird way, but it's certainly not the way I'd expect it to work.
If there are no groups created by GROUP BY, then AVG should never be
invoked at all, therefore there should be no rows returned.  This is not
the same as case 2, wherein AVG was invoked once over the whole table
(which just happened to be empty, causing AVG to return NULL as it
should for an empty input set).  Here, AVG should be invoked once for
each group created by GROUP BY.  If no groups, there should be no
results, not an artificially created row with a null result.

The point is perhaps more obvious if you consider

play=> select i,avg(i) from empty group by i;
i|avg
-+---|
(1 row)

Here, the system has made up a NULL value for i out of thin air.
It will do this even if the i column is declared NOT NULL!
I say this behavior is completely unsupportable.  (If the table
actually *contains* some NULL entries, then returning a row with
NULL i is correct --- and indeed the system handles that case properly.
GROUP BY's behavior for an empty table is erroneous because it makes
it look like the table contains NULL i values when it doesn't.)

So, again I agree with D'Arcy.  But I'm not the one who might have
to try to fix this...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Brook Milligan
Date:
Subject: fix pg_dump to dump sequences created by SERIAL datatype
Next
From: Brook Milligan
Date:
Subject: references to packaged versions of PostgreSQL