Thread: SUM() and GROUP BY

SUM() and GROUP BY

From
"D'Arcy" "J.M." Cain
Date:
Does this seem right?

druid=> SELECT COUNT(*) FROM acctrans;
count
-----   0
(1 row)

druid=> SELECT client_id, SUM(tramount) FROM acctrans GROUP BY client_id;
client_id|sum
---------+---        |   
(1 row)

If there are no rows in the table then shouldn't the result be no rows
when GROUP BY is used?  Further, What about this?

druid=> SELECT SUM(tramount) FROM acctrans;
sum
---  
(1 row)

Shouldn't that be 0.00?

What will the NUMERIC or DECIMAL types do in these situations?  It
looks like INTEGER has the same behaviour as MONEY (which tramount is.)

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: [HACKERS] SUM() and GROUP BY

From
"Thomas G. Lockhart"
Date:
> Does this seem right?
> druid=> SELECT client_id, SUM(tramount)
>         FROM acctrans GROUP BY client_id;
> client_id|sum
> ---------+---
>          |
> (1 row)
> If there are no rows in the table then shouldn't the result be no rows
> when GROUP BY is used?  Further, What about this?

Not sure. Someone may want to try this query on another DB. I know the
answer to the next one though...

> druid=> SELECT SUM(tramount) FROM acctrans;
> sum
> ---
> 
> (1 row)
> Shouldn't that be 0.00?

No. It is returning NULL, because NULL means "don't know". It doesn't
mean "nothing" or "zero". That is certainly the correct behavior if the
table were populated with all NULLs in that column. And by extension, it
is the correct result if there are no rows at all, since "don't know"
for a bunch should give the same result as "don't know" for a few or for
none.

> What will the NUMERIC or DECIMAL types do in these situations?  It
> looks like INTEGER has the same behaviour as MONEY (which tramount 
> is.)

They will all behave the same.
                   - Tom


Re: [HACKERS] SUM() and GROUP BY

From
jwieck@debis.com (Jan Wieck)
Date:
D'Arcy J.M. Cain wrote:

>
> Does this seem right?
>
> druid=> SELECT COUNT(*) FROM acctrans;
> count
> -----
>     0
> (1 row)
>
> druid=> SELECT client_id, SUM(tramount) FROM acctrans GROUP BY client_id;
> client_id|sum
> ---------+---
>          |
> (1 row)
>
> If there are no rows in the table then shouldn't the result be no rows
> when GROUP BY is used?  Further, What about this?
>
> druid=> SELECT SUM(tramount) FROM acctrans;
> sum
> ---
>
> (1 row)
>
> Shouldn't that be 0.00?
>
> What will the NUMERIC or DECIMAL types do in these situations?  It
> looks like INTEGER has the same behaviour as MONEY (which tramount is.)

    NUMERIC  and  DECIMAL  will  behave exactly as above, because
    it's the (irritating) correct behaviour. It is handled in the
    generic  grouping  and  aggregate  code  (in fact none of the
    aggregate functions  will  ever  be  called  if  there  isn't
    anything to count/average/sum).

    To  get  a zero count, you need a subselect in the targetlist
    (not implemented yet). Currently the only way to  simulate  a
    subselect in the targetlist is to put the count() into an SQL
    function that takes the  arguments  you  need  to  build  the
    qualification and returns the counted number.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] SUM() and GROUP BY

From
Tom Lane
Date:
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:
>> druid=> SELECT SUM(tramount) FROM acctrans;
>> sum
>> ---
>> 
>> (1 row)
>> Shouldn't that be 0.00?

> No. It is returning NULL, because NULL means "don't know". It doesn't
> mean "nothing" or "zero". That is certainly the correct behavior if the
> table were populated with all NULLs in that column. And by extension, it
> is the correct result if there are no rows at all, since "don't know"
> for a bunch should give the same result as "don't know" for a few or for
> none.

I disagree ... the sum of zero items has traditionally been defined as
zero by any mathematician you care to ask.  No logical problems are
introduced by doing so, and it avoids an unpleasant special case that
applications would otherwise be forced to deal with.  (Example: if
D'Arcy's tramount column has been declared NOT NULL, then it seems to me
that his code is entitled to expect to get a non-NULL result from SUM().
He should not have to cope with a NULL just because the table is empty.)

Now, if the sum is taken over a set of rows that includes at least *one*
NULL, then I agree that SUM should return NULL: you "don't know" what
the sum ought to be if there are some inputs that you "don't know".
But we do know what the sum of no items is: it's zero.  There are no
unknown inputs and therefore no reason to claim we don't know the sum.

IOW, D'Arcy's right.  This is a bug.

You do have to take it on a case-by-case basis, though.  For example
AVG() of no items should (and does) return NULL, because the average
of no items is not defined (since 0/0 is indeterminate).
        regards, tom lane


Re: [HACKERS] SUM() and GROUP BY

From
"D'Arcy" "J.M." Cain
Date:
Thus spake Tom Lane
> "Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:
> I disagree ... the sum of zero items has traditionally been defined as
> zero by any mathematician you care to ask.  No logical problems are
> introduced by doing so, and it avoids an unpleasant special case that
> applications would otherwise be forced to deal with.  (Example: if
> D'Arcy's tramount column has been declared NOT NULL, then it seems to me
> that his code is entitled to expect to get a non-NULL result from SUM().
> He should not have to cope with a NULL just because the table is empty.)

Or, if I add a "WHERE tramount IS NOT NULL" clause to force it to add
non-null items, right?  It should depend on the values in the selected
set, not the values in the full table.

> You do have to take it on a case-by-case basis, though.  For example
> AVG() of no items should (and does) return NULL, because the average
> of no items is not defined (since 0/0 is indeterminate).

What about GROUP BY though.

SELECT client_id, AVG(tramount) FROM acctrans    WHERE client_id = 'NOBODY' GROUP BY client_id;

It's not that I want a special value returned, I want no rows returned.
Currently it returns 1 row even thought 'NOBODY' doesn't exist in the
database.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: [HACKERS] SUM() and GROUP BY

From
Tom Lane
Date:
"D'Arcy" "J.M." Cain <darcy@druid.net> writes:
> Or, if I add a "WHERE tramount IS NOT NULL" clause to force it to add
> non-null items, right?  It should depend on the values in the selected
> set, not the values in the full table.

Sure.  No items is no items, no matter how you selected them.

> What about GROUP BY though.
>
> SELECT client_id, AVG(tramount) FROM acctrans
>      WHERE client_id = 'NOBODY' GROUP BY client_id;
>
> It's not that I want a special value returned, I want no rows returned.
> Currently it returns 1 row even thought 'NOBODY' doesn't exist in the
> database.

Hmm.  I tried:

play=> create table empty (i int4);

CASE 1:

play=> select i from empty;
i
-
(0 rows)

No rows returned, certainly correct.

CASE 2:

play=> select avg(i) from empty;
avg
---

(1 row)

One row containing the null aggregate-function result, also OK.

CASE 3:

play=> select i from empty group by i;
i
-
(0 rows)

No groups, no rows returned, OK.

CASE 4:

play=> select avg(i) from empty group by i;
avg
---

(1 row)

I suppose you could argue that this is consistent with cases 2 and 3,
in a weird way, but it's certainly not the way I'd expect it to work.
If there are no groups created by GROUP BY, then AVG should never be
invoked at all, therefore there should be no rows returned.  This is not
the same as case 2, wherein AVG was invoked once over the whole table
(which just happened to be empty, causing AVG to return NULL as it
should for an empty input set).  Here, AVG should be invoked once for
each group created by GROUP BY.  If no groups, there should be no
results, not an artificially created row with a null result.

The point is perhaps more obvious if you consider

play=> select i,avg(i) from empty group by i;
i|avg
-+---|
(1 row)

Here, the system has made up a NULL value for i out of thin air.
It will do this even if the i column is declared NOT NULL!
I say this behavior is completely unsupportable.  (If the table
actually *contains* some NULL entries, then returning a row with
NULL i is correct --- and indeed the system handles that case properly.
GROUP BY's behavior for an empty table is erroneous because it makes
it look like the table contains NULL i values when it doesn't.)

So, again I agree with D'Arcy.  But I'm not the one who might have
to try to fix this...
        regards, tom lane


Re: [HACKERS] SUM() and GROUP BY

From
Bruce Momjian
Date:
> I disagree ... the sum of zero items has traditionally been defined as
> zero by any mathematician you care to ask.  No logical problems are
> introduced by doing so, and it avoids an unpleasant special case that
> applications would otherwise be forced to deal with.  (Example: if
> D'Arcy's tramount column has been declared NOT NULL, then it seems to me
> that his code is entitled to expect to get a non-NULL result from SUM().
> He should not have to cope with a NULL just because the table is empty.)

Informix returns NULL for sum.  It returns a zero only for count().


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] SUM() and GROUP BY

From
"Thomas G. Lockhart"
Date:
> I suppose you could argue that this is consistent with cases 2 and 3,
> in a weird way, but it's certainly not the way I'd expect it to work.
> If there are no groups created by GROUP BY, then AVG should never be
> invoked at all, therefore there should be no rows returned.

Agreed.

> So, again I agree with D'Arcy.

I'm missing something. Is there another issue for GROUP BY for which we
don't have a consensus? An aggregate on an entire column can return
NULL, and aggregates on columns with GROUP BY columns of NULL should
not.

> But I'm not the one who might have
> to try to fix this...

And why not? :)
                   - Tom


Re: [HACKERS] SUM() and GROUP BY

From
"D'Arcy" "J.M." Cain
Date:
Thus spake Thomas G. Lockhart
> > So, again I agree with D'Arcy.
> 
> I'm missing something. Is there another issue for GROUP BY for which we
> don't have a consensus? An aggregate on an entire column can return
> NULL, and aggregates on columns with GROUP BY columns of NULL should
> not.

Well, I still think that an aggregate on a whole column when no rows
are selected (None exist or WHERE clause deselects everything) should
return 0 rows, not NULL but the above would be better than what happens
now.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.