Thread: SUM() and GROUP BY
Does this seem right? druid=> SELECT COUNT(*) FROM acctrans; count ----- 0 (1 row) druid=> SELECT client_id, SUM(tramount) FROM acctrans GROUP BY client_id; client_id|sum ---------+--- | (1 row) If there are no rows in the table then shouldn't the result be no rows when GROUP BY is used? Further, What about this? druid=> SELECT SUM(tramount) FROM acctrans; sum --- (1 row) Shouldn't that be 0.00? What will the NUMERIC or DECIMAL types do in these situations? It looks like INTEGER has the same behaviour as MONEY (which tramount is.) -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 424 2871 (DoD#0082) (eNTP) | what's for dinner.
> Does this seem right? > druid=> SELECT client_id, SUM(tramount) > FROM acctrans GROUP BY client_id; > client_id|sum > ---------+--- > | > (1 row) > If there are no rows in the table then shouldn't the result be no rows > when GROUP BY is used? Further, What about this? Not sure. Someone may want to try this query on another DB. I know the answer to the next one though... > druid=> SELECT SUM(tramount) FROM acctrans; > sum > --- > > (1 row) > Shouldn't that be 0.00? No. It is returning NULL, because NULL means "don't know". It doesn't mean "nothing" or "zero". That is certainly the correct behavior if the table were populated with all NULLs in that column. And by extension, it is the correct result if there are no rows at all, since "don't know" for a bunch should give the same result as "don't know" for a few or for none. > What will the NUMERIC or DECIMAL types do in these situations? It > looks like INTEGER has the same behaviour as MONEY (which tramount > is.) They will all behave the same. - Tom
D'Arcy J.M. Cain wrote: > > Does this seem right? > > druid=> SELECT COUNT(*) FROM acctrans; > count > ----- > 0 > (1 row) > > druid=> SELECT client_id, SUM(tramount) FROM acctrans GROUP BY client_id; > client_id|sum > ---------+--- > | > (1 row) > > If there are no rows in the table then shouldn't the result be no rows > when GROUP BY is used? Further, What about this? > > druid=> SELECT SUM(tramount) FROM acctrans; > sum > --- > > (1 row) > > Shouldn't that be 0.00? > > What will the NUMERIC or DECIMAL types do in these situations? It > looks like INTEGER has the same behaviour as MONEY (which tramount is.) NUMERIC and DECIMAL will behave exactly as above, because it's the (irritating) correct behaviour. It is handled in the generic grouping and aggregate code (in fact none of the aggregate functions will ever be called if there isn't anything to count/average/sum). To get a zero count, you need a subselect in the targetlist (not implemented yet). Currently the only way to simulate a subselect in the targetlist is to put the count() into an SQL function that takes the arguments you need to build the qualification and returns the counted number. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: >> druid=> SELECT SUM(tramount) FROM acctrans; >> sum >> --- >> >> (1 row) >> Shouldn't that be 0.00? > No. It is returning NULL, because NULL means "don't know". It doesn't > mean "nothing" or "zero". That is certainly the correct behavior if the > table were populated with all NULLs in that column. And by extension, it > is the correct result if there are no rows at all, since "don't know" > for a bunch should give the same result as "don't know" for a few or for > none. I disagree ... the sum of zero items has traditionally been defined as zero by any mathematician you care to ask. No logical problems are introduced by doing so, and it avoids an unpleasant special case that applications would otherwise be forced to deal with. (Example: if D'Arcy's tramount column has been declared NOT NULL, then it seems to me that his code is entitled to expect to get a non-NULL result from SUM(). He should not have to cope with a NULL just because the table is empty.) Now, if the sum is taken over a set of rows that includes at least *one* NULL, then I agree that SUM should return NULL: you "don't know" what the sum ought to be if there are some inputs that you "don't know". But we do know what the sum of no items is: it's zero. There are no unknown inputs and therefore no reason to claim we don't know the sum. IOW, D'Arcy's right. This is a bug. You do have to take it on a case-by-case basis, though. For example AVG() of no items should (and does) return NULL, because the average of no items is not defined (since 0/0 is indeterminate). regards, tom lane
Thus spake Tom Lane > "Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: > I disagree ... the sum of zero items has traditionally been defined as > zero by any mathematician you care to ask. No logical problems are > introduced by doing so, and it avoids an unpleasant special case that > applications would otherwise be forced to deal with. (Example: if > D'Arcy's tramount column has been declared NOT NULL, then it seems to me > that his code is entitled to expect to get a non-NULL result from SUM(). > He should not have to cope with a NULL just because the table is empty.) Or, if I add a "WHERE tramount IS NOT NULL" clause to force it to add non-null items, right? It should depend on the values in the selected set, not the values in the full table. > You do have to take it on a case-by-case basis, though. For example > AVG() of no items should (and does) return NULL, because the average > of no items is not defined (since 0/0 is indeterminate). What about GROUP BY though. SELECT client_id, AVG(tramount) FROM acctrans WHERE client_id = 'NOBODY' GROUP BY client_id; It's not that I want a special value returned, I want no rows returned. Currently it returns 1 row even thought 'NOBODY' doesn't exist in the database. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 424 2871 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy" "J.M." Cain <darcy@druid.net> writes: > Or, if I add a "WHERE tramount IS NOT NULL" clause to force it to add > non-null items, right? It should depend on the values in the selected > set, not the values in the full table. Sure. No items is no items, no matter how you selected them. > What about GROUP BY though. > > SELECT client_id, AVG(tramount) FROM acctrans > WHERE client_id = 'NOBODY' GROUP BY client_id; > > It's not that I want a special value returned, I want no rows returned. > Currently it returns 1 row even thought 'NOBODY' doesn't exist in the > database. Hmm. I tried: play=> create table empty (i int4); CASE 1: play=> select i from empty; i - (0 rows) No rows returned, certainly correct. CASE 2: play=> select avg(i) from empty; avg --- (1 row) One row containing the null aggregate-function result, also OK. CASE 3: play=> select i from empty group by i; i - (0 rows) No groups, no rows returned, OK. CASE 4: play=> select avg(i) from empty group by i; avg --- (1 row) I suppose you could argue that this is consistent with cases 2 and 3, in a weird way, but it's certainly not the way I'd expect it to work. If there are no groups created by GROUP BY, then AVG should never be invoked at all, therefore there should be no rows returned. This is not the same as case 2, wherein AVG was invoked once over the whole table (which just happened to be empty, causing AVG to return NULL as it should for an empty input set). Here, AVG should be invoked once for each group created by GROUP BY. If no groups, there should be no results, not an artificially created row with a null result. The point is perhaps more obvious if you consider play=> select i,avg(i) from empty group by i; i|avg -+---| (1 row) Here, the system has made up a NULL value for i out of thin air. It will do this even if the i column is declared NOT NULL! I say this behavior is completely unsupportable. (If the table actually *contains* some NULL entries, then returning a row with NULL i is correct --- and indeed the system handles that case properly. GROUP BY's behavior for an empty table is erroneous because it makes it look like the table contains NULL i values when it doesn't.) So, again I agree with D'Arcy. But I'm not the one who might have to try to fix this... regards, tom lane
> I disagree ... the sum of zero items has traditionally been defined as > zero by any mathematician you care to ask. No logical problems are > introduced by doing so, and it avoids an unpleasant special case that > applications would otherwise be forced to deal with. (Example: if > D'Arcy's tramount column has been declared NOT NULL, then it seems to me > that his code is entitled to expect to get a non-NULL result from SUM(). > He should not have to cope with a NULL just because the table is empty.) Informix returns NULL for sum. It returns a zero only for count(). -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> I suppose you could argue that this is consistent with cases 2 and 3, > in a weird way, but it's certainly not the way I'd expect it to work. > If there are no groups created by GROUP BY, then AVG should never be > invoked at all, therefore there should be no rows returned. Agreed. > So, again I agree with D'Arcy. I'm missing something. Is there another issue for GROUP BY for which we don't have a consensus? An aggregate on an entire column can return NULL, and aggregates on columns with GROUP BY columns of NULL should not. > But I'm not the one who might have > to try to fix this... And why not? :) - Tom
Thus spake Thomas G. Lockhart > > So, again I agree with D'Arcy. > > I'm missing something. Is there another issue for GROUP BY for which we > don't have a consensus? An aggregate on an entire column can return > NULL, and aggregates on columns with GROUP BY columns of NULL should > not. Well, I still think that an aggregate on a whole column when no rows are selected (None exist or WHERE clause deselects everything) should return 0 rows, not NULL but the above would be better than what happens now. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 424 2871 (DoD#0082) (eNTP) | what's for dinner.