Thread: CASE/WHEN behavior with NULLS
Hello all, I have a query that presents a sum() where in some records it's NULL because all members of the group are NULL. I decided I wanted to see a pretty 0 instead of NULL since it fits the logic of the app. This didn't work as expected (the NULL's persisted): ...CASE sum(foo) WHEN NULL THEN 0 ELSE sum(foo) END... Whereas changing it to: ...CASE WHEN sum(foo) IS NULL THEN 0 ELSE sum(foo) END... it works as expected, substituting the sum()'s that are NULL to zeros. Is that expected behavior? Do i misunderstand how CASE/WHEN works? Running: PostgreSQL 9.1.3 on i686-pc-linux-gnu, compiled by gcc (GCC) 3.4.6, 32-bit TIA, Thalis K.
On Aug 31, 2012, at 19:14, Thalis Kalfigkopoulos <tkalfigo@gmail.com> wrote: > Hello all, > > I have a query that presents a sum() where in some records it's NULL > because all members of the group are NULL. > I decided I wanted to see a pretty 0 instead of NULL since it fits the > logic of the app. > > This didn't work as expected (the NULL's persisted): > ...CASE sum(foo) WHEN NULL THEN 0 ELSE sum(foo) END... Guessing this form effectively evaluates to WHEN sum(foo) = NULL instead of IS NULL and thus the wrong answer: > > Whereas changing it to: > ...CASE WHEN sum(foo) IS NULL THEN 0 ELSE sum(foo) END... > it works as expected, substituting the sum()'s that are NULL to zeros. > > Is that expected behavior? Do i misunderstand how CASE/WHEN works? > Yes. That said you might want to try SUM(COALESCE(foo, 0)) or SUM(case when foo is null then 0 else foo end) Your current attempt does not handle mixed NULL and NOT NULL the way most people would want it to (though maybe you do...) > Running: PostgreSQL 9.1.3 on i686-pc-linux-gnu, compiled by gcc (GCC) > 3.4.6, 32-bit > > TIA, > Thalis K. > > David J
David Johnston <polobo@yahoo.com> writes: > On Aug 31, 2012, at 19:14, Thalis Kalfigkopoulos <tkalfigo@gmail.com> wrote: >> This didn't work as expected (the NULL's persisted): >> ...CASE sum(foo) WHEN NULL THEN 0 ELSE sum(foo) END... > Guessing this form effectively evaluates to > WHEN sum(foo) = NULL instead of IS NULL and thus the wrong answer: Yeah, I think that's right. > That said you might want to try > SUM(COALESCE(foo, 0)) Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only one COALESCE operation, not one per row. regards, tom lane
On Aug 31, 2012, at 21:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Johnston <polobo@yahoo.com> writes: >> On Aug 31, 2012, at 19:14, Thalis Kalfigkopoulos <tkalfigo@gmail.com> wrote: >>> This didn't work as expected (the NULL's persisted): >>> ...CASE sum(foo) WHEN NULL THEN 0 ELSE sum(foo) END... > >> Guessing this form effectively evaluates to >> WHEN sum(foo) = NULL instead of IS NULL and thus the wrong answer: > > Yeah, I think that's right. > >> That said you might want to try >> SUM(COALESCE(foo, 0)) > > Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only > one COALESCE operation, not one per row. > > These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while replacing anynulls with zero. So the decision depends on what and why you are summing. As an alternative for the original question the coalesce(sum(foo),0) form is indeed better. David J.
On Sat, Sep 1, 2012 at 12:07 PM, David Johnston <polobo@yahoo.com> wrote: > These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while replacingany nulls with zero. So the decision depends on what and why you are summing. It comes to the same result with SUM though isn't it? ChrisA
David Johnston <polobo@yahoo.com> writes: > On Aug 31, 2012, at 21:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> David Johnston <polobo@yahoo.com> writes: >>> That said you might want to try >>> SUM(COALESCE(foo, 0)) >> Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only >> one COALESCE operation, not one per row. > These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while replacingany nulls with zero. So the decision depends on what and why you are summing. But SUM() ignores input nulls, so I think they really are equivalent. I agree that in a lot of other cases (for instance MAX), you'd have to think harder about which behavior you wanted. The key point here is that whatever is inside the aggregate function call is computed once per row, and then the aggregate is applied to those results, and then whatever is outside the aggregate is done once on the aggregate's result. SQL's syntax doesn't make this too obvious, but you really have to grasp that to make any sense of what's happening. regards, tom lane
On Aug 31, 2012, at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Johnston <polobo@yahoo.com> writes: >> On Aug 31, 2012, at 21:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> David Johnston <polobo@yahoo.com> writes: >>>> That said you might want to try >>>> SUM(COALESCE(foo, 0)) > >>> Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only >>> one COALESCE operation, not one per row. > >> These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while replacingany nulls with zero. So the decision depends on what and why you are summing. > > But SUM() ignores input nulls, so I think they really are equivalent. > I agree that in a lot of other cases (for instance MAX), you'd have to > think harder about which behavior you wanted. > This I did not know/recall, was assuming nulls poisoned the result. David J.
On Sat, Sep 1, 2012 at 6:19 AM, David Johnston <polobo@yahoo.com> wrote: > On Aug 31, 2012, at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> David Johnston <polobo@yahoo.com> writes: >>> On Aug 31, 2012, at 21:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> David Johnston <polobo@yahoo.com> writes: >>>>> That said you might want to try >>>>> SUM(COALESCE(foo, 0)) >> >>>> Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only >>>> one COALESCE operation, not one per row. >> >>> These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while replacingany nulls with zero. So the decision depends on what and why you are summing. >> >> But SUM() ignores input nulls, so I think they really are equivalent. >> I agree that in a lot of other cases (for instance MAX), you'd have to >> think harder about which behavior you wanted. >> > > This I did not know/recall, was assuming nulls poisoned the result. > > David J. Thanks all for the replies. Actually I had already tested that sum() behaved correctly with respect to NULLs, meaning that it ignored them (or treated them as 0, couldn't really tell). That's why I went ahead sum()ing even though I knew NULLs would always be involved. Unfortunately I didn't see what now seems obvious, that the comparison would be =NULL and not IS NULL. Works fine with coalesc(sum(foo),0). regards, Thalis K.
On 2012-09-01, Chris Angelico <rosuav@gmail.com> wrote: > On Sat, Sep 1, 2012 at 12:07 PM, David Johnston <polobo@yahoo.com> wrote: >> These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while replacingany nulls with zero. So the decision depends on what and why you are summing. > > It comes to the same result with SUM though isn't it? no sum over zero rows always returns null. an external coalesce will make the result 0 whilst an internal coalesce will have no effect and NULL will be the result. with t as ( select 1::int as a where false ) select sum(coalesce (a,0)) as inner,coalesce (sum(a),0) as outer from t; -- ⚂⚃ 100% natural
On 09/01/2012 02:07 PM, Thalis Kalfigkopoulos wrote: > Thanks all for the replies. Actually I had already tested that sum() > behaved correctly with respect to NULLs, meaning that it ignored them > (or treated them as 0, couldn't really tell). That's why I went ahead > sum()ing even though I knew NULLs would always be involved. > Unfortunately I didn't see what now seems obvious, that the comparison > would be =NULL and not IS NULL. The main thing to remember is that there are no consistent rules around NULL. Learn each case and don't try to generalize too much. Think: 1 + 2 + 3 = 6 1 + 2 + NULL = NULL so obviously sum(y) FROM ( VALUES (1),(2),(3) ) x(y) = 6 sum(y) FROM ( VALUES (1),(2),(NULL) ) x(y) = NULL right? No, actually sum() over 1,2,NULL is 3, not NULL. NULL isn't consistent. -- Craig Ringer