Thread: CASE/WHEN behavior with NULLS

CASE/WHEN behavior with NULLS

From
Thalis Kalfigkopoulos
Date:
Hello all,

I have a query that presents a sum() where in some records it's NULL
because all members of the group are NULL.
I decided I wanted to see a pretty 0 instead of NULL since it fits the
logic of the app.

This didn't work as expected (the NULL's persisted):
...CASE sum(foo) WHEN NULL THEN 0 ELSE sum(foo) END...

Whereas changing it to:
...CASE WHEN sum(foo) IS NULL THEN 0 ELSE sum(foo) END...
it works as expected, substituting the sum()'s that are NULL to zeros.

Is that expected behavior? Do i misunderstand how CASE/WHEN works?

Running:  PostgreSQL 9.1.3 on i686-pc-linux-gnu, compiled by gcc (GCC)
3.4.6, 32-bit

TIA,
Thalis K.


Re: CASE/WHEN behavior with NULLS

From
David Johnston
Date:
On Aug 31, 2012, at 19:14, Thalis Kalfigkopoulos <tkalfigo@gmail.com> wrote:

> Hello all,
>
> I have a query that presents a sum() where in some records it's NULL
> because all members of the group are NULL.
> I decided I wanted to see a pretty 0 instead of NULL since it fits the
> logic of the app.
>
> This didn't work as expected (the NULL's persisted):
> ...CASE sum(foo) WHEN NULL THEN 0 ELSE sum(foo) END...

Guessing this form effectively evaluates to

WHEN sum(foo) = NULL instead of IS NULL and thus the wrong answer:


>
> Whereas changing it to:
> ...CASE WHEN sum(foo) IS NULL THEN 0 ELSE sum(foo) END...
> it works as expected, substituting the sum()'s that are NULL to zeros.
>
> Is that expected behavior? Do i misunderstand how CASE/WHEN works?
>

Yes.

That said you might want to try

SUM(COALESCE(foo, 0))

or

SUM(case when foo is null then 0 else foo end)

Your current attempt does not handle mixed NULL and NOT NULL the way most people would want it to (though maybe you
do...)

> Running:  PostgreSQL 9.1.3 on i686-pc-linux-gnu, compiled by gcc (GCC)
> 3.4.6, 32-bit
>
> TIA,
> Thalis K.
>
>

David J


Re: CASE/WHEN behavior with NULLS

From
Tom Lane
Date:
David Johnston <polobo@yahoo.com> writes:
> On Aug 31, 2012, at 19:14, Thalis Kalfigkopoulos <tkalfigo@gmail.com> wrote:
>> This didn't work as expected (the NULL's persisted):
>> ...CASE sum(foo) WHEN NULL THEN 0 ELSE sum(foo) END...

> Guessing this form effectively evaluates to
> WHEN sum(foo) = NULL instead of IS NULL and thus the wrong answer:

Yeah, I think that's right.

> That said you might want to try
> SUM(COALESCE(foo, 0))

Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only
one COALESCE operation, not one per row.

            regards, tom lane


Re: CASE/WHEN behavior with NULLS

From
David Johnston
Date:
On Aug 31, 2012, at 21:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> David Johnston <polobo@yahoo.com> writes:
>> On Aug 31, 2012, at 19:14, Thalis Kalfigkopoulos <tkalfigo@gmail.com> wrote:
>>> This didn't work as expected (the NULL's persisted):
>>> ...CASE sum(foo) WHEN NULL THEN 0 ELSE sum(foo) END...
>
>> Guessing this form effectively evaluates to
>> WHEN sum(foo) = NULL instead of IS NULL and thus the wrong answer:
>
> Yeah, I think that's right.
>
>> That said you might want to try
>> SUM(COALESCE(foo, 0))
>
> Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only
> one COALESCE operation, not one per row.
>
>

These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while replacing
anynulls with zero.  So the decision depends on what and why you are summing.  

As an alternative for the original question the coalesce(sum(foo),0) form is indeed better.

David J.



Re: CASE/WHEN behavior with NULLS

From
Chris Angelico
Date:
On Sat, Sep 1, 2012 at 12:07 PM, David Johnston <polobo@yahoo.com> wrote:
> These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while
replacingany nulls with zero.  So the decision depends on what and why you are summing. 

It comes to the same result with SUM though isn't it?

ChrisA


Re: CASE/WHEN behavior with NULLS

From
Tom Lane
Date:
David Johnston <polobo@yahoo.com> writes:
> On Aug 31, 2012, at 21:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> David Johnston <polobo@yahoo.com> writes:
>>> That said you might want to try
>>> SUM(COALESCE(foo, 0))

>> Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only
>> one COALESCE operation, not one per row.

> These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while
replacingany nulls with zero.  So the decision depends on what and why you are summing.  

But SUM() ignores input nulls, so I think they really are equivalent.
I agree that in a lot of other cases (for instance MAX), you'd have to
think harder about which behavior you wanted.

The key point here is that whatever is inside the aggregate function
call is computed once per row, and then the aggregate is applied to
those results, and then whatever is outside the aggregate is done once
on the aggregate's result.  SQL's syntax doesn't make this too obvious,
but you really have to grasp that to make any sense of what's happening.

            regards, tom lane


Re: CASE/WHEN behavior with NULLS

From
David Johnston
Date:
On Aug 31, 2012, at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> David Johnston <polobo@yahoo.com> writes:
>> On Aug 31, 2012, at 21:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> David Johnston <polobo@yahoo.com> writes:
>>>> That said you might want to try
>>>> SUM(COALESCE(foo, 0))
>
>>> Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only
>>> one COALESCE operation, not one per row.
>
>> These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while
replacingany nulls with zero.  So the decision depends on what and why you are summing.  
>
> But SUM() ignores input nulls, so I think they really are equivalent.
> I agree that in a lot of other cases (for instance MAX), you'd have to
> think harder about which behavior you wanted.
>

This I did not know/recall, was assuming nulls poisoned the result.

David J.

Re: CASE/WHEN behavior with NULLS

From
Thalis Kalfigkopoulos
Date:
On Sat, Sep 1, 2012 at 6:19 AM, David Johnston <polobo@yahoo.com> wrote:
> On Aug 31, 2012, at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> David Johnston <polobo@yahoo.com> writes:
>>> On Aug 31, 2012, at 21:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> David Johnston <polobo@yahoo.com> writes:
>>>>> That said you might want to try
>>>>> SUM(COALESCE(foo, 0))
>>
>>>> Actually I'd go with "COALESCE(SUM(foo), 0)" since that requires only
>>>> one COALESCE operation, not one per row.
>>
>>> These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while
replacingany nulls with zero.  So the decision depends on what and why you are summing. 
>>
>> But SUM() ignores input nulls, so I think they really are equivalent.
>> I agree that in a lot of other cases (for instance MAX), you'd have to
>> think harder about which behavior you wanted.
>>
>
> This I did not know/recall, was assuming nulls poisoned the result.
>
> David J.

Thanks all for the replies. Actually I had already tested that sum()
behaved correctly with respect to NULLs, meaning that it ignored them
(or treated them as 0, couldn't really tell). That's why I went ahead
sum()ing even though I knew NULLs would always be involved.
Unfortunately I didn't see what now seems obvious, that the comparison
would be =NULL and not IS NULL.

Works fine with coalesc(sum(foo),0).

regards,
Thalis K.


Re: CASE/WHEN behavior with NULLS

From
Jasen Betts
Date:
On 2012-09-01, Chris Angelico <rosuav@gmail.com> wrote:
> On Sat, Sep 1, 2012 at 12:07 PM, David Johnston <polobo@yahoo.com> wrote:
>> These are not equivalent if some values of foo are not-null and you want the sum of all non-null values while
replacingany nulls with zero.  So the decision depends on what and why you are summing. 
>
> It comes to the same result with SUM though isn't it?

no

sum over zero rows always returns null.
an external coalesce will make the result 0 whilst an internal coalesce
will have no effect and NULL will be the result.

with t as ( select 1::int as a  where false )
select sum(coalesce (a,0)) as inner,coalesce (sum(a),0) as outer from t;

--
⚂⚃ 100% natural

Re: CASE/WHEN behavior with NULLS

From
Craig Ringer
Date:
On 09/01/2012 02:07 PM, Thalis Kalfigkopoulos wrote:
> Thanks all for the replies. Actually I had already tested that sum()
> behaved correctly with respect to NULLs, meaning that it ignored them
> (or treated them as 0, couldn't really tell). That's why I went ahead
> sum()ing even though I knew NULLs would always be involved.
> Unfortunately I didn't see what now seems obvious, that the comparison
> would be =NULL and not IS NULL.

The main thing to remember is that there are no consistent rules around
NULL. Learn each case and don't try to generalize too much.

Think:

     1 + 2 + 3    = 6
     1 + 2 + NULL = NULL

so obviously

     sum(y) FROM ( VALUES (1),(2),(3) ) x(y) = 6
     sum(y) FROM ( VALUES (1),(2),(NULL) ) x(y) = NULL

right? No, actually sum() over 1,2,NULL is 3, not NULL.

NULL isn't consistent.

--
Craig Ringer