Re: Suboptimal evaluation of CASE expressions - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: Suboptimal evaluation of CASE expressions
Date
Msg-id 20060411151736.GD8117@svana.org
Whole thread Raw
In response to Suboptimal evaluation of CASE expressions  (Andreas Tille <tillea@rki.de>)
Responses Re: Suboptimal evaluation of CASE expressions
Re: Suboptimal evaluation of CASE expressions
List pgsql-hackers
On Tue, Apr 11, 2006 at 04:43:33PM +0200, Andreas Tille wrote:
> Hi,
>
> when dealing with a bug in the postgresql-plr interface I think
> I found a suboptimal method to process CASE statements.  First
> to the problem:

<snip>
> SELECT median(nonnull), median(mightbenull) from plrtest where flag = 0;
> ERROR:  R interpreter expression evaluation error
> DETAIL:  Error in median(arg1) : need numeric data
> CONTEXT:  In PL/R function r_median

Because there were no non-null rows, the system passed a NULL to the
final func. Seems you have two ways of dealing with this. Mark the
finalfunc as STRICT so the system won't call it with NULL. Or give the
agrregate an INITCOND which is an empty array. This would also avoid
the NULL.

> I would expect NULL as result of the last query.
>
> So I thought I will verify in a CASE statement whether there
> are only NULL values in the column by max(mightbenull) like this:

<snip>

> The problem I want to discuss here is the following:  Usually in
> programming languages only one branch of the IF-THEN-ELSE statement
> will be calculated.  But here *both* branches are calculated
> (obviousely because of the error that occures).  If we just forget

Usually in programming languages, but not in SQL.

> that my goal was to circumvent the error by some hack, I think
> if there is some kind of complex query in the ELSE branche that
> calculation would just cost extra processing time with no need.
> I would regard this as a bug.

The problem in your example is that you're using aggrgates in the case
statement. Which means that as each row is processed, the aggregates
need to be calculated. It can't shortcut because if it first calculated
the max() and then the median() it would have to evaluate the entire
query twice.

In the general case, PostgreSQL *may* avoid calculating redundant
clauses if it doesn't need to, but you can't rely on it.

Fixing your underlying issue with the aggregate should solve everything
for you.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgsql-hackers by date:

Previous
From: Andreas Tille
Date:
Subject: Suboptimal evaluation of CASE expressions
Next
From: Tom Lane
Date:
Subject: Re: Suboptimal evaluation of CASE expressions