Re: COALESCE and NULLIF semantics - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: COALESCE and NULLIF semantics
Date
Msg-id 4AAA3687020000250002ACE8@gw.wicourts.gov
Whole thread Raw
In response to Re: COALESCE and NULLIF semantics  (Sam Mason <sam@samason.me.uk>)
Responses Re: COALESCE and NULLIF semantics
List pgsql-hackers
Sam Mason <sam@samason.me.uk> wrote: 
> On Wed, Sep 09, 2009 at 10:25:34AM -0400, Tom Lane wrote:
>> Now admittedly there's probably not any major technical obstacle to
>> making a runtime conversion happen --- it's merely delayed
>> invocation of the destination type's input function.  But I find it
>> really ugly from a theoretical point of view.  Doing calculations
>> with "unknown" values just seems wrong.
Agreed.  I did say that I didn't actually want to actually turn these
into functions -- I was just putting together a simple demonstration
(usable only for simple test cases) of what I thought the parse-time
behavior should be, to facilitate discussion.  I *was* thinking that
showing that COALESCE could behave that way for simple cases with two
one-line plpgsql functions might show that the semantics weren't
excessively bizarre.
I don't think that the suggestion could be a problem for COALESCE.  In
fact, I think somewhere in another thread, Tom conceded that much, but
(understandably) didn't want that one form of CASE behaving
differently than everything else did.  I'll try to address that.
The first point is that if any of the expressions used for any result
value in one of these predicates is typed, nothing at all would
change.  This is only about the behavior when each result value is
NULL or an untyped literal.  All subsequent comments assume that, to
avoid the tedium of restating it each time.
I don't think explicit CASE predicates in either form would be a
problem, because there is nothing to suggest a connection between a
literal in the expression *which chooses* a result value and a literal
*used as* a result value.
I think that LEAST and GREATEST are a lost cause in terms of changing
much, since there are obviously compares to be made using *some* type
before a value can be derived -- at least if there is more than one
non-NULL value.  Since these are PostgreSQL extensions which don't
even behave consistently with other products' extensions using the
same words, I'm not too concerned about them being "irregular".  (I'm
not sure what the justification for the current behavior would even be
-- since NULL means *unknown*, how can you declare that you know the
greatest or least value in a set of values when any are unknown?  It
seems like these should be named LEAST_KNOWN and GREATEST_KNOWN for
their current semantics.)  In any event, the current behavior is to
treat them as text; I don't think we can improve on that, beyond
perhaps using unknown if all values are NULL, or all but one are NULL
and the remaining one is an untyped literal.  Not sure whether that's
sane or worth it.
NULLIF presents a problem only with two arguments which are *both*
untyped literals.  That case currently resolves to text.  If both are
NULL, or one is NULL and the other is an untyped literal, I don't see
how there is a problem declaring the result type as unknown.  I think
it would be sane to continue using text with two untyped literals. 
This would require users to declare the type of one or both literals
if they want something else.  (Frankly, I've never had a use for
NULLIF; it seems like a kludge which is there to encourage
substitution of magic values for NULL and then allow those magic
values to be transformed back to NULL on demand.  Does anybody who
expects sane behavior really use this?)
Finally, there is one "minor" extension to what I said above.  Any of
these conditional expressions which evaluate to an untyped literal or
NULL would be considered the same as a bare untyped literal or NULL
for all purposes, including their use in an enclosing conditional
expression.  I don't *think* that adds a lot of complexity to the
issue, but I'm not sure on that one.
> It's pretty grim.  This seems to be some strange halfway house on
> the way to real type-inference, with broken semantics to boot.  How
> would it prevent weirdos like:
> 
>   SELECT 'msg'||v, date_trunc('year',v), v+10
>   FROM (SELECT 'hi ho') x(v);
I don't see where what I'm proposing would change the behavior of that
at all.  I'm only proposing parse-time changes for conditional
expressions -- the CASE predicate and its abbreviations.
I have looked at the code where the parser resolves types for these. 
I think it would be within my skill set to produce a patch if others
agree this makes sense; although so far such agreement doesn't seem
too likely.  :-(
-Kevin


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: COPY enhancements
Next
From: Tom Lane
Date:
Subject: Re: COALESCE and NULLIF semantics