Re: Query plan for NOT IN - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: Query plan for NOT IN
Date
Msg-id 4ACC61EF020000250002B6B7@gw.wicourts.gov
Whole thread Raw
In response to Re: Query plan for NOT IN  (Grzegorz Jaśkiewicz <gryzman@gmail.com>)
Responses Re: Query plan for NOT IN  (Guy Rouillier <guyr-ml1@burntmail.com>)
List pgsql-performance
Grzegorz Jaœkiewicz<gryzman@gmail.com> wrote:
> Guy Rouillier <guyr-ml1@burntmail.com>wrote:
>> Grzegorz Jaœkiewicz wrote:

>>> using nulls as default 'idunno' - is a bad practice

>> I don't understand this point of view.  The concept of null was
>> introduced into the SQL vernacular by Codd and Date expressly to
>> represent unknown values.

> if by default your account balance is 0, you should set it to 0, not
> leave it as null

If your business rules are that a new account is created with a zero
balance and then deposits are made, sure -- insert the account row
with a zero balance, *because you know it to be zero*.  It's been rare
that I've seen anyone err on the side of using NULL in place of a
default for such cases.  Much more common is using, for example, 'NMI'
in the middle name column to denote "No Middle Initial".  Such "magic
values" can cause no end of trouble.

A failing of the SQL standard is that it uses the same mark (NULL) to
show the absence of a value because it is unknown as for the case
where it is known that no value exists (not applicable).  Codd argued
for a distinction there, but it hasn't come to pass, at least in the
standard.  If anyone could suggest a way to support standard syntax
and semantics and add extensions to support this distinction, it might
be another advance that would distinguish PostgreSQL from "less
evolved" products.   :-)

None of that changes the requirement that NOT IN must result in
UNKNOWN if any of the values involved are NULL.  You can't say that my
birthday is not in the set of birthdays for other subscribers to this
list without knowing the birthdays of all subscribers.  This
definition of the operator makes it hard to optimize, but setting
unknown birthdays to some date far in the past or future, to avoid
using NULL, would just result in bogus results for this query as well
as, for example, queries attempting to calculate aggregates on age.

-Kevin

pgsql-performance by date:

Previous
From: Grzegorz Jaśkiewicz
Date:
Subject: Re: Query plan for NOT IN
Next
From: Guy Rouillier
Date:
Subject: Re: Query plan for NOT IN