Thread: Re: Use/Abuse of Nulls

Re: Use/Abuse of Nulls

From: "Merlin Moncure"
Christopher Browne wrote:
The problem I see with the "make another table" approach is that you
wind up with another table for everyone to manage.  More data to join;
more tables to add data to; none of that comes for free, even if it is
cheap, performance-wise.
<snip>

That's true.  I submit though that it is the most straightforward way to
get around the null problem.  The big downside is it encourages the use
of left/right joins which are usually the hardest for the optimizer to
get right (in fact, I almost never use left/right joins, even if they
directly solve the problem, better to use union somehow).
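
For what it's worth, a minimal sketch of what I mean (the employee and
employee_termination tables are made up for illustration, not from anyone's
actual schema):

    -- The "make another table" approach: the optional value lives in its
    -- own table instead of a nullable column.
    CREATE TABLE employee (
        emp_id integer PRIMARY KEY,
        name   text NOT NULL
    );

    CREATE TABLE employee_termination (
        emp_id           integer PRIMARY KEY REFERENCES employee,
        termination_date date NOT NULL
    );

    -- Listing everyone with their (possibly absent) termination date now
    -- requires an outer join, which reintroduces NULL in the result set:
    SELECT e.name, t.termination_date
      FROM employee e
      LEFT JOIN employee_termination t ON t.emp_id = e.emp_id;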

That being said, I usually try and model data considering integrity
first, flexibility second, simplicity third, and performance fourth if
at all.  The reason for that is that I can usually count on SQL wizardry
(either my own or others!) to deal with nasty performance issues.  If
all else fails, I resort to a hack like a lookup table or something of
that kind.  In fact, the reason why I love pg so much is that I've
learned to trust the database to allow me to set up the data the way *I*
want to without making compromises.  This helps a lot in developing
projects.

Regards,
Merlin

Re: Use/Abuse of Nulls

From: Josh Berkus
Folks,

First off, we should probably really be having this discussion on the SQL
list.

Well, there are two separate issues with NULLs:

1) The tri-value problem;
2) Abuse of normalization

1) Is the problem that NULLs were implemented in SQL89 to *strictly* mean
"unknown" or "undefined"; that is, values that existed but were not available
to the database.   Unfortunately, the ANSI committee ignored the need for a
"Not Applicable" value despite the rather primitive support for fk
relationships at the time.   This has resulted in people using NULL to
represent *both* "unknown" and "not applicable", meaning that you can't tell
what is actually meant by looking at the NULL.   We really should have had
two values, UNKNOWN and IGNORE.
    Certainly, with text fields it's easy to enforce not-nullness and make the
user select "Unknown" and "Ignore" as string values.  However, it's difficult
to come up with similar values that work for numbers, dates, or network
addresses.
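
To make the tri-value problem concrete, a few toy queries you can run
directly in psql (nothing here depends on any particular schema):

    SELECT NULL = NULL;              -- NULL, not true: unknown = unknown is still unknown
    SELECT 1 NOT IN (2, NULL);       -- NULL, so a WHERE clause silently drops the row
    SELECT NULL IS DISTINCT FROM 1;  -- true: IS DISTINCT FROM treats NULL as an ordinary value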

2) Given that the above abuse of NULLs is already built into the SQL standard,
DBAs feel free to further abuse NULLs.  For example, a couple of weeks ago a
developer posted a performance problem to the PERFORM list.   As it turns
out, he had a table with 635 columns, of which 75% were NULL for any given
row (this, BTW, was the source of his performance problem).  While the SQL
standard does not require this sort of unnormalized design, it does permit
it, and many junior DBAs exploit that.
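
A sketch of the kind of split that avoids the wide, mostly-NULL table (table
and column names are invented for illustration):

    -- The wide version: optional columns are NULL for most rows.
    CREATE TABLE product_wide (
        product_id  integer PRIMARY KEY,
        name        text NOT NULL,
        export_code text,        -- NULL for ~75% of rows
        export_duty numeric      -- NULL for ~75% of rows
    );

    -- The normalized version: a narrow core table, plus a subsidiary table
    -- that only has rows where the optional data actually exists.
    CREATE TABLE product (
        product_id integer PRIMARY KEY,
        name       text NOT NULL
    );

    CREATE TABLE product_export_info (
        product_id  integer PRIMARY KEY REFERENCES product,
        export_code text NOT NULL,
        export_duty numeric NOT NULL
    );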

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Use/Abuse of Nulls

From: Christopher Browne
Centuries ago, Nostradamus foresaw when merlin.moncure@rcsonline.com ("Merlin Moncure") would write:
> Christopher Browne wrote:
>> The problem I see with the "make another table" approach is that you
>> wind up with another table for everyone to manage.  More data to join;
>> more tables to add data to; none of that comes for free, even if it is
>> cheap, performance-wise.
>> <snip>

> That's true.  I submit though that it is the most straightforward
> way to get around the null problem.  The big downside is it
> encourages the use of left/right joins which are usually the hardest
> for the optimizer to get right (in fact, I almost never use
> left/right joins, even if they directly solve the problem, better to
> use union somehow).

One way or the other, there's a problem of managing complexity.

The more "nullable things" that there are, the more complex your model
is, and it seems to me that this is a fact irrespective of how you
deal with it.

- If you allow NULLs in your data model, the application has to cope
  with that.

- If you forbid NULLs, that pushes the complexity over to the notion
  of having to manage additional tables.

In either case, complexity grows.
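
For the sake of concreteness, a minimal sketch of where the complexity lands
in each case (employee and employee_termination are the hypothetical tables
sketched upthread; employee_nullable is made up here):

    -- NULL allowed: one table, but every query and every application that
    -- reads termination_date has to reason about what NULL means here.
    CREATE TABLE employee_nullable (
        emp_id           integer PRIMARY KEY,
        name             text NOT NULL,
        termination_date date    -- does NULL mean "still employed" or "date unknown"?
    );

    SELECT name FROM employee_nullable WHERE termination_date IS NULL;

    -- NULL forbidden: no ambiguous column, but answering the same question
    -- now involves a second table.
    SELECT e.name
      FROM employee e
     WHERE NOT EXISTS (SELECT 1
                         FROM employee_termination t
                        WHERE t.emp_id = e.emp_id);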

> That being said, I usually try and model data considering integrity
> first, flexibility second, simplicity third, and performance fourth
> if at all.  The reason for that is that I can usually count on SQL
> wizardry (either my own or others!) to deal with nasty performance
> issues.  If all else fails, I resort to a hack like a lookup table
> or something of that kind.  In fact, the reason why I love pg so
> much is that I've learned to trust the database to allow me to set
> up the data the way *I* want to without making compromises.  This
> helps a lot in developing projects.

That seems like an appropriate way to go irrespective of the tools in
use.

Making performance "Job #4" is about right because it is very likely
that the performance bottlenecks will fall in very specific places.

Here are all of my .fortunes about optimization; they all fit with the
principle of putting optimization off until you KNOW what needs to be
made more efficient.

"Optimization hinders evolution."  -- Alan Perlis

"It is easier  to optimize correct code, than  correct optimized code"
-- Yves Deville

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."  -- Donald Knuth

"Rules of Optimization:
     Rule 1: Don't do it.
     Rule 2 (for experts only): Don't do it yet."
-- M.A. Jackson

"More computing sins are committed  in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity."  -- W.A. Wulf
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://cbbrowne.com/info/languages.html
Black holes are where God divided by zero.