Thread: Null and Void() - Or, Abandon All Hope Ye Who allow NULLs?

Null and Void() - Or, Abandon All Hope Ye Who allow NULLs?

From
dananrg@yahoo.com
Date:
Date and Pascal hate nulls. One even goes so far as to say that if you
permit NULLs in a database, then the results from *every* query is
suspect. So they turn perform backflips suggesting ways to avoid nulls.
None, so far, seem appealing.

To me, nulls are quite useful in the Real World. For instance, there
may be a lot of immediate value for end users in committing a row that
has a few nulls (e.g. as in not yet collected), rather than slavishly
follow a rule that says Thou Shalt Not commit a row with nulls.

Can't the intelligent practitioner simply proceed carefully with
queries when nulls are involved? With group functions at least, I
believe nulls are ignored. In Oracle, you can use NVL() to force group
functions to recognize.

What about simply doing an IS NULL test in the code when any table that
allows nulls in involved in a query?

What precisely has Date and Pascal's knickers in such a twist? The fact
that ad hoc queries from random, unintelligent users could give strange
results?

What if one has control over the queries performed through a GUI or
application? Doesn't the problem disappear, presuming the programmer is
aware of the issue and the application is well documented?

What are some of the best ways to deal with the issue of nulls
potentially resulting in questionable query results short of
disallowing them? Storing some sort of coded or numeric value that
represents UNKNOWN or NOT APPLICABLE.


Re: Null and Void() - Or, Abandon All Hope Ye Who allow NULLs?

From
dananrg@yahoo.com
Date:
'Scuse my broken english and ungrammatical gibberish in my last
missive. That's what I get for posting before my first cup of coffee
for the day.

I'm serious about nulls though, and am wondering if Date and Pascal,
perhaps, are the gibberish ones on this particular issue. The
insistence on no nulls, anywhere, any time, for any reason, seems--from
a practical vantage--unreasonable.


Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
"Florian G. Pflug"
Date:
dananrg@yahoo.com wrote:
> Date and Pascal hate nulls. One even goes so far as to say that if you
> permit NULLs in a database, then the results from *every* query is
> suspect. So they turn perform backflips suggesting ways to avoid nulls.
> None, so far, seem appealing.
>
> To me, nulls are quite useful in the Real World. For instance, there
> may be a lot of immediate value for end users in committing a row that
> has a few nulls (e.g. as in not yet collected), rather than slavishly
> follow a rule that says Thou Shalt Not commit a row with nulls.
>
> Can't the intelligent practitioner simply proceed carefully with
> queries when nulls are involved? With group functions at least, I
> believe nulls are ignored. In Oracle, you can use NVL() to force group
> functions to recognize.
>
> What about simply doing an IS NULL test in the code when any table that
> allows nulls in involved in a query?
>
> What precisely has Date and Pascal's knickers in such a twist? The fact
> that ad hoc queries from random, unintelligent users could give strange
> results?
>
> What if one has control over the queries performed through a GUI or
> application? Doesn't the problem disappear, presuming the programmer is
> aware of the issue and the application is well documented?
>
> What are some of the best ways to deal with the issue of nulls
> potentially resulting in questionable query results short of
> disallowing them? Storing some sort of coded or numeric value that
> represents UNKNOWN or NOT APPLICABLE.
I'd say the "problem" with NULL values is mainly that they conflict with
some of the ideas of relational theory and relational algebra.
One of the basic ideas of relation theory is that of functional
dependencies. Database normalization (at least according to the theory)
tells you how to deduce a "good" schema, if you know what columns you
need, and what their functional dependencies are.

Of course, the functions described by those functional dependencies are
not required to be defined for every possible value - let's say you have
a function dependency A -> B - meaning that whenever you know the value
of column A, then there is _at_most_ one value for column BNormalization
basically tells you to model that function dependency as a
table containing fields A and B, and make A the primary key.

Now, if there is no B for a specific value of A, then this table will
just not contain a record for this value of A. But if you allow
NULL-values, then suddently there are _two_ different ways to express
"I don't know what B is for this A". You could either have a record with
the A-value in question, and with B null, or you could have _no_ record
with the A-value in question.

So, NULLs IMHO give you flexibility - but at a price. The  price is that
the _same_ information could be stored in different ways - and you can't
really deduce "the correct way" from the schema alone.

For me, the discussion is quite similar to static-typed vs. dynamically
typed languages - the first ones enable the compiler to check your code
more thoroughly, while the second ones often allow you to write more
concise code.

So, I'd say Date and Pascal are right from a _theoretical_ point of view
- null values really cause problems in the context of relational theory.
Those theoretical problems in turn cause practical problems to some
extent - but so do a _lot_ of other things. Take java as an example -
at least until java 1.4 this language has huge theoretical deficiencies
in it's type system, but you can nevertheless write good and
maintainable code in java 1.4. The same is true for SQL with nulls -
there _are_ possibilities to shoot yourself in the foot, but that just
means that developers need to be skilled enough to know about those
pitfalls.


greetings, Florian Pflug

Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
Berend Tober
Date:
Florian G. Pflug wrote:
 > dananrg@yahoo.com wrote:
 >
 >> Date and Pascal hate nulls.
 >
 > ...the functions described by those functional dependencies are
 > not required to be defined for every possible value - let's say you have
 > a function dependency A -> B - meaning that whenever you know the value
 > of column A, then there is _at_most_ one value for column BNormalization
 > basically tells you to model that function dependency as a
 > table containing fields A and B, and make A the primary key.
 >
 > Now, if there is no B for a specific value of A, then this table will
 > just not contain a record for this value of A. But if you allow
 > NULL-values, then suddently there are _two_ different ways to express
 > "I don't know what B is for this A". You could either have a record with
 > the A-value in question, and with B null, or you could have _no_ record
 > with the A-value in question.
 >

But in the former case, you affirm the existence and your knowledge of
the second A-value; in the latter case you affirm ignorance of the
second A-value. The two-column example may be useful for theoretical
discussion, but in practise likely more columns exist so that NULL can
represent incomplete data that may be determined later for a particular
row when you still need to commit the column values already known. For
instance, in response to customer demands, it may be required that a new
employee begins work on projects right away, even though we have only
basic identifying information, like say, their name. This gives us
enough to create a new employee row, start recording their labor hours
worked for billing purposes, and to cut checks for travel expenses. We
eventually need date of birth, social security number, and other
information, but as a practical matter those columns can certainly be
committed NULL initially.

Regards,
Berend Tober
860-767-0700 x118

Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
"Florian G. Pflug"
Date:
Berend Tober wrote:
> Florian G. Pflug wrote:
>  > dananrg@yahoo.com wrote:
>  >
>  >> Date and Pascal hate nulls.
>  >
>  > ...the functions described by those functional dependencies are
>  > not required to be defined for every possible value - let's say you have
>  > a function dependency A -> B - meaning that whenever you know the value
>  > of column A, then there is _at_most_ one value for column BNormalization
>  > basically tells you to model that function dependency as a
>  > table containing fields A and B, and make A the primary key.
>  >
>  > Now, if there is no B for a specific value of A, then this table will
>  > just not contain a record for this value of A. But if you allow
>  > NULL-values, then suddently there are _two_ different ways to express
>  > "I don't know what B is for this A". You could either have a record with
>  > the A-value in question, and with B null, or you could have _no_ record
>  > with the A-value in question.
>  >

> But in the former case, you affirm the existence and your knowledge of
> the second A-value; in the latter case you affirm ignorance of the
> second A-value. The two-column example may be useful for theoretical
> discussion, but in practise likely more columns exist so that NULL can
> represent incomplete data that may be determined later for a particular
> row when you still need to commit the column values already known.
I came up with the two-column example because it's the simplest example
possible. For larger tables you _could_ split them into n tables (at
most one per field). If not saying I'd do that - just that it's possible
and that it's basically what Date and Pascal suggest.

 > For
> instance, in response to customer demands, it may be required that a new
> employee begins work on projects right away, even though we have only
> basic identifying information, like say, their name. This gives us
> enough to create a new employee row, start recording their labor hours
> worked for billing purposes, and to cut checks for travel expenses. We
> eventually need date of birth, social security number, and other
> information, but as a practical matter those columns can certainly be
> committed NULL initially.
Well, yes - as I said, using null values gives you more flexibility. But
still, you _can_ shoot yourself in the foot by using them - that's why
it's still good to know why some people oppose them, even if you don't
share their point of view. But of course, "rm -r $(PGDATA)" is a more
efficient way to shoot yourself in the foot, and will probably harm more
then using null ;-)

greetings, Florian Pflug

Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
Andrew Gould
Date:
--- Berend Tober <btober@seaworthysys.com> wrote:

> Florian G. Pflug wrote:
>  > dananrg@yahoo.com wrote:
>  >
>  >> Date and Pascal hate nulls.
>  >
>  > ...the functions described by those functional
> dependencies are
>  > not required to be defined for every possible
> value - let's say you have
>  > a function dependency A -> B - meaning that
> whenever you know the value
>  > of column A, then there is _at_most_ one value
> for column BNormalization
>  > basically tells you to model that function
> dependency as a
>  > table containing fields A and B, and make A the
> primary key.
>  >
>  > Now, if there is no B for a specific value of A,
> then this table will
>  > just not contain a record for this value of A.
> But if you allow
>  > NULL-values, then suddently there are _two_
> different ways to express
>  > "I don't know what B is for this A". You could
> either have a record with
>  > the A-value in question, and with B null, or you
> could have _no_ record
>  > with the A-value in question.
>  >
>
> But in the former case, you affirm the existence and
> your knowledge of
> the second A-value; in the latter case you affirm
> ignorance of the
> second A-value. The two-column example may be useful
> for theoretical
> discussion, but in practise likely more columns
> exist so that NULL can
> represent incomplete data that may be determined
> later for a particular
> row when you still need to commit the column values
> already known. For
> instance, in response to customer demands, it may be
> required that a new
> employee begins work on projects right away, even
> though we have only
> basic identifying information, like say, their name.
> This gives us
> enough to create a new employee row, start recording
> their labor hours
> worked for billing purposes, and to cut checks for
> travel expenses. We
> eventually need date of birth, social security
> number, and other
> information, but as a practical matter those columns
> can certainly be
> committed NULL initially.
>
> Regards,
> Berend Tober
> 860-767-0700 x118
>

Null values should be allowed for any information that
may not be known at the time of data entry.  However,
any data field that falls into this category should
not be required to define the relationships between
tables.  This is a case where the database design must
reflect the limitations of operational processes.

Andrew Gould



Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
Andrew Gould
Date:

--- Andrew Gould <andrewgould@yahoo.com> wrote:

> --- Berend Tober <btober@seaworthysys.com> wrote:
>
> > Florian G. Pflug wrote:
> >  > dananrg@yahoo.com wrote:
> >  >
> >  >> Date and Pascal hate nulls.
> >  >
> >  > ...the functions described by those functional
> > dependencies are
> >  > not required to be defined for every possible
> > value - let's say you have
> >  > a function dependency A -> B - meaning that
> > whenever you know the value
> >  > of column A, then there is _at_most_ one value
> > for column BNormalization
> >  > basically tells you to model that function
> > dependency as a
> >  > table containing fields A and B, and make A the
> > primary key.
> >  >
> >  > Now, if there is no B for a specific value of
> A,
> > then this table will
> >  > just not contain a record for this value of A.
> > But if you allow
> >  > NULL-values, then suddently there are _two_
> > different ways to express
> >  > "I don't know what B is for this A". You could
> > either have a record with
> >  > the A-value in question, and with B null, or
> you
> > could have _no_ record
> >  > with the A-value in question.
> >  >
> >
> > But in the former case, you affirm the existence
> and
> > your knowledge of
> > the second A-value; in the latter case you affirm
> > ignorance of the
> > second A-value. The two-column example may be
> useful
> > for theoretical
> > discussion, but in practise likely more columns
> > exist so that NULL can
> > represent incomplete data that may be determined
> > later for a particular
> > row when you still need to commit the column
> values
> > already known. For
> > instance, in response to customer demands, it may
> be
> > required that a new
> > employee begins work on projects right away, even
> > though we have only
> > basic identifying information, like say, their
> name.
> > This gives us
> > enough to create a new employee row, start
> recording
> > their labor hours
> > worked for billing purposes, and to cut checks for
> > travel expenses. We
> > eventually need date of birth, social security
> > number, and other
> > information, but as a practical matter those
> columns
> > can certainly be
> > committed NULL initially.
> >
> > Regards,
> > Berend Tober
> > 860-767-0700 x118
> >
>
> Null values should be allowed for any information
> that
> may not be known at the time of data entry.
> However,
> any data field that falls into this category should
> not be required to define the relationships between
> tables.  This is a case where the database design
> must
> reflect the limitations of operational processes.
>
> Andrew Gould
>

I need to temper my own response.  I was referring to
relationships between tables where both tables contain
operational data.  The use of reference tables, such
as code lookup tables, is a huge exception to my
comment.

Andrew Gould

Re: Null and Void() - Or,

From
"A.M."
Date:
On Wed, June 28, 2006 5:31 am, dananrg@yahoo.com wrote:
> Date and Pascal hate nulls. One even goes so far as to say that if you
> permit NULLs in a database, then the results from *every* query is suspect.
> So they turn perform backflips suggesting ways to avoid nulls.
> None, so far, seem appealing.

This has been discussed to death on this list and on every other SQL
forum, but since you asked...

To understand NULL, there is a little history that needs to be brought up.
The original relational model proposal by Codd had no provisions for
non-existent data. Mathematical purity is a strong argument against NULL.
Another one is just as simple: "NULL represents the absence of data, so it
is the antithesis of what should be stored in a _data_base."

In Codd's later papers, he comes up with several distinct NULLs
representing different states of unknowledge. Date is vehemently opposed
to NULL for the aforementioned reasons.

NULL is nothing more than a shortcut. SQL logic has to do backflips to
accomodate it- notice how NULL!=NULL- indeed, one NULL can mean a variety
of things even in the same context! ("Bob doesn't know","HR doesn't
know","No one cares","Not applicable", etc.)

In this paper:
http://web.onetel.com/~hughdarwen/TheThirdManifesto/Missing-info-without-nulls.pdf
Darwen discusses relational design without NULLs (his solution requires
support for distributed keys which PostgreSQL admittedly doesn't support)
but the premise is very simply that data can be partitioned so that the
lack of knowledge is implicit in its absence (which is part of the
relational model- it should be a closed system of truths).

Obviously, for practical purposes, NULL isn't going anywhere fast for SQL
databases, but it is really good to know the background and rationale for
your own and other's design decisions.

I hope this has helped.

-M



Re: Null and Void() - Or,

From
Martijn van Oosterhout
Date:
On Wed, Jun 28, 2006 at 10:43:26AM -0400, A.M. wrote:
> NULL is nothing more than a shortcut. SQL logic has to do backflips to
> accomodate it- notice how NULL!=NULL- indeed, one NULL can mean a variety
> of things even in the same context! ("Bob doesn't know","HR doesn't
> know","No one cares","Not applicable", etc.)

Err, NULL!=NULL => NULL, NULL=NULL => NULL. Comparing NULL with
anything produces NULL, which is neither true nor false. It's straight
three-valued logic. If anything the odd thing is that NULL is
interpreted as false in some contexts.

Yes, there are some places in SQL that need to do strange things with
NULL, but this isn't one of them.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Null and Void() - Or, Abandon All Hope Ye Who allow NULLs?

From
David Fetter
Date:
On Wed, Jun 28, 2006 at 02:31:20AM -0700, dananrg@yahoo.com wrote:
> Date and Pascal hate nulls.  One even goes so far as to say that if
> you permit NULLs in a database, then the results from *every* query
> is suspect.  So they turn perform backflips suggesting ways to avoid
> nulls.  None, so far, seem appealing.

That is part of the story.  Interestingly to me is that the more
stridently they've rejected NULLs, the fuzzier their suggestions have
been about what to do in cases of missing information.

[snip]

> What precisely has Date and Pascal's knickers in such a twist?  The
> fact that ad hoc queries from random, unintelligent users could give
> strange results?

What has their knickers in a twist is what twists the knickers of
every other doctrinaire ideologue: that everybody doesn't "just get"
why their way is far superior, drop everything they're doing, and
change over to the True Way™.  Their "Eat Grapefruit, You Morons"
<http://www.angryflower.com/getthe.gif> tactics don't do anything to
endear them either.

It's good to read what Date, Darwen & Pascal have to say, but only
once so you can recognize the flavor of bamboozlement when some
impressionable youth (of whatever age) has a "revelation" about How
Databases Should Be®.  This way, you can help explain gently that
their "new insight" is neither new nor insightful, and that there are
some good papers <http://www.cs.toronto.edu/~libkin/publ.html> to
read.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
"Tim Hart"
Date:
This reminds me of my favorite Bertrand Russell quote:

"The difference between theory and practice is: in theory there is no
difference, but in practice, there is"

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Florian G. Pflug
Sent: Wednesday, June 28, 2006 8:45 AM
To: Berend Tober
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Null and Void() - Or, Abandon All Hope Ye Who allow

Well, yes - as I said, using null values gives you more flexibility. But
still, you _can_ shoot yourself in the foot by using them - that's why
it's still good to know why some people oppose them, even if you don't
share their point of view. But of course, "rm -r $(PGDATA)" is a more
efficient way to shoot yourself in the foot, and will probably harm more
then using null ;-)

greetings, Florian Pflug

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster



Re: Null and Void() - Or, Abandon All Hope Ye Who allow NULLs?

From
David Fetter
Date:
On Wed, Jun 28, 2006 at 02:43:03AM -0700, dananrg@yahoo.com wrote:
> 'Scuse my broken english and ungrammatical gibberish in my last
> missive.  That's what I get for posting before my first cup of
> coffee for the day.
>
> I'm serious about nulls though, and am wondering if Date and Pascal,
> perhaps, are the gibberish ones on this particular issue.

    Most people, on listening to a string of nonsense, will tend to
    doubt their own sanity before they realize that the person who is
    jabbering at them is really the one with the damaged brain.
                                           Neal Stephenson, The Big U

> The insistence on no nulls, anywhere, any time, for any reason,
> seems--from a practical vantage--unreasonable.

You're right.  It is :)

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
Scott Ribe
Date:
> Can't the intelligent practitioner simply proceed carefully with
> queries when nulls are involved?

Yes. The thing is, getting rid of NULL in the real world requires
decomposing data into so many tables that it would certainly cause more
confusion when it comes time to actually query the data...

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
Scott Marlowe
Date:
On Mon, 2006-07-03 at 11:09, Scott Ribe wrote:
> > Can't the intelligent practitioner simply proceed carefully with
> > queries when nulls are involved?
>
> Yes. The thing is, getting rid of NULL in the real world requires
> decomposing data into so many tables that it would certainly cause more
> confusion when it comes time to actually query the data...

I would add that sometimes null means we don't know, but we wish we did,
and here's how we can describe our lack of knowledge...  Those instances
are the ones we would need lots of tables to describe, and infinite time
would allow us to do so.

However, there are often nulls that fall in the category of "who
cares?"  For those, null is a perfectly acceptable alternative, and
there's no need for all the extra work.

Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
Greg Stark
Date:
Scott Marlowe <smarlowe@g2switchworks.com> writes:

> However, there are often nulls that fall in the category of "who
> cares?"  For those, null is a perfectly acceptable alternative, and
> there's no need for all the extra work.

There is often a need for special case values. Situations like "subscription
expiration date" for a subscription that shouldn't expire at all, or even
"income level" for users who refuse to give that information. Also for things
like the various NaN values.

I kind of wish SQL allowed for an arbitrary set of "special values" regardless
of data type rather than allow a single special value and have so many hard
coded magical behaviours.


--
greg

Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
Chris Browne
Date:
gsstark@mit.edu (Greg Stark) writes:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
>
>> However, there are often nulls that fall in the category of "who
>> cares?"  For those, null is a perfectly acceptable alternative, and
>> there's no need for all the extra work.
>
> There is often a need for special case values. Situations like "subscription
> expiration date" for a subscription that shouldn't expire at all, or even
> "income level" for users who refuse to give that information. Also for things
> like the various NaN values.
>
> I kind of wish SQL allowed for an arbitrary set of "special values" regardless
> of data type rather than allow a single special value and have so many hard
> coded magical behaviours.

In the case of "never expires," there is a well-defined "infinity" value...

mn@[local]:5432=# create table tst (as_at timestamptz);
CREATE TABLE
mn@[local]:5432=# insert into tst values ('infinity');
INSERT 159195836 1
mn@[local]:5432=# insert into tst values (now());
INSERT 159195837 1
mn@[local]:5432=# select * from tst;
             as_at
-------------------------------
 infinity
 2006-07-05 19:35:01.233889+00
(2 rows)

As for having larger numbers of "not there" values, that tends to have
somewhat unfortunate effects on system logic, as code needs to be
aware of additional "special values."
--
output = reverse("gro.mca" "@" "enworbbc")
http://cbbrowne.com/info/emacs.html
"Of course 5  years from now that will be different,  but 5 years from
now  everyone  will  be  running  free  GNU on  their  200  MIPS,  64M
SPARCstation-5."  -- Andrew Tanenbaum, 1992.

Re: Null and Void() - Or, Abandon All Hope Ye Who allow

From
Wayne Conrad
Date:
All good points.

The ability to store NULL, and the fact that there is just one kind of
NULL, seem to parallel what has become common and useful in
programming languages.  Most support NULL at least for pointers, and
many support it for all data types.  It doesn't have to have a defined
meaning to be useful, any more than the number "2" has to have a
defined meaning to be useful.

(Don't think NULL is useful for all types?  Than look for all of the
int functions in a C program that are returning -1 to indicate "not
found," "error," etc.)

In the marketplace of special values, program language designers-- at
least the ones who designed the languages I use--agree that allowing
NULL (and just one kind of NULL) for all data types is the winner.
That alone makes NULL necessary in the database: Because storing and
retrieving data that might include NULL on the program end would be
torture if the database itself did not support NULL.

There's plenty of nits to pick here, I think.  But it's clear to me
that practice has proven NULL to be too useful to ditch.

        Wayne Conrad