Thread: Null and Void() - Or, Abandon All Hope Ye Who allow NULLs?
Date and Pascal hate nulls. One even goes so far as to say that if you permit NULLs in a database, then the results from *every* query is suspect. So they turn perform backflips suggesting ways to avoid nulls. None, so far, seem appealing. To me, nulls are quite useful in the Real World. For instance, there may be a lot of immediate value for end users in committing a row that has a few nulls (e.g. as in not yet collected), rather than slavishly follow a rule that says Thou Shalt Not commit a row with nulls. Can't the intelligent practitioner simply proceed carefully with queries when nulls are involved? With group functions at least, I believe nulls are ignored. In Oracle, you can use NVL() to force group functions to recognize. What about simply doing an IS NULL test in the code when any table that allows nulls in involved in a query? What precisely has Date and Pascal's knickers in such a twist? The fact that ad hoc queries from random, unintelligent users could give strange results? What if one has control over the queries performed through a GUI or application? Doesn't the problem disappear, presuming the programmer is aware of the issue and the application is well documented? What are some of the best ways to deal with the issue of nulls potentially resulting in questionable query results short of disallowing them? Storing some sort of coded or numeric value that represents UNKNOWN or NOT APPLICABLE.
'Scuse my broken english and ungrammatical gibberish in my last missive. That's what I get for posting before my first cup of coffee for the day. I'm serious about nulls though, and am wondering if Date and Pascal, perhaps, are the gibberish ones on this particular issue. The insistence on no nulls, anywhere, any time, for any reason, seems--from a practical vantage--unreasonable.
dananrg@yahoo.com wrote: > Date and Pascal hate nulls. One even goes so far as to say that if you > permit NULLs in a database, then the results from *every* query is > suspect. So they turn perform backflips suggesting ways to avoid nulls. > None, so far, seem appealing. > > To me, nulls are quite useful in the Real World. For instance, there > may be a lot of immediate value for end users in committing a row that > has a few nulls (e.g. as in not yet collected), rather than slavishly > follow a rule that says Thou Shalt Not commit a row with nulls. > > Can't the intelligent practitioner simply proceed carefully with > queries when nulls are involved? With group functions at least, I > believe nulls are ignored. In Oracle, you can use NVL() to force group > functions to recognize. > > What about simply doing an IS NULL test in the code when any table that > allows nulls in involved in a query? > > What precisely has Date and Pascal's knickers in such a twist? The fact > that ad hoc queries from random, unintelligent users could give strange > results? > > What if one has control over the queries performed through a GUI or > application? Doesn't the problem disappear, presuming the programmer is > aware of the issue and the application is well documented? > > What are some of the best ways to deal with the issue of nulls > potentially resulting in questionable query results short of > disallowing them? Storing some sort of coded or numeric value that > represents UNKNOWN or NOT APPLICABLE. I'd say the "problem" with NULL values is mainly that they conflict with some of the ideas of relational theory and relational algebra. One of the basic ideas of relation theory is that of functional dependencies. Database normalization (at least according to the theory) tells you how to deduce a "good" schema, if you know what columns you need, and what their functional dependencies are. Of course, the functions described by those functional dependencies are not required to be defined for every possible value - let's say you have a function dependency A -> B - meaning that whenever you know the value of column A, then there is _at_most_ one value for column BNormalization basically tells you to model that function dependency as a table containing fields A and B, and make A the primary key. Now, if there is no B for a specific value of A, then this table will just not contain a record for this value of A. But if you allow NULL-values, then suddently there are _two_ different ways to express "I don't know what B is for this A". You could either have a record with the A-value in question, and with B null, or you could have _no_ record with the A-value in question. So, NULLs IMHO give you flexibility - but at a price. The price is that the _same_ information could be stored in different ways - and you can't really deduce "the correct way" from the schema alone. For me, the discussion is quite similar to static-typed vs. dynamically typed languages - the first ones enable the compiler to check your code more thoroughly, while the second ones often allow you to write more concise code. So, I'd say Date and Pascal are right from a _theoretical_ point of view - null values really cause problems in the context of relational theory. Those theoretical problems in turn cause practical problems to some extent - but so do a _lot_ of other things. Take java as an example - at least until java 1.4 this language has huge theoretical deficiencies in it's type system, but you can nevertheless write good and maintainable code in java 1.4. The same is true for SQL with nulls - there _are_ possibilities to shoot yourself in the foot, but that just means that developers need to be skilled enough to know about those pitfalls. greetings, Florian Pflug
Florian G. Pflug wrote: > dananrg@yahoo.com wrote: > >> Date and Pascal hate nulls. > > ...the functions described by those functional dependencies are > not required to be defined for every possible value - let's say you have > a function dependency A -> B - meaning that whenever you know the value > of column A, then there is _at_most_ one value for column BNormalization > basically tells you to model that function dependency as a > table containing fields A and B, and make A the primary key. > > Now, if there is no B for a specific value of A, then this table will > just not contain a record for this value of A. But if you allow > NULL-values, then suddently there are _two_ different ways to express > "I don't know what B is for this A". You could either have a record with > the A-value in question, and with B null, or you could have _no_ record > with the A-value in question. > But in the former case, you affirm the existence and your knowledge of the second A-value; in the latter case you affirm ignorance of the second A-value. The two-column example may be useful for theoretical discussion, but in practise likely more columns exist so that NULL can represent incomplete data that may be determined later for a particular row when you still need to commit the column values already known. For instance, in response to customer demands, it may be required that a new employee begins work on projects right away, even though we have only basic identifying information, like say, their name. This gives us enough to create a new employee row, start recording their labor hours worked for billing purposes, and to cut checks for travel expenses. We eventually need date of birth, social security number, and other information, but as a practical matter those columns can certainly be committed NULL initially. Regards, Berend Tober 860-767-0700 x118
Berend Tober wrote: > Florian G. Pflug wrote: > > dananrg@yahoo.com wrote: > > > >> Date and Pascal hate nulls. > > > > ...the functions described by those functional dependencies are > > not required to be defined for every possible value - let's say you have > > a function dependency A -> B - meaning that whenever you know the value > > of column A, then there is _at_most_ one value for column BNormalization > > basically tells you to model that function dependency as a > > table containing fields A and B, and make A the primary key. > > > > Now, if there is no B for a specific value of A, then this table will > > just not contain a record for this value of A. But if you allow > > NULL-values, then suddently there are _two_ different ways to express > > "I don't know what B is for this A". You could either have a record with > > the A-value in question, and with B null, or you could have _no_ record > > with the A-value in question. > > > But in the former case, you affirm the existence and your knowledge of > the second A-value; in the latter case you affirm ignorance of the > second A-value. The two-column example may be useful for theoretical > discussion, but in practise likely more columns exist so that NULL can > represent incomplete data that may be determined later for a particular > row when you still need to commit the column values already known. I came up with the two-column example because it's the simplest example possible. For larger tables you _could_ split them into n tables (at most one per field). If not saying I'd do that - just that it's possible and that it's basically what Date and Pascal suggest. > For > instance, in response to customer demands, it may be required that a new > employee begins work on projects right away, even though we have only > basic identifying information, like say, their name. This gives us > enough to create a new employee row, start recording their labor hours > worked for billing purposes, and to cut checks for travel expenses. We > eventually need date of birth, social security number, and other > information, but as a practical matter those columns can certainly be > committed NULL initially. Well, yes - as I said, using null values gives you more flexibility. But still, you _can_ shoot yourself in the foot by using them - that's why it's still good to know why some people oppose them, even if you don't share their point of view. But of course, "rm -r $(PGDATA)" is a more efficient way to shoot yourself in the foot, and will probably harm more then using null ;-) greetings, Florian Pflug
--- Berend Tober <btober@seaworthysys.com> wrote: > Florian G. Pflug wrote: > > dananrg@yahoo.com wrote: > > > >> Date and Pascal hate nulls. > > > > ...the functions described by those functional > dependencies are > > not required to be defined for every possible > value - let's say you have > > a function dependency A -> B - meaning that > whenever you know the value > > of column A, then there is _at_most_ one value > for column BNormalization > > basically tells you to model that function > dependency as a > > table containing fields A and B, and make A the > primary key. > > > > Now, if there is no B for a specific value of A, > then this table will > > just not contain a record for this value of A. > But if you allow > > NULL-values, then suddently there are _two_ > different ways to express > > "I don't know what B is for this A". You could > either have a record with > > the A-value in question, and with B null, or you > could have _no_ record > > with the A-value in question. > > > > But in the former case, you affirm the existence and > your knowledge of > the second A-value; in the latter case you affirm > ignorance of the > second A-value. The two-column example may be useful > for theoretical > discussion, but in practise likely more columns > exist so that NULL can > represent incomplete data that may be determined > later for a particular > row when you still need to commit the column values > already known. For > instance, in response to customer demands, it may be > required that a new > employee begins work on projects right away, even > though we have only > basic identifying information, like say, their name. > This gives us > enough to create a new employee row, start recording > their labor hours > worked for billing purposes, and to cut checks for > travel expenses. We > eventually need date of birth, social security > number, and other > information, but as a practical matter those columns > can certainly be > committed NULL initially. > > Regards, > Berend Tober > 860-767-0700 x118 > Null values should be allowed for any information that may not be known at the time of data entry. However, any data field that falls into this category should not be required to define the relationships between tables. This is a case where the database design must reflect the limitations of operational processes. Andrew Gould
--- Andrew Gould <andrewgould@yahoo.com> wrote: > --- Berend Tober <btober@seaworthysys.com> wrote: > > > Florian G. Pflug wrote: > > > dananrg@yahoo.com wrote: > > > > > >> Date and Pascal hate nulls. > > > > > > ...the functions described by those functional > > dependencies are > > > not required to be defined for every possible > > value - let's say you have > > > a function dependency A -> B - meaning that > > whenever you know the value > > > of column A, then there is _at_most_ one value > > for column BNormalization > > > basically tells you to model that function > > dependency as a > > > table containing fields A and B, and make A the > > primary key. > > > > > > Now, if there is no B for a specific value of > A, > > then this table will > > > just not contain a record for this value of A. > > But if you allow > > > NULL-values, then suddently there are _two_ > > different ways to express > > > "I don't know what B is for this A". You could > > either have a record with > > > the A-value in question, and with B null, or > you > > could have _no_ record > > > with the A-value in question. > > > > > > > But in the former case, you affirm the existence > and > > your knowledge of > > the second A-value; in the latter case you affirm > > ignorance of the > > second A-value. The two-column example may be > useful > > for theoretical > > discussion, but in practise likely more columns > > exist so that NULL can > > represent incomplete data that may be determined > > later for a particular > > row when you still need to commit the column > values > > already known. For > > instance, in response to customer demands, it may > be > > required that a new > > employee begins work on projects right away, even > > though we have only > > basic identifying information, like say, their > name. > > This gives us > > enough to create a new employee row, start > recording > > their labor hours > > worked for billing purposes, and to cut checks for > > travel expenses. We > > eventually need date of birth, social security > > number, and other > > information, but as a practical matter those > columns > > can certainly be > > committed NULL initially. > > > > Regards, > > Berend Tober > > 860-767-0700 x118 > > > > Null values should be allowed for any information > that > may not be known at the time of data entry. > However, > any data field that falls into this category should > not be required to define the relationships between > tables. This is a case where the database design > must > reflect the limitations of operational processes. > > Andrew Gould > I need to temper my own response. I was referring to relationships between tables where both tables contain operational data. The use of reference tables, such as code lookup tables, is a huge exception to my comment. Andrew Gould
On Wed, June 28, 2006 5:31 am, dananrg@yahoo.com wrote: > Date and Pascal hate nulls. One even goes so far as to say that if you > permit NULLs in a database, then the results from *every* query is suspect. > So they turn perform backflips suggesting ways to avoid nulls. > None, so far, seem appealing. This has been discussed to death on this list and on every other SQL forum, but since you asked... To understand NULL, there is a little history that needs to be brought up. The original relational model proposal by Codd had no provisions for non-existent data. Mathematical purity is a strong argument against NULL. Another one is just as simple: "NULL represents the absence of data, so it is the antithesis of what should be stored in a _data_base." In Codd's later papers, he comes up with several distinct NULLs representing different states of unknowledge. Date is vehemently opposed to NULL for the aforementioned reasons. NULL is nothing more than a shortcut. SQL logic has to do backflips to accomodate it- notice how NULL!=NULL- indeed, one NULL can mean a variety of things even in the same context! ("Bob doesn't know","HR doesn't know","No one cares","Not applicable", etc.) In this paper: http://web.onetel.com/~hughdarwen/TheThirdManifesto/Missing-info-without-nulls.pdf Darwen discusses relational design without NULLs (his solution requires support for distributed keys which PostgreSQL admittedly doesn't support) but the premise is very simply that data can be partitioned so that the lack of knowledge is implicit in its absence (which is part of the relational model- it should be a closed system of truths). Obviously, for practical purposes, NULL isn't going anywhere fast for SQL databases, but it is really good to know the background and rationale for your own and other's design decisions. I hope this has helped. -M
On Wed, Jun 28, 2006 at 10:43:26AM -0400, A.M. wrote: > NULL is nothing more than a shortcut. SQL logic has to do backflips to > accomodate it- notice how NULL!=NULL- indeed, one NULL can mean a variety > of things even in the same context! ("Bob doesn't know","HR doesn't > know","No one cares","Not applicable", etc.) Err, NULL!=NULL => NULL, NULL=NULL => NULL. Comparing NULL with anything produces NULL, which is neither true nor false. It's straight three-valued logic. If anything the odd thing is that NULL is interpreted as false in some contexts. Yes, there are some places in SQL that need to do strange things with NULL, but this isn't one of them. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
On Wed, Jun 28, 2006 at 02:31:20AM -0700, dananrg@yahoo.com wrote: > Date and Pascal hate nulls. One even goes so far as to say that if > you permit NULLs in a database, then the results from *every* query > is suspect. So they turn perform backflips suggesting ways to avoid > nulls. None, so far, seem appealing. That is part of the story. Interestingly to me is that the more stridently they've rejected NULLs, the fuzzier their suggestions have been about what to do in cases of missing information. [snip] > What precisely has Date and Pascal's knickers in such a twist? The > fact that ad hoc queries from random, unintelligent users could give > strange results? What has their knickers in a twist is what twists the knickers of every other doctrinaire ideologue: that everybody doesn't "just get" why their way is far superior, drop everything they're doing, and change over to the True Wayâ¢. Their "Eat Grapefruit, You Morons" <http://www.angryflower.com/getthe.gif> tactics don't do anything to endear them either. It's good to read what Date, Darwen & Pascal have to say, but only once so you can recognize the flavor of bamboozlement when some impressionable youth (of whatever age) has a "revelation" about How Databases Should Be®. This way, you can help explain gently that their "new insight" is neither new nor insightful, and that there are some good papers <http://www.cs.toronto.edu/~libkin/publ.html> to read. Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
This reminds me of my favorite Bertrand Russell quote: "The difference between theory and practice is: in theory there is no difference, but in practice, there is" -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Florian G. Pflug Sent: Wednesday, June 28, 2006 8:45 AM To: Berend Tober Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Null and Void() - Or, Abandon All Hope Ye Who allow Well, yes - as I said, using null values gives you more flexibility. But still, you _can_ shoot yourself in the foot by using them - that's why it's still good to know why some people oppose them, even if you don't share their point of view. But of course, "rm -r $(PGDATA)" is a more efficient way to shoot yourself in the foot, and will probably harm more then using null ;-) greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster
On Wed, Jun 28, 2006 at 02:43:03AM -0700, dananrg@yahoo.com wrote: > 'Scuse my broken english and ungrammatical gibberish in my last > missive. That's what I get for posting before my first cup of > coffee for the day. > > I'm serious about nulls though, and am wondering if Date and Pascal, > perhaps, are the gibberish ones on this particular issue. Most people, on listening to a string of nonsense, will tend to doubt their own sanity before they realize that the person who is jabbering at them is really the one with the damaged brain. Neal Stephenson, The Big U > The insistence on no nulls, anywhere, any time, for any reason, > seems--from a practical vantage--unreasonable. You're right. It is :) Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
> Can't the intelligent practitioner simply proceed carefully with > queries when nulls are involved? Yes. The thing is, getting rid of NULL in the real world requires decomposing data into so many tables that it would certainly cause more confusion when it comes time to actually query the data... -- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice
On Mon, 2006-07-03 at 11:09, Scott Ribe wrote: > > Can't the intelligent practitioner simply proceed carefully with > > queries when nulls are involved? > > Yes. The thing is, getting rid of NULL in the real world requires > decomposing data into so many tables that it would certainly cause more > confusion when it comes time to actually query the data... I would add that sometimes null means we don't know, but we wish we did, and here's how we can describe our lack of knowledge... Those instances are the ones we would need lots of tables to describe, and infinite time would allow us to do so. However, there are often nulls that fall in the category of "who cares?" For those, null is a perfectly acceptable alternative, and there's no need for all the extra work.
Scott Marlowe <smarlowe@g2switchworks.com> writes: > However, there are often nulls that fall in the category of "who > cares?" For those, null is a perfectly acceptable alternative, and > there's no need for all the extra work. There is often a need for special case values. Situations like "subscription expiration date" for a subscription that shouldn't expire at all, or even "income level" for users who refuse to give that information. Also for things like the various NaN values. I kind of wish SQL allowed for an arbitrary set of "special values" regardless of data type rather than allow a single special value and have so many hard coded magical behaviours. -- greg
gsstark@mit.edu (Greg Stark) writes: > Scott Marlowe <smarlowe@g2switchworks.com> writes: > >> However, there are often nulls that fall in the category of "who >> cares?" For those, null is a perfectly acceptable alternative, and >> there's no need for all the extra work. > > There is often a need for special case values. Situations like "subscription > expiration date" for a subscription that shouldn't expire at all, or even > "income level" for users who refuse to give that information. Also for things > like the various NaN values. > > I kind of wish SQL allowed for an arbitrary set of "special values" regardless > of data type rather than allow a single special value and have so many hard > coded magical behaviours. In the case of "never expires," there is a well-defined "infinity" value... mn@[local]:5432=# create table tst (as_at timestamptz); CREATE TABLE mn@[local]:5432=# insert into tst values ('infinity'); INSERT 159195836 1 mn@[local]:5432=# insert into tst values (now()); INSERT 159195837 1 mn@[local]:5432=# select * from tst; as_at ------------------------------- infinity 2006-07-05 19:35:01.233889+00 (2 rows) As for having larger numbers of "not there" values, that tends to have somewhat unfortunate effects on system logic, as code needs to be aware of additional "special values." -- output = reverse("gro.mca" "@" "enworbbc") http://cbbrowne.com/info/emacs.html "Of course 5 years from now that will be different, but 5 years from now everyone will be running free GNU on their 200 MIPS, 64M SPARCstation-5." -- Andrew Tanenbaum, 1992.
All good points. The ability to store NULL, and the fact that there is just one kind of NULL, seem to parallel what has become common and useful in programming languages. Most support NULL at least for pointers, and many support it for all data types. It doesn't have to have a defined meaning to be useful, any more than the number "2" has to have a defined meaning to be useful. (Don't think NULL is useful for all types? Than look for all of the int functions in a C program that are returning -1 to indicate "not found," "error," etc.) In the marketplace of special values, program language designers-- at least the ones who designed the languages I use--agree that allowing NULL (and just one kind of NULL) for all data types is the winner. That alone makes NULL necessary in the database: Because storing and retrieving data that might include NULL on the program end would be torture if the database itself did not support NULL. There's plenty of nits to pick here, I think. But it's clear to me that practice has proven NULL to be too useful to ditch. Wayne Conrad