Re: RFC: A brief guide to nulls - Mailing list pgsql-sql

From Andrew Perrin
Subject Re: RFC: A brief guide to nulls
Date
Msg-id Pine.LNX.4.21.0301151414150.7813-100000@perrin.socsci.unc.edu
Whole thread Raw
In response to RFC: A brief guide to nulls  (dev@archonet.com)
List pgsql-sql
*Disclaimer:* I do not have any formal training in database
theory. However, I have done quite a lot of work with databases in
practice, and have some understanding of "missing values" in statistics.

I would tend to think of the "no applicable value" case of a NULL as being
the result of poor or lazy planning, and therefore not really true NULL
values. (This is not to say I don't *do* it, but that it's not
theoretically appropriate to interpret a NULL as being "no applicable
value.")  To use your example 1, I would think a properly-planned and
structured database should account for the possibility of a sexless
customer by means of relationality:

customers:custidname

cust_sexes:custidsex

which would get rid of the NULL for the sex of ACME Widgets Ltd.  By
contrast, it wouldn't get rid of the NULL for Jackie Smith, who would
appropriately be represented by an entry in customers (1111, 'Jackie
Smith') and one in cust_sexes (1111, NULL).

(Otherwise the introduction is excellent.)

Any comments?

Andy

----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu


On Wed, 15 Jan 2003 dev@archonet.com wrote:

> There have been a few posts recently where people have had problems with
> nulls. Anyone got comments on the below before I submit it to techdocs?
> 
> TIA
> 
> - Richard Huxton
> 
> A Brief Guide to NULLs
> ======================
> 
> What is a null?
> ===============
> A null is *not* an empty string.
> A null is *not* a value.
> A null is *not* a "special" value.
> A null is the absence of a value.
> 
> 
> What do nulls mean?
> ===================
> Well, they *should* mean one of two things:
> 1. There is no applicable value
> 2. There is a value but it is unknown
> 
> Example 1: Imagine you have a customer table with name and sex fields.
> If you get a new customer "ACME Widgets Ltd", the sex field is meaningless
> since your customer is a company (case 1).
> If you get a new customer "Jackie Smith" they might be male or female, but
> you might not know (case 2).
> 
> Example 2: You have an address table with (street,city,county,postalcode)
> fields.
> You might insert an address ("10 Downing Street","London",Null,"WC1 1AA")
> since you don't have a valid county.
> You might also insert an address ("1 Any Street","Maidstone","Kent",Null)
> where there *must be* a valid postalcode, but you don't know what it is.
> 
> It might be useful to be able to distinguish between these two cases - not
> applicable and unknown, but there is only one option "Null" available to
> us, so we can't.
> 
> 
> How do nulls work?
> ==================
> There is one very important rule when dealing with nulls. The result of
> any operation or comparison, when applied to a null is null. The only
> exception is testing if a value is null.
> 
> Example: with the customer table above you could run the following queries:
>   SELECT * FROM customer WHERE sex='M';
>   SELECT * FROM customer WHERE sex<>'M';
> Now you might think this returns all customers, but it will miss those
> where sex is null. You've asked for all rows where the value of sex is 'M'
> and all those with values not equal to 'M' but not rows with *no value at
> all*
> 
> It might help to think of a database as a set of statements you *know* to
> be true. A null indicates that you *cannot say anything at all* about that
> field. You can't say what it is, you can't say what it isn't, you can only
> say there is some information missing.
> 
> So, to see all the customers with unknown or inapplicable sex you would need:
>   SELECT * FROM customer WHERE sex IS NULL;
> 
> There are actually three possible results for a test in SQL - True (the
> test passed), False (the test failed) and Null (you tested against a
> null). A result of null usually gets treated as False, so testing against
> nulls always fails.
> 
> If you try to perform an operation on nulls, again the result is always
> null. So the results of all of the following are null:
>   SELECT 'abc' || null;
>   SELECT 1 + null;
>   SELECT sqrt(null::numeric);
> The first case can be especially confusing. Concatenating a null string to
> a string value will return null, not the original value.
> 
> 
> Uniqueness and nulls
> ====================
> If you define a unique index on a column it prevents you inserting two
> values that are the same. It does not prevent you inserting as many nulls
> as you like. How could it, you don't have a value so it can't be the same
> as any other.
> 
> Example: We create a table "ta" with a unique constraint on column "b"
>   CREATE TABLE ta (
>     a int4,
>     b varchar(3),
>     PRIMARY KEY (a)
>   );
>   CREATE UNIQUE INDEX ta_b_idx ON ta (b);
>   INSERT INTO ta VALUES (1,'aaa');  -- succeeds
>   INSERT INTO ta VALUES (2,'bbb');  -- succeeds
>   INSERT INTO ta VALUES (3,null);   -- succeeds
>   INSERT INTO ta VALUES (4,'bbb');  -- fails
>   INSERT INTO ta VALUES (5,null);   -- succeeds!
> 
> Given the definition of what a null is, you only have two choices: allow
> multiple nulls or allow no nulls. If you want no nulls, define the column
> as NOT NULL when creating the table.
> 
> 
> Keys and nulls
> ==============
> No column that is part of a primary key can be null. When you define a
> PRIMARY KEY, none of the columns mentioned can take a null value.
> Postgresql makes sure of this by defining the columns as NOT NULL for you.
> 
> Example: With table "ta" we just created, \d ta will show column a as
> being not null. Otherwise, we could insert rows with a set to null and
> have no way to tell them apart.
> 
> 
> Subqueries and nulls
> ====================
> Since tests always fail when testing against nulls you can have unexpected
> results with sub-queries.
> 
> Example: Assume we have a companies table and a diary table. Diary entries
> are usually related to a particular company but not always.
>   SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM
> diary);
> If any row in diary contains a null dy_company then you will get *no
> results*.
> We can expand the query like so:
>   WHERE co_id NOT IN (SELECT dy_company FROM diary)
>   WHERE co_id NOT IN (1, 2, null, 3...)
>   WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...)
>   WHERE NOT (... OR null OR ...)
>   WHERE NOT (null)
>   WHERE null
> 
> You either need to explicitly check for null values, or define the column
> in question as NOT NULL.
> 
> 
> Further Information
> ===================
> Any good relational database book (try something written by Date)
> Bruce's book (link LHS at http://techdocs.postgresql.org)
> My Postgresql Notes (link at http://techdocs.postgresql.org)
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
> 



pgsql-sql by date:

Previous
From: "Otto Hirr"
Date:
Subject: Re: A brief guide to nulls
Next
From: Josh Berkus
Date:
Subject: Re: query speed joining tables