Re: RFC: A brief guide to nulls - Mailing list pgsql-sql

From Josh Berkus
Subject Re: RFC: A brief guide to nulls
Date
Msg-id 200301141146.50308.josh@agliodbs.com
Whole thread Raw
In response to RFC: A brief guide to nulls  (dev@archonet.com)
List pgsql-sql
Richard,

Thanks for writing this!  Would it be cool if we included it in the Linux
Weekly News?   I'll ask Robert.

Two edits:

On Wednesday 15 January 2003 09:23, dev@archonet.com wrote:
> There have been a few posts recently where people have had problems with
> nulls. Anyone got comments on the below before I submit it to techdocs?
>
> TIA
>
> - Richard Huxton
>
> A Brief Guide to NULLs
> ======================
>
> What is a null?
> ===============
> A null is *not* an empty string.
> A null is *not* a value.
> A null is *not* a "special" value.
> A null is the absence of a value.
>
>
> What do nulls mean?
> ===================
> Well, they *should* mean one of two things:
> 1. There is no applicable value
> 2. There is a value but it is unknown
>
> Example 1: Imagine you have a customer table with name and sex fields.
> If you get a new customer "ACME Widgets Ltd", the sex field is meaningless
> since your customer is a company (case 1).
> If you get a new customer "Jackie Smith" they might be male or female, but
> you might not know (case 2).
>
> Example 2: You have an address table with (street,city,county,postalcode)
> fields.
> You might insert an address ("10 Downing Street","London",Null,"WC1 1AA")
> since you don't have a valid county.
> You might also insert an address ("1 Any Street","Maidstone","Kent",Null)
> where there *must be* a valid postalcode, but you don't know what it is.
>
> It might be useful to be able to distinguish between these two cases - not
> applicable and unknown, but there is only one option "Null" available to
> us, so we can't.
>
>
> How do nulls work?
> ==================
> There is one very important rule when dealing with nulls. The result of
> any operation or comparison, when applied to a null is null. The only
> exception is testing if a value is null.
>
> Example: with the customer table above you could run the following queries:
>   SELECT * FROM customer WHERE sex='M';
>   SELECT * FROM customer WHERE sex<>'M';
> Now you might think this returns all customers, but it will miss those
> where sex is null. You've asked for all rows where the value of sex is 'M'
> and all those with values not equal to 'M' but not rows with *no value at
> all*
>
> It might help to think of a database as a set of statements you *know* to
> be true. A null indicates that you *cannot say anything at all* about that
> field. You can't say what it is, you can't say what it isn't, you can only
> say there is some information missing.
>
> So, to see all the customers with unknown or inapplicable sex you would
need:
>   SELECT * FROM customer WHERE sex IS NULL;

** MOVE the above paragraph ....

>
> There are actually three possible results for a test in SQL - True (the
> test passed), False (the test failed) and Null (you tested against a
> null). A result of null usually gets treated as False, so testing against
> nulls always fails.
>
> If you try to perform an operation on nulls, again the result is always
> null. So the results of all of the following are null:
>   SELECT 'abc' || null;
>   SELECT 1 + null;
>   SELECT sqrt(null::numeric);
> The first case can be especially confusing. Concatenating a null string to
> a string value will return null, not the original value.

** ... down to here, and define the IS NULL and IS NOT NULL operators for the
reader.

>
> Uniqueness and nulls
> ====================
> If you define a unique index on a column it prevents you inserting two
> values that are the same. It does not prevent you inserting as many nulls
> as you like. How could it, you don't have a value so it can't be the same
> as any other.
>
> Example: We create a table "ta" with a unique constraint on column "b"
>   CREATE TABLE ta (
>     a int4,
>     b varchar(3),
>     PRIMARY KEY (a)
>   );
>   CREATE UNIQUE INDEX ta_b_idx ON ta (b);
>   INSERT INTO ta VALUES (1,'aaa');  -- succeeds
>   INSERT INTO ta VALUES (2,'bbb');  -- succeeds
>   INSERT INTO ta VALUES (3,null);   -- succeeds
>   INSERT INTO ta VALUES (4,'bbb');  -- fails
>   INSERT INTO ta VALUES (5,null);   -- succeeds!
>
> Given the definition of what a null is, you only have two choices: allow
> multiple nulls or allow no nulls. If you want no nulls, define the column
> as NOT NULL when creating the table.
>
>
> Keys and nulls
> ==============
> No column that is part of a primary key can be null. When you define a
> PRIMARY KEY, none of the columns mentioned can take a null value.
> Postgresql makes sure of this by defining the columns as NOT NULL for you.
>
> Example: With table "ta" we just created, \d ta will show column a as
> being not null. Otherwise, we could insert rows with a set to null and
> have no way to tell them apart.
>
>
> Subqueries and nulls
> ====================
> Since tests always fail when testing against nulls you can have unexpected
> results with sub-queries.
>
> Example: Assume we have a companies table and a diary table. Diary entries
> are usually related to a particular company but not always.
>   SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM
> diary);
> If any row in diary contains a null dy_company then you will get *no
> results*.
> We can expand the query like so:
>   WHERE co_id NOT IN (SELECT dy_company FROM diary)
>   WHERE co_id NOT IN (1, 2, null, 3...)
>   WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...)
>   WHERE NOT (... OR null OR ...)
>   WHERE NOT (null)
>   WHERE null

I'd suggest that you do not want to give readers an example that contains NOT
IN (subselect).   There are almost no cricumstances under which a user would
choose NOT IN (subselect) over WHERE NOT EXISTS (subselect), and giving it in
an example will just confuse newbies.

> You either need to explicitly check for null values, or define the column
> in question as NOT NULL.
>
>
> Further Information
> ===================
> Any good relational database book (try something written by Date)
> Bruce's book (link LHS at http://techdocs.postgresql.org)
> My Postgresql Notes (link at http://techdocs.postgresql.org)

Add links to the two PostgreSQL book pages ... the one on Techdocs and the one
on www.postgresql.org/users-lounge

FWIW.

--
-Josh BerkusAglio Database SolutionsSan Francisco



pgsql-sql by date:

Previous
From: Josh Berkus
Date:
Subject: Re: query speed joining tables
Next
From: Tom Lane
Date:
Subject: Re: sub-select with aggregate