RFC: A brief guide to nulls - Mailing list pgsql-sql

From dev@archonet.com
Subject RFC: A brief guide to nulls
Date
Msg-id 1522.192.168.1.16.1042651389.squirrel@mainbox.archonet.com
Whole thread Raw
Responses Re: A brief guide to nulls  ("Otto Hirr" <otto.hirr@olabinc.com>)
Re: RFC: A brief guide to nulls  (Andrew Perrin <clists@perrin.socsci.unc.edu>)
Re: RFC: A brief guide to nulls  (Josh Berkus <josh@agliodbs.com>)
Re: RFC: A brief guide to nulls  (Antti Haapala <antti.haapala@iki.fi>)
Re: RFC: A brief guide to nulls  (Bruno Wolff III <bruno@wolff.to>)
Re: RFC: A brief guide to nulls  (Antti Haapala <antti.haapala@iki.fi>)
Re: RFC: A brief guide to nulls  (Peter Eisentraut <peter_e@gmx.net>)
Re: RFC: A brief guide to nulls  (Ludwig Lim <lud_nowhere_man@yahoo.com>)
Re: RFC: A brief guide to nulls (noarchive)  ("Ries van Twisk" <ries@jongert.nl>)
List pgsql-sql
There have been a few posts recently where people have had problems with
nulls. Anyone got comments on the below before I submit it to techdocs?

TIA

- Richard Huxton

A Brief Guide to NULLs
======================

What is a null?
===============
A null is *not* an empty string.
A null is *not* a value.
A null is *not* a "special" value.
A null is the absence of a value.


What do nulls mean?
===================
Well, they *should* mean one of two things:
1. There is no applicable value
2. There is a value but it is unknown

Example 1: Imagine you have a customer table with name and sex fields.
If you get a new customer "ACME Widgets Ltd", the sex field is meaningless
since your customer is a company (case 1).
If you get a new customer "Jackie Smith" they might be male or female, but
you might not know (case 2).

Example 2: You have an address table with (street,city,county,postalcode)
fields.
You might insert an address ("10 Downing Street","London",Null,"WC1 1AA")
since you don't have a valid county.
You might also insert an address ("1 Any Street","Maidstone","Kent",Null)
where there *must be* a valid postalcode, but you don't know what it is.

It might be useful to be able to distinguish between these two cases - not
applicable and unknown, but there is only one option "Null" available to
us, so we can't.


How do nulls work?
==================
There is one very important rule when dealing with nulls. The result of
any operation or comparison, when applied to a null is null. The only
exception is testing if a value is null.

Example: with the customer table above you could run the following queries: SELECT * FROM customer WHERE sex='M';
SELECT* FROM customer WHERE sex<>'M';
 
Now you might think this returns all customers, but it will miss those
where sex is null. You've asked for all rows where the value of sex is 'M'
and all those with values not equal to 'M' but not rows with *no value at
all*

It might help to think of a database as a set of statements you *know* to
be true. A null indicates that you *cannot say anything at all* about that
field. You can't say what it is, you can't say what it isn't, you can only
say there is some information missing.

So, to see all the customers with unknown or inapplicable sex you would need: SELECT * FROM customer WHERE sex IS
NULL;

There are actually three possible results for a test in SQL - True (the
test passed), False (the test failed) and Null (you tested against a
null). A result of null usually gets treated as False, so testing against
nulls always fails.

If you try to perform an operation on nulls, again the result is always
null. So the results of all of the following are null: SELECT 'abc' || null; SELECT 1 + null; SELECT
sqrt(null::numeric);
The first case can be especially confusing. Concatenating a null string to
a string value will return null, not the original value.


Uniqueness and nulls
====================
If you define a unique index on a column it prevents you inserting two
values that are the same. It does not prevent you inserting as many nulls
as you like. How could it, you don't have a value so it can't be the same
as any other.

Example: We create a table "ta" with a unique constraint on column "b" CREATE TABLE ta (   a int4,b varchar(3),PRIMARY
KEY(a) ); CREATE UNIQUE INDEX ta_b_idx ON ta (b); INSERT INTO ta VALUES (1,'aaa');  -- succeeds INSERT INTO ta VALUES
(2,'bbb'); -- succeeds INSERT INTO ta VALUES (3,null);   -- succeeds INSERT INTO ta VALUES (4,'bbb');  -- fails INSERT
INTOta VALUES (5,null);   -- succeeds!
 

Given the definition of what a null is, you only have two choices: allow
multiple nulls or allow no nulls. If you want no nulls, define the column
as NOT NULL when creating the table.


Keys and nulls
==============
No column that is part of a primary key can be null. When you define a
PRIMARY KEY, none of the columns mentioned can take a null value.
Postgresql makes sure of this by defining the columns as NOT NULL for you.

Example: With table "ta" we just created, \d ta will show column a as
being not null. Otherwise, we could insert rows with a set to null and
have no way to tell them apart.


Subqueries and nulls
====================
Since tests always fail when testing against nulls you can have unexpected
results with sub-queries.

Example: Assume we have a companies table and a diary table. Diary entries
are usually related to a particular company but not always. SELECT co_name FROM companies WHERE co_id NOT IN (SELECT
dy_companyFROM
 
diary);
If any row in diary contains a null dy_company then you will get *no
results*.
We can expand the query like so: WHERE co_id NOT IN (SELECT dy_company FROM diary) WHERE co_id NOT IN (1, 2, null,
3...)WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...) WHERE NOT (... OR null OR ...) WHERE NOT (null) WHERE
null

You either need to explicitly check for null values, or define the column
in question as NOT NULL.


Further Information
===================
Any good relational database book (try something written by Date)
Bruce's book (link LHS at http://techdocs.postgresql.org)
My Postgresql Notes (link at http://techdocs.postgresql.org)



pgsql-sql by date:

Previous
From: Steve Crawford
Date:
Subject: Re: index on to_char(created, 'YYYY') doesn't work
Next
From: Andreas Joseph Krogh
Date:
Subject: Re: index on to_char(created, 'YYYY') doesn't work