Thread: RFC: A brief guide to nulls

RFC: A brief guide to nulls

From

dev@archonet.com

Date:

15 January 2003, 13:15:32

There have been a few posts recently where people have had problems with
nulls. Anyone got comments on the below before I submit it to techdocs?

TIA

- Richard Huxton

A Brief Guide to NULLs
======================

What is a null?
===============
A null is *not* an empty string.
A null is *not* a value.
A null is *not* a "special" value.
A null is the absence of a value.


What do nulls mean?
===================
Well, they *should* mean one of two things:
1. There is no applicable value
2. There is a value but it is unknown

Example 1: Imagine you have a customer table with name and sex fields.
If you get a new customer "ACME Widgets Ltd", the sex field is meaningless
since your customer is a company (case 1).
If you get a new customer "Jackie Smith" they might be male or female, but
you might not know (case 2).

Example 2: You have an address table with (street,city,county,postalcode)
fields.
You might insert an address ("10 Downing Street","London",Null,"WC1 1AA")
since you don't have a valid county.
You might also insert an address ("1 Any Street","Maidstone","Kent",Null)
where there *must be* a valid postalcode, but you don't know what it is.

It might be useful to be able to distinguish between these two cases - not
applicable and unknown, but there is only one option "Null" available to
us, so we can't.


How do nulls work?
==================
There is one very important rule when dealing with nulls. The result of
any operation or comparison, when applied to a null is null. The only
exception is testing if a value is null.

Example: with the customer table above you could run the following queries: SELECT * FROM customer WHERE sex='M';
SELECT* FROM customer WHERE sex<>'M';
 
Now you might think this returns all customers, but it will miss those
where sex is null. You've asked for all rows where the value of sex is 'M'
and all those with values not equal to 'M' but not rows with *no value at
all*

It might help to think of a database as a set of statements you *know* to
be true. A null indicates that you *cannot say anything at all* about that
field. You can't say what it is, you can't say what it isn't, you can only
say there is some information missing.

So, to see all the customers with unknown or inapplicable sex you would need: SELECT * FROM customer WHERE sex IS
NULL;

There are actually three possible results for a test in SQL - True (the
test passed), False (the test failed) and Null (you tested against a
null). A result of null usually gets treated as False, so testing against
nulls always fails.

If you try to perform an operation on nulls, again the result is always
null. So the results of all of the following are null: SELECT 'abc' || null; SELECT 1 + null; SELECT
sqrt(null::numeric);
The first case can be especially confusing. Concatenating a null string to
a string value will return null, not the original value.


Uniqueness and nulls
====================
If you define a unique index on a column it prevents you inserting two
values that are the same. It does not prevent you inserting as many nulls
as you like. How could it, you don't have a value so it can't be the same
as any other.

Example: We create a table "ta" with a unique constraint on column "b" CREATE TABLE ta (   a int4,b varchar(3),PRIMARY
KEY(a) ); CREATE UNIQUE INDEX ta_b_idx ON ta (b); INSERT INTO ta VALUES (1,'aaa');  -- succeeds INSERT INTO ta VALUES
(2,'bbb'); -- succeeds INSERT INTO ta VALUES (3,null);   -- succeeds INSERT INTO ta VALUES (4,'bbb');  -- fails INSERT
INTOta VALUES (5,null);   -- succeeds!
 

Given the definition of what a null is, you only have two choices: allow
multiple nulls or allow no nulls. If you want no nulls, define the column
as NOT NULL when creating the table.


Keys and nulls
==============
No column that is part of a primary key can be null. When you define a
PRIMARY KEY, none of the columns mentioned can take a null value.
Postgresql makes sure of this by defining the columns as NOT NULL for you.

Example: With table "ta" we just created, \d ta will show column a as
being not null. Otherwise, we could insert rows with a set to null and
have no way to tell them apart.


Subqueries and nulls
====================
Since tests always fail when testing against nulls you can have unexpected
results with sub-queries.

Example: Assume we have a companies table and a diary table. Diary entries
are usually related to a particular company but not always. SELECT co_name FROM companies WHERE co_id NOT IN (SELECT
dy_companyFROM
 
diary);
If any row in diary contains a null dy_company then you will get *no
results*.
We can expand the query like so: WHERE co_id NOT IN (SELECT dy_company FROM diary) WHERE co_id NOT IN (1, 2, null,
3...)WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...) WHERE NOT (... OR null OR ...) WHERE NOT (null) WHERE
null

You either need to explicitly check for null values, or define the column
in question as NOT NULL.


Further Information
===================
Any good relational database book (try something written by Date)
Bruce's book (link LHS at http://techdocs.postgresql.org)
My Postgresql Notes (link at http://techdocs.postgresql.org)

Re: A brief guide to nulls

From

"Otto Hirr"

Date:

15 January 2003, 14:32:48

Another viewpoint, is to go a bit further into implementation...
i.e. for every field, there are two items...
*one item being the null/notnull info
*other item being the value info of the field per the type at def time
This faciliatates the concepts of
where <fieldname> is null ... testing the null/notnull info
where <fieldname> is not null ... testing the null/notnull info
where <fieldname> = "somevalue" ... testing the value info
etc.

Trying to "hide" the concept that every field has TWO items,
the value and whether it is null/notnull, seems to confuse people.
By SIMPLY stating that every SQL field has two items, then causes
a scripter to think a bit more deeply about a where clause...

Hence making your example ...

>   SELECT * FROM customer WHERE sex='M';
>   SELECT * FROM customer WHERE sex<>'M';
> Now you might think this returns all customers, but it will miss those
> where sex is null. You've asked for all rows where the value of sex is 'M'
> and all those with values not equal to 'M' but not rows with *no value at
all*

all that less mysterious...
i.e. the above two selects only tested one of the two items for the SQL
field,
the other item to test is the null/notnull item.


.. Otto


> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of dev@archonet.com
> Sent: Wednesday, January 15, 2003 9:23 AM
> To: pgsql-sql@postgresql.org
> Subject: RFC: A brief guide to nulls
>
>
> There have been a few posts recently where people have had
> problems with
> nulls. Anyone got comments on the below before I submit it to
> techdocs?
>
> TIA
>
> - Richard Huxton
>
> A Brief Guide to NULLs
> ======================
>
> What is a null?
> ===============
> A null is *not* an empty string.
> A null is *not* a value.
> A null is *not* a "special" value.
> A null is the absence of a value.
>
>
> What do nulls mean?
> ===================
> Well, they *should* mean one of two things:
> 1. There is no applicable value
> 2. There is a value but it is unknown
>
> Example 1: Imagine you have a customer table with name and sex fields.
> If you get a new customer "ACME Widgets Ltd", the sex field
> is meaningless
> since your customer is a company (case 1).
> If you get a new customer "Jackie Smith" they might be male
> or female, but
> you might not know (case 2).
>
> Example 2: You have an address table with
> (street,city,county,postalcode)
> fields.
> You might insert an address ("10 Downing
> Street","London",Null,"WC1 1AA")
> since you don't have a valid county.
> You might also insert an address ("1 Any
> Street","Maidstone","Kent",Null)
> where there *must be* a valid postalcode, but you don't know
> what it is.
>
> It might be useful to be able to distinguish between these
> two cases - not
> applicable and unknown, but there is only one option "Null"
> available to
> us, so we can't.
>
>
> How do nulls work?
> ==================
> There is one very important rule when dealing with nulls. The
> result of
> any operation or comparison, when applied to a null is null. The only
> exception is testing if a value is null.
>
> Example: with the customer table above you could run the
> following queries:
>   SELECT * FROM customer WHERE sex='M';
>   SELECT * FROM customer WHERE sex<>'M';
> Now you might think this returns all customers, but it will miss those
> where sex is null. You've asked for all rows where the value
> of sex is 'M'
> and all those with values not equal to 'M' but not rows with
> *no value at
> all*
>
> It might help to think of a database as a set of statements
> you *know* to
> be true. A null indicates that you *cannot say anything at
> all* about that
> field. You can't say what it is, you can't say what it isn't,
> you can only
> say there is some information missing.
>
> So, to see all the customers with unknown or inapplicable sex
> you would need:
>   SELECT * FROM customer WHERE sex IS NULL;
>
> There are actually three possible results for a test in SQL -
> True (the
> test passed), False (the test failed) and Null (you tested against a
> null). A result of null usually gets treated as False, so
> testing against
> nulls always fails.
>
> If you try to perform an operation on nulls, again the result
> is always
> null. So the results of all of the following are null:
>   SELECT 'abc' || null;
>   SELECT 1 + null;
>   SELECT sqrt(null::numeric);
> The first case can be especially confusing. Concatenating a
> null string to
> a string value will return null, not the original value.
>
>
> Uniqueness and nulls
> ====================
> If you define a unique index on a column it prevents you inserting two
> values that are the same. It does not prevent you inserting
> as many nulls
> as you like. How could it, you don't have a value so it can't
> be the same
> as any other.
>
> Example: We create a table "ta" with a unique constraint on column "b"
>   CREATE TABLE ta (
>     a int4,
>     b varchar(3),
>     PRIMARY KEY (a)
>   );
>   CREATE UNIQUE INDEX ta_b_idx ON ta (b);
>   INSERT INTO ta VALUES (1,'aaa');  -- succeeds
>   INSERT INTO ta VALUES (2,'bbb');  -- succeeds
>   INSERT INTO ta VALUES (3,null);   -- succeeds
>   INSERT INTO ta VALUES (4,'bbb');  -- fails
>   INSERT INTO ta VALUES (5,null);   -- succeeds!
>
> Given the definition of what a null is, you only have two
> choices: allow
> multiple nulls or allow no nulls. If you want no nulls,
> define the column
> as NOT NULL when creating the table.
>
>
> Keys and nulls
> ==============
> No column that is part of a primary key can be null. When you define a
> PRIMARY KEY, none of the columns mentioned can take a null value.
> Postgresql makes sure of this by defining the columns as NOT
> NULL for you.
>
> Example: With table "ta" we just created, \d ta will show column a as
> being not null. Otherwise, we could insert rows with a set to null and
> have no way to tell them apart.
>
>
> Subqueries and nulls
> ====================
> Since tests always fail when testing against nulls you can
> have unexpected
> results with sub-queries.
>
> Example: Assume we have a companies table and a diary table.
> Diary entries
> are usually related to a particular company but not always.
>   SELECT co_name FROM companies WHERE co_id NOT IN (SELECT
> dy_company FROM
> diary);
> If any row in diary contains a null dy_company then you will get *no
> results*.
> We can expand the query like so:
>   WHERE co_id NOT IN (SELECT dy_company FROM diary)
>   WHERE co_id NOT IN (1, 2, null, 3...)
>   WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...)
>   WHERE NOT (... OR null OR ...)
>   WHERE NOT (null)
>   WHERE null
>
> You either need to explicitly check for null values, or
> define the column
> in question as NOT NULL.
>
>
> Further Information
> ===================
> Any good relational database book (try something written by Date)
> Bruce's book (link LHS at http://techdocs.postgresql.org)
> My Postgresql Notes (link at http://techdocs.postgresql.org)
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>

Re: RFC: A brief guide to nulls

From

Andrew Perrin

Date:

15 January 2003, 15:08:19

*Disclaimer:* I do not have any formal training in database
theory. However, I have done quite a lot of work with databases in
practice, and have some understanding of "missing values" in statistics.

I would tend to think of the "no applicable value" case of a NULL as being
the result of poor or lazy planning, and therefore not really true NULL
values. (This is not to say I don't *do* it, but that it's not
theoretically appropriate to interpret a NULL as being "no applicable
value.")  To use your example 1, I would think a properly-planned and
structured database should account for the possibility of a sexless
customer by means of relationality:

customers:custidname

cust_sexes:custidsex

which would get rid of the NULL for the sex of ACME Widgets Ltd.  By
contrast, it wouldn't get rid of the NULL for Jackie Smith, who would
appropriately be represented by an entry in customers (1111, 'Jackie
Smith') and one in cust_sexes (1111, NULL).

(Otherwise the introduction is excellent.)

Any comments?

Andy

----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu


On Wed, 15 Jan 2003 dev@archonet.com wrote:

> There have been a few posts recently where people have had problems with
> nulls. Anyone got comments on the below before I submit it to techdocs?
> 
> TIA
> 
> - Richard Huxton
> 
> A Brief Guide to NULLs
> ======================
> 
> What is a null?
> ===============
> A null is *not* an empty string.
> A null is *not* a value.
> A null is *not* a "special" value.
> A null is the absence of a value.
> 
> 
> What do nulls mean?
> ===================
> Well, they *should* mean one of two things:
> 1. There is no applicable value
> 2. There is a value but it is unknown
> 
> Example 1: Imagine you have a customer table with name and sex fields.
> If you get a new customer "ACME Widgets Ltd", the sex field is meaningless
> since your customer is a company (case 1).
> If you get a new customer "Jackie Smith" they might be male or female, but
> you might not know (case 2).
> 
> Example 2: You have an address table with (street,city,county,postalcode)
> fields.
> You might insert an address ("10 Downing Street","London",Null,"WC1 1AA")
> since you don't have a valid county.
> You might also insert an address ("1 Any Street","Maidstone","Kent",Null)
> where there *must be* a valid postalcode, but you don't know what it is.
> 
> It might be useful to be able to distinguish between these two cases - not
> applicable and unknown, but there is only one option "Null" available to
> us, so we can't.
> 
> 
> How do nulls work?
> ==================
> There is one very important rule when dealing with nulls. The result of
> any operation or comparison, when applied to a null is null. The only
> exception is testing if a value is null.
> 
> Example: with the customer table above you could run the following queries:
>   SELECT * FROM customer WHERE sex='M';
>   SELECT * FROM customer WHERE sex<>'M';
> Now you might think this returns all customers, but it will miss those
> where sex is null. You've asked for all rows where the value of sex is 'M'
> and all those with values not equal to 'M' but not rows with *no value at
> all*
> 
> It might help to think of a database as a set of statements you *know* to
> be true. A null indicates that you *cannot say anything at all* about that
> field. You can't say what it is, you can't say what it isn't, you can only
> say there is some information missing.
> 
> So, to see all the customers with unknown or inapplicable sex you would need:
>   SELECT * FROM customer WHERE sex IS NULL;
> 
> There are actually three possible results for a test in SQL - True (the
> test passed), False (the test failed) and Null (you tested against a
> null). A result of null usually gets treated as False, so testing against
> nulls always fails.
> 
> If you try to perform an operation on nulls, again the result is always
> null. So the results of all of the following are null:
>   SELECT 'abc' || null;
>   SELECT 1 + null;
>   SELECT sqrt(null::numeric);
> The first case can be especially confusing. Concatenating a null string to
> a string value will return null, not the original value.
> 
> 
> Uniqueness and nulls
> ====================
> If you define a unique index on a column it prevents you inserting two
> values that are the same. It does not prevent you inserting as many nulls
> as you like. How could it, you don't have a value so it can't be the same
> as any other.
> 
> Example: We create a table "ta" with a unique constraint on column "b"
>   CREATE TABLE ta (
>     a int4,
>     b varchar(3),
>     PRIMARY KEY (a)
>   );
>   CREATE UNIQUE INDEX ta_b_idx ON ta (b);
>   INSERT INTO ta VALUES (1,'aaa');  -- succeeds
>   INSERT INTO ta VALUES (2,'bbb');  -- succeeds
>   INSERT INTO ta VALUES (3,null);   -- succeeds
>   INSERT INTO ta VALUES (4,'bbb');  -- fails
>   INSERT INTO ta VALUES (5,null);   -- succeeds!
> 
> Given the definition of what a null is, you only have two choices: allow
> multiple nulls or allow no nulls. If you want no nulls, define the column
> as NOT NULL when creating the table.
> 
> 
> Keys and nulls
> ==============
> No column that is part of a primary key can be null. When you define a
> PRIMARY KEY, none of the columns mentioned can take a null value.
> Postgresql makes sure of this by defining the columns as NOT NULL for you.
> 
> Example: With table "ta" we just created, \d ta will show column a as
> being not null. Otherwise, we could insert rows with a set to null and
> have no way to tell them apart.
> 
> 
> Subqueries and nulls
> ====================
> Since tests always fail when testing against nulls you can have unexpected
> results with sub-queries.
> 
> Example: Assume we have a companies table and a diary table. Diary entries
> are usually related to a particular company but not always.
>   SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM
> diary);
> If any row in diary contains a null dy_company then you will get *no
> results*.
> We can expand the query like so:
>   WHERE co_id NOT IN (SELECT dy_company FROM diary)
>   WHERE co_id NOT IN (1, 2, null, 3...)
>   WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...)
>   WHERE NOT (... OR null OR ...)
>   WHERE NOT (null)
>   WHERE null
> 
> You either need to explicitly check for null values, or define the column
> in question as NOT NULL.
> 
> 
> Further Information
> ===================
> Any good relational database book (try something written by Date)
> Bruce's book (link LHS at http://techdocs.postgresql.org)
> My Postgresql Notes (link at http://techdocs.postgresql.org)
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
>

Re: RFC: A brief guide to nulls

From

Josh Berkus

Date:

15 January 2003, 15:37:28

Richard,

Thanks for writing this!  Would it be cool if we included it in the Linux
Weekly News?   I'll ask Robert.

Two edits:

On Wednesday 15 January 2003 09:23, dev@archonet.com wrote:
> There have been a few posts recently where people have had problems with
> nulls. Anyone got comments on the below before I submit it to techdocs?
>
> TIA
>
> - Richard Huxton
>
> A Brief Guide to NULLs
> ======================
>
> What is a null?
> ===============
> A null is *not* an empty string.
> A null is *not* a value.
> A null is *not* a "special" value.
> A null is the absence of a value.
>
>
> What do nulls mean?
> ===================
> Well, they *should* mean one of two things:
> 1. There is no applicable value
> 2. There is a value but it is unknown
>
> Example 1: Imagine you have a customer table with name and sex fields.
> If you get a new customer "ACME Widgets Ltd", the sex field is meaningless
> since your customer is a company (case 1).
> If you get a new customer "Jackie Smith" they might be male or female, but
> you might not know (case 2).
>
> Example 2: You have an address table with (street,city,county,postalcode)
> fields.
> You might insert an address ("10 Downing Street","London",Null,"WC1 1AA")
> since you don't have a valid county.
> You might also insert an address ("1 Any Street","Maidstone","Kent",Null)
> where there *must be* a valid postalcode, but you don't know what it is.
>
> It might be useful to be able to distinguish between these two cases - not
> applicable and unknown, but there is only one option "Null" available to
> us, so we can't.
>
>
> How do nulls work?
> ==================
> There is one very important rule when dealing with nulls. The result of
> any operation or comparison, when applied to a null is null. The only
> exception is testing if a value is null.
>
> Example: with the customer table above you could run the following queries:
>   SELECT * FROM customer WHERE sex='M';
>   SELECT * FROM customer WHERE sex<>'M';
> Now you might think this returns all customers, but it will miss those
> where sex is null. You've asked for all rows where the value of sex is 'M'
> and all those with values not equal to 'M' but not rows with *no value at
> all*
>
> It might help to think of a database as a set of statements you *know* to
> be true. A null indicates that you *cannot say anything at all* about that
> field. You can't say what it is, you can't say what it isn't, you can only
> say there is some information missing.
>
> So, to see all the customers with unknown or inapplicable sex you would
need:
>   SELECT * FROM customer WHERE sex IS NULL;

** MOVE the above paragraph ....

>
> There are actually three possible results for a test in SQL - True (the
> test passed), False (the test failed) and Null (you tested against a
> null). A result of null usually gets treated as False, so testing against
> nulls always fails.
>
> If you try to perform an operation on nulls, again the result is always
> null. So the results of all of the following are null:
>   SELECT 'abc' || null;
>   SELECT 1 + null;
>   SELECT sqrt(null::numeric);
> The first case can be especially confusing. Concatenating a null string to
> a string value will return null, not the original value.

** ... down to here, and define the IS NULL and IS NOT NULL operators for the
reader.

>
> Uniqueness and nulls
> ====================
> If you define a unique index on a column it prevents you inserting two
> values that are the same. It does not prevent you inserting as many nulls
> as you like. How could it, you don't have a value so it can't be the same
> as any other.
>
> Example: We create a table "ta" with a unique constraint on column "b"
>   CREATE TABLE ta (
>     a int4,
>     b varchar(3),
>     PRIMARY KEY (a)
>   );
>   CREATE UNIQUE INDEX ta_b_idx ON ta (b);
>   INSERT INTO ta VALUES (1,'aaa');  -- succeeds
>   INSERT INTO ta VALUES (2,'bbb');  -- succeeds
>   INSERT INTO ta VALUES (3,null);   -- succeeds
>   INSERT INTO ta VALUES (4,'bbb');  -- fails
>   INSERT INTO ta VALUES (5,null);   -- succeeds!
>
> Given the definition of what a null is, you only have two choices: allow
> multiple nulls or allow no nulls. If you want no nulls, define the column
> as NOT NULL when creating the table.
>
>
> Keys and nulls
> ==============
> No column that is part of a primary key can be null. When you define a
> PRIMARY KEY, none of the columns mentioned can take a null value.
> Postgresql makes sure of this by defining the columns as NOT NULL for you.
>
> Example: With table "ta" we just created, \d ta will show column a as
> being not null. Otherwise, we could insert rows with a set to null and
> have no way to tell them apart.
>
>
> Subqueries and nulls
> ====================
> Since tests always fail when testing against nulls you can have unexpected
> results with sub-queries.
>
> Example: Assume we have a companies table and a diary table. Diary entries
> are usually related to a particular company but not always.
>   SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM
> diary);
> If any row in diary contains a null dy_company then you will get *no
> results*.
> We can expand the query like so:
>   WHERE co_id NOT IN (SELECT dy_company FROM diary)
>   WHERE co_id NOT IN (1, 2, null, 3...)
>   WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...)
>   WHERE NOT (... OR null OR ...)
>   WHERE NOT (null)
>   WHERE null

I'd suggest that you do not want to give readers an example that contains NOT
IN (subselect).   There are almost no cricumstances under which a user would
choose NOT IN (subselect) over WHERE NOT EXISTS (subselect), and giving it in
an example will just confuse newbies.

> You either need to explicitly check for null values, or define the column
> in question as NOT NULL.
>
>
> Further Information
> ===================
> Any good relational database book (try something written by Date)
> Bruce's book (link LHS at http://techdocs.postgresql.org)
> My Postgresql Notes (link at http://techdocs.postgresql.org)

Add links to the two PostgreSQL book pages ... the one on Techdocs and the one
on www.postgresql.org/users-lounge

FWIW.

--
-Josh BerkusAglio Database SolutionsSan Francisco

Re: RFC: A brief guide to nulls

From

Antti Haapala

Date:

15 January 2003, 15:48:21

This should be added (From PostgreSQL integrated docs...):

SQL uses a three-valued Boolean logic where the null value represents
"unknown". Observe the following truth tables:

a     | b     | a AND b | a OR b
------+-------+---------+--------
TRUE  | TRUE  | TRUE    | TRUE
TRUE  | FALSE | FALSE   | TRUE
TRUE  | NULL  | NULL    | TRUE
FALSE | FALSE | FALSE   | FALSE
FALSE | NULL  | FALSE   | NULL
NULL  | NULL  | NULL    | NULL



So, if there's any trues in the chain of ORs, the whole expression will be
true, not null. This conforms to NULL representing unknown value. If you
have "true or unknown", of course whole result is true regardless of the
"unknown". Let's check this example:

> Subqueries and nulls
> ====================
> Since tests always fail when testing against nulls you can have unexpected
> results with sub-queries.
>
> Example: Assume we have a companies table and a diary table. Diary entries
> are usually related to a particular company but not always.
>   SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM
> diary);
> If any row in diary contains a null dy_company then you will get *no
> results*.
> We can expand the query like so:
>   WHERE co_id NOT IN (SELECT dy_company FROM diary)
>   WHERE co_id NOT IN (1, 2, null, 3...)
>   WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...)
>   WHERE NOT (... OR null OR ...)

>   WHERE NOT (null)
(erm... actually not)

template1=# select 5 in (1, 2, 9, null);?column?
----------

(1 row)

template1=# select 2 in (1, 2, 9, null);?column?
----------t
(1 row)

WHERE NOT (null/true) -> evaluates to
WHERE null/false

So the result was the same - but only for this example.  Suppose you take
the NOT away:

template1=# select * from a where a.i in (1, 2, 9, null);i
---12
(2 rows)

I surely see two lines (not 0)... :)

And this could be added too, for clarification: "SELECT clause lists lines
for which the WHERE expression is certainly known to be true." ;)

-- 
Antti Haapala

Re: RFC: A brief guide to nulls

From

Bruno Wolff III

Date:

15 January 2003, 16:22:09

On Wed, Jan 15, 2003 at 17:23:09 -0000, dev@archonet.com wrote:
> 
> How do nulls work?
> ==================
> There is one very important rule when dealing with nulls. The result of
> any operation or comparison, when applied to a null is null. The only
> exception is testing if a value is null.

I suggest rephrasing this to make it clear you need to use is null to
do this and that = null won't do what you expect.

> If you try to perform an operation on nulls, again the result is always
> null. So the results of all of the following are null:

The above isn't true (i.e. null op something isn't ALWAYS null). For example:

bruno=> select 't'::boolean or null;?column?
----------t
(1 row)

Re: RFC: A brief guide to nulls

From

Antti Haapala

Date:

15 January 2003, 18:52:51

A few more comments...

> It might be useful to be able to distinguish between these two cases - not
> applicable and unknown, but there is only one option "Null" available to
> us, so we can't.

If we really need to distinguish between these two cases, I think null
shouldn't be used as a N/A value but some other like empty string or 0.
(IMHO it's preferable not to use null as N/A at all).

For example sex could be classified as'n'  - not applicable'f'  - female'm'  - malenull - yet unknown

> Example: with the customer table above you could run the following queries:
>   SELECT * FROM customer WHERE sex='M';
>   SELECT * FROM customer WHERE sex<>'M';
> Now you might think this returns all customers, but it will miss those
> where sex is null. You've asked for all rows where the value of sex is 'M'
> and all those with values not equal to 'M' but not rows with *no value at
> all*

these could be explained asselect all customers who surely are menselect all customers who surely aren't men

if customers sex is unknown - null, we can't decide whether they're men or
not.

> The first case can be especially confusing. Concatenating a null string to
> a string value will return null, not the original value.

Isn't it null, not null string? ;)

> Keys and nulls
> ==============
> No column that is part of a primary key can be null. When you define a
> PRIMARY KEY, none of the columns mentioned can take a null value.
> Postgresql makes sure of this by defining the columns as NOT NULL for you.

... because primary keys are to uniquelly identify rows in a table, and
how's an unknown values going to do that :)

-- 
Antti Haapala
+358 50 369 3535
ICQ: #177673735

Re: RFC: A brief guide to nulls

From

Peter Eisentraut

Date:

15 January 2003, 21:02:24

dev@archonet.com writes:

> A null is *not* a value.
> A null is *not* a "special" value.
> A null is the absence of a value.

A quotation directly from the SQL standard:
   Every data type includes a special value, called the null value,

This seems to directly contradict those three statements.

-- 
Peter Eisentraut   peter_e@gmx.net

Re: RFC: A brief guide to nulls

From

Ludwig Lim

Date:

15 January 2003, 21:22:30

--- dev@archonet.com wrote:
> There have been a few posts recently where people
> have had problems with
> nulls. Anyone got comments on the below before I
> submit it to techdocs?
> 
> TIA
> 
> - Richard Huxton
> 
> A Brief Guide to NULLs
> ======================
> 
> What is a null?
> ===============
> A null is *not* an empty string.
> A null is *not* a value.
> A null is *not* a "special" value.
> A null is the absence of a value.
> 
> 
> What do nulls mean?
> ===================
> Well, they *should* mean one of two things:
> 1. There is no applicable value
> 2. There is a value but it is unknown
 Good job!, it sure helps people who don't much
background on formal database theory.
 What about adding this to the section "What does
nulls mean"--> 3) No value has yet been assigned to that
particular attribute (field).
  I think it would also be nice if you can add a
section on functions that deals with NULL such as
nullif() and coalesce(). These functions help users
interpret NULL values.


best regards,

ludwig



__________________________________________________
Do you Yahoo!?
New DSL Internet Access from SBC & Yahoo!
http://sbc.yahoo.com

Re: RFC: A brief guide to nulls

From

Tom Lane

Date:

15 January 2003, 23:20:33

Peter Eisentraut <peter_e@gmx.net> writes:
> dev@archonet.com writes:
>> A null is *not* a value.
>> A null is *not* a "special" value.
>> A null is the absence of a value.

> A quotation directly from the SQL standard:
>     Every data type includes a special value, called the null value,
> This seems to directly contradict those three statements.

I think you can look at it either way.  The traditional mathematical
approach to this sort of thing has been to consider that every data
type includes an "undefined" value (sometimes called "bottom", often
written as an upside-down T).  But the specific semantics assigned to
this concept in SQL definitely correspond to the idea that there's
a missing data entry.  And those who like to think about the bits prefer
to imagine a separate "its-null" flag bit, as someone else noted in this
thread.

The real bottom line is that the language provides you with a concept
"NULL" that has very specific (and less than intuitive) semantics.
To make use of this concept in your application, you have to interpret
it in a way that is useful for your application --- and doesn't conflict
with the SQL-defined semantics.
        regards, tom lane

Re: RFC: A brief guide to nulls

From

dev@archonet.com

Date:

16 January 2003, 04:07:51

>
> --- dev@archonet.com wrote:
>> There have been a few posts recently where people
>> have had problems with
>> nulls. Anyone got comments on the below before I
>> submit it to techdocs?
>>
>> TIA
>>
>> - Richard Huxton
>>
>> A Brief Guide to NULLs
>> ======================
>>
>> What is a null?
>> ===============
>> A null is *not* an empty string.
>> A null is *not* a value.
>> A null is *not* a "special" value.
>> A null is the absence of a value.
>>
>>
>> What do nulls mean?
>> ===================
>> Well, they *should* mean one of two things:
>> 1. There is no applicable value
>> 2. There is a value but it is unknown
>
>   Good job!, it sure helps people who don't much
> background on formal database theory.
>
>   What about adding this to the section "What does
> nulls mean"
>  --> 3) No value has yet been assigned to that
> particular attribute (field).
>
>
>   I think it would also be nice if you can add a
> section on functions that deals with NULL such as
> nullif() and coalesce(). These functions help users
> interpret NULL values.
>
>
> best regards,
>
> ludwig
>
>
>
> __________________________________________________
> Do you Yahoo!?
> New DSL Internet Access from SBC & Yahoo!
> http://sbc.yahoo.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>

Re: RFC: A brief guide to nulls

From

dev@archonet.com

Date:

16 January 2003, 04:11:24

Apologies for previous post with no content - hit send by mistake.

Thanks to everyone for the feedback, 2nd draft out later today/tomorrow.
- Richard Huxton

Re: RFC: A brief guide to nulls (noarchive)

From

"Ries van Twisk"

Date:

16 January 2003, 04:59:40

You might add this as an example:

mytable
------------
id    value
1    1
2    2
3    3
4    <NULL>
5    4
6    5

-- Count ALL records
SELECT count(*) FROM mytable;
Result: 6

-- Count id records
SELECT count(id) FROM mytable;
Result: 6

-- Count value records
SELECT count(value) FROM mytable;
Result: 5

-- Sum of all values
SELECT sum(value) FROM mytable;
Result: 15

-- Average of the values
SELECT sum(value)/count(value) FROM mytable;
Result: 3

-- !!!WRONG!!! Method of the average
SELECT sum(value)/count(*) FROM mytable;
Result: 2.5


What I try to do and what I advice to the novice database designer is try to
avoid NULLS. Why??? because it's differcult to think in tree based logic.
As for a example in the case of the customer sex.
You can use 'M' for male, 'F' for female, 'U' for unknown and 'N' for 'not
applicapable'.
In this way you can use the NOT NULL contsraint so the novice programmer can
work with actual values instead of 'forgetting' about the NULLS. This idea
can be helpfull for the novice database designer.

Just a thought...


Ries van Twisk





> -----Oorspronkelijk bericht-----
> Van: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]Namens dev@archonet.com
> Verzonden: woensdag 15 januari 2003 18:23
> Aan: pgsql-sql@postgresql.org
> Onderwerp: [SQL] RFC: A brief guide to nulls
>
>
> There have been a few posts recently where people have had
> problems with
> nulls. Anyone got comments on the below before I submit it to
> techdocs?
>
> TIA
>
> - Richard Huxton
>
> A Brief Guide to NULLs
> ======================
>
> What is a null?
> ===============
> A null is *not* an empty string.
> A null is *not* a value.
> A null is *not* a "special" value.
> A null is the absence of a value.
>
>
> What do nulls mean?
> ===================
> Well, they *should* mean one of two things:
> 1. There is no applicable value
> 2. There is a value but it is unknown
>
> Example 1: Imagine you have a customer table with name and sex fields.
> If you get a new customer "ACME Widgets Ltd", the sex field
> is meaningless
> since your customer is a company (case 1).
> If you get a new customer "Jackie Smith" they might be male
> or female, but
> you might not know (case 2).
>
> Example 2: You have an address table with
> (street,city,county,postalcode)
> fields.
> You might insert an address ("10 Downing
> Street","London",Null,"WC1 1AA")
> since you don't have a valid county.
> You might also insert an address ("1 Any
> Street","Maidstone","Kent",Null)
> where there *must be* a valid postalcode, but you don't know
> what it is.
>
> It might be useful to be able to distinguish between these
> two cases - not
> applicable and unknown, but there is only one option "Null"
> available to
> us, so we can't.
>
>
> How do nulls work?
> ==================
> There is one very important rule when dealing with nulls. The
> result of
> any operation or comparison, when applied to a null is null. The only
> exception is testing if a value is null.
>
> Example: with the customer table above you could run the
> following queries:
>   SELECT * FROM customer WHERE sex='M';
>   SELECT * FROM customer WHERE sex<>'M';
> Now you might think this returns all customers, but it will miss those
> where sex is null. You've asked for all rows where the value
> of sex is 'M'
> and all those with values not equal to 'M' but not rows with
> *no value at
> all*
>
> It might help to think of a database as a set of statements
> you *know* to
> be true. A null indicates that you *cannot say anything at
> all* about that
> field. You can't say what it is, you can't say what it isn't,
> you can only
> say there is some information missing.
>
> So, to see all the customers with unknown or inapplicable sex
> you would need:
>   SELECT * FROM customer WHERE sex IS NULL;
>
> There are actually three possible results for a test in SQL -
> True (the
> test passed), False (the test failed) and Null (you tested against a
> null). A result of null usually gets treated as False, so
> testing against
> nulls always fails.
>
> If you try to perform an operation on nulls, again the result
> is always
> null. So the results of all of the following are null:
>   SELECT 'abc' || null;
>   SELECT 1 + null;
>   SELECT sqrt(null::numeric);
> The first case can be especially confusing. Concatenating a
> null string to
> a string value will return null, not the original value.
>
>
> Uniqueness and nulls
> ====================
> If you define a unique index on a column it prevents you inserting two
> values that are the same. It does not prevent you inserting
> as many nulls
> as you like. How could it, you don't have a value so it can't
> be the same
> as any other.
>
> Example: We create a table "ta" with a unique constraint on column "b"
>   CREATE TABLE ta (
>     a int4,
>     b varchar(3),
>     PRIMARY KEY (a)
>   );
>   CREATE UNIQUE INDEX ta_b_idx ON ta (b);
>   INSERT INTO ta VALUES (1,'aaa');  -- succeeds
>   INSERT INTO ta VALUES (2,'bbb');  -- succeeds
>   INSERT INTO ta VALUES (3,null);   -- succeeds
>   INSERT INTO ta VALUES (4,'bbb');  -- fails
>   INSERT INTO ta VALUES (5,null);   -- succeeds!
>
> Given the definition of what a null is, you only have two
> choices: allow
> multiple nulls or allow no nulls. If you want no nulls,
> define the column
> as NOT NULL when creating the table.
>
>
> Keys and nulls
> ==============
> No column that is part of a primary key can be null. When you define a
> PRIMARY KEY, none of the columns mentioned can take a null value.
> Postgresql makes sure of this by defining the columns as NOT
> NULL for you.
>
> Example: With table "ta" we just created, \d ta will show column a as
> being not null. Otherwise, we could insert rows with a set to null and
> have no way to tell them apart.
>
>
> Subqueries and nulls
> ====================
> Since tests always fail when testing against nulls you can
> have unexpected
> results with sub-queries.
>
> Example: Assume we have a companies table and a diary table.
> Diary entries
> are usually related to a particular company but not always.
>   SELECT co_name FROM companies WHERE co_id NOT IN (SELECT
> dy_company FROM
> diary);
> If any row in diary contains a null dy_company then you will get *no
> results*.
> We can expand the query like so:
>   WHERE co_id NOT IN (SELECT dy_company FROM diary)
>   WHERE co_id NOT IN (1, 2, null, 3...)
>   WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...)
>   WHERE NOT (... OR null OR ...)
>   WHERE NOT (null)
>   WHERE null
>
> You either need to explicitly check for null values, or
> define the column
> in question as NOT NULL.
>
>
> Further Information
> ===================
> Any good relational database book (try something written by Date)
> Bruce's book (link LHS at http://techdocs.postgresql.org)
> My Postgresql Notes (link at http://techdocs.postgresql.org)
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

Re: RFC: A brief guide to nulls

From

"Otto Hirr"

Date:

16 January 2003, 12:35:27

I think that having this topic defined and available
will be of great benefit... !!! Thanks Richard.

Some additional thoughts based upon what other people
have explicitly or implicitly implied.  Peter quoted the sql
definition as:   >Every data type includes a special value, called the null value,
and my earlier post implied that, but...

I believe that one needs to be VERY explicit in the
distinction between sementic meaning of some field
and the values accessable.  Using the example proposed
in earlier posts, the storing of sex field for a customer.

The example query was:
>  SELECT * FROM customer WHERE sex='M';
>  SELECT * FROM customer WHERE sex<>'M';

One person suggested:
>For example sex could be classified as
> 'n'  - not applicable
> 'f'  - female
> 'm'  - male
> null - yet unknown
>...
>if customers sex is unknown - null, we can't decide
>whether they're men or not.

Which is straying way into the semantic information of
the field. In reality we have:

+Fieldname(sex)------------------+
|..............|value|{somevalue}|
|..............|nullP|{1or0}     |
+--------------------------------+

Every field has a value portion and a nullP portion.
Here I use nullP, which is either 1 or 0, indicating
that the field is NULL or NOTNULL.

NOTNULL means that the value field has a value.
NULL means that the value field does not have a value.

Period. Anything else begins to stray into the semantic
range. Using the above...

> 'n'  - not applicable
> 'f'  - female
> 'm'  - male
> null - yet unknown

which we could code as..
.....................................meaning
+Fieldname(sex)------------------+
|..............|value|{n}        |   not applicable
|..............|nullP|{0}        |
+--------------------------------+
+Fieldname(sex)------------------+
|..............|value|{f}        |   female
|..............|nullP|{0}        |
+--------------------------------+
+Fieldname(sex)------------------+
|..............|value|{m}        |   male
|..............|nullP|{0}        |
+--------------------------------+
+Fieldname(sex)------------------+
|..............|value|{}         |   
|..............|nullP|{1}        |   <no value>
+--------------------------------+

If I was a clerk, looking at a person and could not
tell their sex due to appearance, etc, that should
probably be coded: 'u'-unknown

The "concept" of null meaning unknown is SEMANTIC...
in the case of sex.

The nullP, ie testing via NULL/NOTNULL, means that
one can either test or (should/can) access the value field.
No other meaning should be implied. The application
on up the ladder implies the meaning behind not having
a value.

The point I'm trying to make here is that one should not
intermix the application meaning of having/not-having
a value with the value itself.

Especially since you can have a NOTNULL field that
has a text value having no bytes in it, i.e. a string
field that has no characters...

Humm... don't know if I made my point clear as mud...


Best regards,

.. Otto