Re: RFC: A brief guide to nulls (noarchive) - Mailing list pgsql-sql
From | Ries van Twisk |
---|---|
Subject | Re: RFC: A brief guide to nulls (noarchive) |
Date | |
Msg-id | 001d01c2bd46$4652d830$f100000a@IT001 Whole thread Raw |
In response to | RFC: A brief guide to nulls (dev@archonet.com) |
List | pgsql-sql |
You might add this as an example: mytable ------------ id value 1 1 2 2 3 3 4 <NULL> 5 4 6 5 -- Count ALL records SELECT count(*) FROM mytable; Result: 6 -- Count id records SELECT count(id) FROM mytable; Result: 6 -- Count value records SELECT count(value) FROM mytable; Result: 5 -- Sum of all values SELECT sum(value) FROM mytable; Result: 15 -- Average of the values SELECT sum(value)/count(value) FROM mytable; Result: 3 -- !!!WRONG!!! Method of the average SELECT sum(value)/count(*) FROM mytable; Result: 2.5 What I try to do and what I advice to the novice database designer is try to avoid NULLS. Why??? because it's differcult to think in tree based logic. As for a example in the case of the customer sex. You can use 'M' for male, 'F' for female, 'U' for unknown and 'N' for 'not applicapable'. In this way you can use the NOT NULL contsraint so the novice programmer can work with actual values instead of 'forgetting' about the NULLS. This idea can be helpfull for the novice database designer. Just a thought... Ries van Twisk > -----Oorspronkelijk bericht----- > Van: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]Namens dev@archonet.com > Verzonden: woensdag 15 januari 2003 18:23 > Aan: pgsql-sql@postgresql.org > Onderwerp: [SQL] RFC: A brief guide to nulls > > > There have been a few posts recently where people have had > problems with > nulls. Anyone got comments on the below before I submit it to > techdocs? > > TIA > > - Richard Huxton > > A Brief Guide to NULLs > ====================== > > What is a null? > =============== > A null is *not* an empty string. > A null is *not* a value. > A null is *not* a "special" value. > A null is the absence of a value. > > > What do nulls mean? > =================== > Well, they *should* mean one of two things: > 1. There is no applicable value > 2. There is a value but it is unknown > > Example 1: Imagine you have a customer table with name and sex fields. > If you get a new customer "ACME Widgets Ltd", the sex field > is meaningless > since your customer is a company (case 1). > If you get a new customer "Jackie Smith" they might be male > or female, but > you might not know (case 2). > > Example 2: You have an address table with > (street,city,county,postalcode) > fields. > You might insert an address ("10 Downing > Street","London",Null,"WC1 1AA") > since you don't have a valid county. > You might also insert an address ("1 Any > Street","Maidstone","Kent",Null) > where there *must be* a valid postalcode, but you don't know > what it is. > > It might be useful to be able to distinguish between these > two cases - not > applicable and unknown, but there is only one option "Null" > available to > us, so we can't. > > > How do nulls work? > ================== > There is one very important rule when dealing with nulls. The > result of > any operation or comparison, when applied to a null is null. The only > exception is testing if a value is null. > > Example: with the customer table above you could run the > following queries: > SELECT * FROM customer WHERE sex='M'; > SELECT * FROM customer WHERE sex<>'M'; > Now you might think this returns all customers, but it will miss those > where sex is null. You've asked for all rows where the value > of sex is 'M' > and all those with values not equal to 'M' but not rows with > *no value at > all* > > It might help to think of a database as a set of statements > you *know* to > be true. A null indicates that you *cannot say anything at > all* about that > field. You can't say what it is, you can't say what it isn't, > you can only > say there is some information missing. > > So, to see all the customers with unknown or inapplicable sex > you would need: > SELECT * FROM customer WHERE sex IS NULL; > > There are actually three possible results for a test in SQL - > True (the > test passed), False (the test failed) and Null (you tested against a > null). A result of null usually gets treated as False, so > testing against > nulls always fails. > > If you try to perform an operation on nulls, again the result > is always > null. So the results of all of the following are null: > SELECT 'abc' || null; > SELECT 1 + null; > SELECT sqrt(null::numeric); > The first case can be especially confusing. Concatenating a > null string to > a string value will return null, not the original value. > > > Uniqueness and nulls > ==================== > If you define a unique index on a column it prevents you inserting two > values that are the same. It does not prevent you inserting > as many nulls > as you like. How could it, you don't have a value so it can't > be the same > as any other. > > Example: We create a table "ta" with a unique constraint on column "b" > CREATE TABLE ta ( > a int4, > b varchar(3), > PRIMARY KEY (a) > ); > CREATE UNIQUE INDEX ta_b_idx ON ta (b); > INSERT INTO ta VALUES (1,'aaa'); -- succeeds > INSERT INTO ta VALUES (2,'bbb'); -- succeeds > INSERT INTO ta VALUES (3,null); -- succeeds > INSERT INTO ta VALUES (4,'bbb'); -- fails > INSERT INTO ta VALUES (5,null); -- succeeds! > > Given the definition of what a null is, you only have two > choices: allow > multiple nulls or allow no nulls. If you want no nulls, > define the column > as NOT NULL when creating the table. > > > Keys and nulls > ============== > No column that is part of a primary key can be null. When you define a > PRIMARY KEY, none of the columns mentioned can take a null value. > Postgresql makes sure of this by defining the columns as NOT > NULL for you. > > Example: With table "ta" we just created, \d ta will show column a as > being not null. Otherwise, we could insert rows with a set to null and > have no way to tell them apart. > > > Subqueries and nulls > ==================== > Since tests always fail when testing against nulls you can > have unexpected > results with sub-queries. > > Example: Assume we have a companies table and a diary table. > Diary entries > are usually related to a particular company but not always. > SELECT co_name FROM companies WHERE co_id NOT IN (SELECT > dy_company FROM > diary); > If any row in diary contains a null dy_company then you will get *no > results*. > We can expand the query like so: > WHERE co_id NOT IN (SELECT dy_company FROM diary) > WHERE co_id NOT IN (1, 2, null, 3...) > WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...) > WHERE NOT (... OR null OR ...) > WHERE NOT (null) > WHERE null > > You either need to explicitly check for null values, or > define the column > in question as NOT NULL. > > > Further Information > =================== > Any good relational database book (try something written by Date) > Bruce's book (link LHS at http://techdocs.postgresql.org) > My Postgresql Notes (link at http://techdocs.postgresql.org) > > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >