Thread: RFC: A brief guide to nulls
There have been a few posts recently where people have had problems with nulls. Anyone got comments on the below before I submit it to techdocs? TIA - Richard Huxton A Brief Guide to NULLs ====================== What is a null? =============== A null is *not* an empty string. A null is *not* a value. A null is *not* a "special" value. A null is the absence of a value. What do nulls mean? =================== Well, they *should* mean one of two things: 1. There is no applicable value 2. There is a value but it is unknown Example 1: Imagine you have a customer table with name and sex fields. If you get a new customer "ACME Widgets Ltd", the sex field is meaningless since your customer is a company (case 1). If you get a new customer "Jackie Smith" they might be male or female, but you might not know (case 2). Example 2: You have an address table with (street,city,county,postalcode) fields. You might insert an address ("10 Downing Street","London",Null,"WC1 1AA") since you don't have a valid county. You might also insert an address ("1 Any Street","Maidstone","Kent",Null) where there *must be* a valid postalcode, but you don't know what it is. It might be useful to be able to distinguish between these two cases - not applicable and unknown, but there is only one option "Null" available to us, so we can't. How do nulls work? ================== There is one very important rule when dealing with nulls. The result of any operation or comparison, when applied to a null is null. The only exception is testing if a value is null. Example: with the customer table above you could run the following queries: SELECT * FROM customer WHERE sex='M'; SELECT* FROM customer WHERE sex<>'M'; Now you might think this returns all customers, but it will miss those where sex is null. You've asked for all rows where the value of sex is 'M' and all those with values not equal to 'M' but not rows with *no value at all* It might help to think of a database as a set of statements you *know* to be true. A null indicates that you *cannot say anything at all* about that field. You can't say what it is, you can't say what it isn't, you can only say there is some information missing. So, to see all the customers with unknown or inapplicable sex you would need: SELECT * FROM customer WHERE sex IS NULL; There are actually three possible results for a test in SQL - True (the test passed), False (the test failed) and Null (you tested against a null). A result of null usually gets treated as False, so testing against nulls always fails. If you try to perform an operation on nulls, again the result is always null. So the results of all of the following are null: SELECT 'abc' || null; SELECT 1 + null; SELECT sqrt(null::numeric); The first case can be especially confusing. Concatenating a null string to a string value will return null, not the original value. Uniqueness and nulls ==================== If you define a unique index on a column it prevents you inserting two values that are the same. It does not prevent you inserting as many nulls as you like. How could it, you don't have a value so it can't be the same as any other. Example: We create a table "ta" with a unique constraint on column "b" CREATE TABLE ta ( a int4,b varchar(3),PRIMARY KEY(a) ); CREATE UNIQUE INDEX ta_b_idx ON ta (b); INSERT INTO ta VALUES (1,'aaa'); -- succeeds INSERT INTO ta VALUES (2,'bbb'); -- succeeds INSERT INTO ta VALUES (3,null); -- succeeds INSERT INTO ta VALUES (4,'bbb'); -- fails INSERT INTOta VALUES (5,null); -- succeeds! Given the definition of what a null is, you only have two choices: allow multiple nulls or allow no nulls. If you want no nulls, define the column as NOT NULL when creating the table. Keys and nulls ============== No column that is part of a primary key can be null. When you define a PRIMARY KEY, none of the columns mentioned can take a null value. Postgresql makes sure of this by defining the columns as NOT NULL for you. Example: With table "ta" we just created, \d ta will show column a as being not null. Otherwise, we could insert rows with a set to null and have no way to tell them apart. Subqueries and nulls ==================== Since tests always fail when testing against nulls you can have unexpected results with sub-queries. Example: Assume we have a companies table and a diary table. Diary entries are usually related to a particular company but not always. SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_companyFROM diary); If any row in diary contains a null dy_company then you will get *no results*. We can expand the query like so: WHERE co_id NOT IN (SELECT dy_company FROM diary) WHERE co_id NOT IN (1, 2, null, 3...)WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...) WHERE NOT (... OR null OR ...) WHERE NOT (null) WHERE null You either need to explicitly check for null values, or define the column in question as NOT NULL. Further Information =================== Any good relational database book (try something written by Date) Bruce's book (link LHS at http://techdocs.postgresql.org) My Postgresql Notes (link at http://techdocs.postgresql.org)
Another viewpoint, is to go a bit further into implementation... i.e. for every field, there are two items... *one item being the null/notnull info *other item being the value info of the field per the type at def time This faciliatates the concepts of where <fieldname> is null ... testing the null/notnull info where <fieldname> is not null ... testing the null/notnull info where <fieldname> = "somevalue" ... testing the value info etc. Trying to "hide" the concept that every field has TWO items, the value and whether it is null/notnull, seems to confuse people. By SIMPLY stating that every SQL field has two items, then causes a scripter to think a bit more deeply about a where clause... Hence making your example ... > SELECT * FROM customer WHERE sex='M'; > SELECT * FROM customer WHERE sex<>'M'; > Now you might think this returns all customers, but it will miss those > where sex is null. You've asked for all rows where the value of sex is 'M' > and all those with values not equal to 'M' but not rows with *no value at all* all that less mysterious... i.e. the above two selects only tested one of the two items for the SQL field, the other item to test is the null/notnull item. .. Otto > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of dev@archonet.com > Sent: Wednesday, January 15, 2003 9:23 AM > To: pgsql-sql@postgresql.org > Subject: RFC: A brief guide to nulls > > > There have been a few posts recently where people have had > problems with > nulls. Anyone got comments on the below before I submit it to > techdocs? > > TIA > > - Richard Huxton > > A Brief Guide to NULLs > ====================== > > What is a null? > =============== > A null is *not* an empty string. > A null is *not* a value. > A null is *not* a "special" value. > A null is the absence of a value. > > > What do nulls mean? > =================== > Well, they *should* mean one of two things: > 1. There is no applicable value > 2. There is a value but it is unknown > > Example 1: Imagine you have a customer table with name and sex fields. > If you get a new customer "ACME Widgets Ltd", the sex field > is meaningless > since your customer is a company (case 1). > If you get a new customer "Jackie Smith" they might be male > or female, but > you might not know (case 2). > > Example 2: You have an address table with > (street,city,county,postalcode) > fields. > You might insert an address ("10 Downing > Street","London",Null,"WC1 1AA") > since you don't have a valid county. > You might also insert an address ("1 Any > Street","Maidstone","Kent",Null) > where there *must be* a valid postalcode, but you don't know > what it is. > > It might be useful to be able to distinguish between these > two cases - not > applicable and unknown, but there is only one option "Null" > available to > us, so we can't. > > > How do nulls work? > ================== > There is one very important rule when dealing with nulls. The > result of > any operation or comparison, when applied to a null is null. The only > exception is testing if a value is null. > > Example: with the customer table above you could run the > following queries: > SELECT * FROM customer WHERE sex='M'; > SELECT * FROM customer WHERE sex<>'M'; > Now you might think this returns all customers, but it will miss those > where sex is null. You've asked for all rows where the value > of sex is 'M' > and all those with values not equal to 'M' but not rows with > *no value at > all* > > It might help to think of a database as a set of statements > you *know* to > be true. A null indicates that you *cannot say anything at > all* about that > field. You can't say what it is, you can't say what it isn't, > you can only > say there is some information missing. > > So, to see all the customers with unknown or inapplicable sex > you would need: > SELECT * FROM customer WHERE sex IS NULL; > > There are actually three possible results for a test in SQL - > True (the > test passed), False (the test failed) and Null (you tested against a > null). A result of null usually gets treated as False, so > testing against > nulls always fails. > > If you try to perform an operation on nulls, again the result > is always > null. So the results of all of the following are null: > SELECT 'abc' || null; > SELECT 1 + null; > SELECT sqrt(null::numeric); > The first case can be especially confusing. Concatenating a > null string to > a string value will return null, not the original value. > > > Uniqueness and nulls > ==================== > If you define a unique index on a column it prevents you inserting two > values that are the same. It does not prevent you inserting > as many nulls > as you like. How could it, you don't have a value so it can't > be the same > as any other. > > Example: We create a table "ta" with a unique constraint on column "b" > CREATE TABLE ta ( > a int4, > b varchar(3), > PRIMARY KEY (a) > ); > CREATE UNIQUE INDEX ta_b_idx ON ta (b); > INSERT INTO ta VALUES (1,'aaa'); -- succeeds > INSERT INTO ta VALUES (2,'bbb'); -- succeeds > INSERT INTO ta VALUES (3,null); -- succeeds > INSERT INTO ta VALUES (4,'bbb'); -- fails > INSERT INTO ta VALUES (5,null); -- succeeds! > > Given the definition of what a null is, you only have two > choices: allow > multiple nulls or allow no nulls. If you want no nulls, > define the column > as NOT NULL when creating the table. > > > Keys and nulls > ============== > No column that is part of a primary key can be null. When you define a > PRIMARY KEY, none of the columns mentioned can take a null value. > Postgresql makes sure of this by defining the columns as NOT > NULL for you. > > Example: With table "ta" we just created, \d ta will show column a as > being not null. Otherwise, we could insert rows with a set to null and > have no way to tell them apart. > > > Subqueries and nulls > ==================== > Since tests always fail when testing against nulls you can > have unexpected > results with sub-queries. > > Example: Assume we have a companies table and a diary table. > Diary entries > are usually related to a particular company but not always. > SELECT co_name FROM companies WHERE co_id NOT IN (SELECT > dy_company FROM > diary); > If any row in diary contains a null dy_company then you will get *no > results*. > We can expand the query like so: > WHERE co_id NOT IN (SELECT dy_company FROM diary) > WHERE co_id NOT IN (1, 2, null, 3...) > WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...) > WHERE NOT (... OR null OR ...) > WHERE NOT (null) > WHERE null > > You either need to explicitly check for null values, or > define the column > in question as NOT NULL. > > > Further Information > =================== > Any good relational database book (try something written by Date) > Bruce's book (link LHS at http://techdocs.postgresql.org) > My Postgresql Notes (link at http://techdocs.postgresql.org) > > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > >
*Disclaimer:* I do not have any formal training in database theory. However, I have done quite a lot of work with databases in practice, and have some understanding of "missing values" in statistics. I would tend to think of the "no applicable value" case of a NULL as being the result of poor or lazy planning, and therefore not really true NULL values. (This is not to say I don't *do* it, but that it's not theoretically appropriate to interpret a NULL as being "no applicable value.") To use your example 1, I would think a properly-planned and structured database should account for the possibility of a sexless customer by means of relationality: customers:custidname cust_sexes:custidsex which would get rid of the NULL for the sex of ACME Widgets Ltd. By contrast, it wouldn't get rid of the NULL for Jackie Smith, who would appropriately be represented by an entry in customers (1111, 'Jackie Smith') and one in cust_sexes (1111, NULL). (Otherwise the introduction is excellent.) Any comments? Andy ---------------------------------------------------------------------- Andrew J Perrin - http://www.unc.edu/~aperrin Assistant Professor of Sociology, U of North Carolina, Chapel Hill clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu On Wed, 15 Jan 2003 dev@archonet.com wrote: > There have been a few posts recently where people have had problems with > nulls. Anyone got comments on the below before I submit it to techdocs? > > TIA > > - Richard Huxton > > A Brief Guide to NULLs > ====================== > > What is a null? > =============== > A null is *not* an empty string. > A null is *not* a value. > A null is *not* a "special" value. > A null is the absence of a value. > > > What do nulls mean? > =================== > Well, they *should* mean one of two things: > 1. There is no applicable value > 2. There is a value but it is unknown > > Example 1: Imagine you have a customer table with name and sex fields. > If you get a new customer "ACME Widgets Ltd", the sex field is meaningless > since your customer is a company (case 1). > If you get a new customer "Jackie Smith" they might be male or female, but > you might not know (case 2). > > Example 2: You have an address table with (street,city,county,postalcode) > fields. > You might insert an address ("10 Downing Street","London",Null,"WC1 1AA") > since you don't have a valid county. > You might also insert an address ("1 Any Street","Maidstone","Kent",Null) > where there *must be* a valid postalcode, but you don't know what it is. > > It might be useful to be able to distinguish between these two cases - not > applicable and unknown, but there is only one option "Null" available to > us, so we can't. > > > How do nulls work? > ================== > There is one very important rule when dealing with nulls. The result of > any operation or comparison, when applied to a null is null. The only > exception is testing if a value is null. > > Example: with the customer table above you could run the following queries: > SELECT * FROM customer WHERE sex='M'; > SELECT * FROM customer WHERE sex<>'M'; > Now you might think this returns all customers, but it will miss those > where sex is null. You've asked for all rows where the value of sex is 'M' > and all those with values not equal to 'M' but not rows with *no value at > all* > > It might help to think of a database as a set of statements you *know* to > be true. A null indicates that you *cannot say anything at all* about that > field. You can't say what it is, you can't say what it isn't, you can only > say there is some information missing. > > So, to see all the customers with unknown or inapplicable sex you would need: > SELECT * FROM customer WHERE sex IS NULL; > > There are actually three possible results for a test in SQL - True (the > test passed), False (the test failed) and Null (you tested against a > null). A result of null usually gets treated as False, so testing against > nulls always fails. > > If you try to perform an operation on nulls, again the result is always > null. So the results of all of the following are null: > SELECT 'abc' || null; > SELECT 1 + null; > SELECT sqrt(null::numeric); > The first case can be especially confusing. Concatenating a null string to > a string value will return null, not the original value. > > > Uniqueness and nulls > ==================== > If you define a unique index on a column it prevents you inserting two > values that are the same. It does not prevent you inserting as many nulls > as you like. How could it, you don't have a value so it can't be the same > as any other. > > Example: We create a table "ta" with a unique constraint on column "b" > CREATE TABLE ta ( > a int4, > b varchar(3), > PRIMARY KEY (a) > ); > CREATE UNIQUE INDEX ta_b_idx ON ta (b); > INSERT INTO ta VALUES (1,'aaa'); -- succeeds > INSERT INTO ta VALUES (2,'bbb'); -- succeeds > INSERT INTO ta VALUES (3,null); -- succeeds > INSERT INTO ta VALUES (4,'bbb'); -- fails > INSERT INTO ta VALUES (5,null); -- succeeds! > > Given the definition of what a null is, you only have two choices: allow > multiple nulls or allow no nulls. If you want no nulls, define the column > as NOT NULL when creating the table. > > > Keys and nulls > ============== > No column that is part of a primary key can be null. When you define a > PRIMARY KEY, none of the columns mentioned can take a null value. > Postgresql makes sure of this by defining the columns as NOT NULL for you. > > Example: With table "ta" we just created, \d ta will show column a as > being not null. Otherwise, we could insert rows with a set to null and > have no way to tell them apart. > > > Subqueries and nulls > ==================== > Since tests always fail when testing against nulls you can have unexpected > results with sub-queries. > > Example: Assume we have a companies table and a diary table. Diary entries > are usually related to a particular company but not always. > SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM > diary); > If any row in diary contains a null dy_company then you will get *no > results*. > We can expand the query like so: > WHERE co_id NOT IN (SELECT dy_company FROM diary) > WHERE co_id NOT IN (1, 2, null, 3...) > WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...) > WHERE NOT (... OR null OR ...) > WHERE NOT (null) > WHERE null > > You either need to explicitly check for null values, or define the column > in question as NOT NULL. > > > Further Information > =================== > Any good relational database book (try something written by Date) > Bruce's book (link LHS at http://techdocs.postgresql.org) > My Postgresql Notes (link at http://techdocs.postgresql.org) > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
Richard, Thanks for writing this! Would it be cool if we included it in the Linux Weekly News? I'll ask Robert. Two edits: On Wednesday 15 January 2003 09:23, dev@archonet.com wrote: > There have been a few posts recently where people have had problems with > nulls. Anyone got comments on the below before I submit it to techdocs? > > TIA > > - Richard Huxton > > A Brief Guide to NULLs > ====================== > > What is a null? > =============== > A null is *not* an empty string. > A null is *not* a value. > A null is *not* a "special" value. > A null is the absence of a value. > > > What do nulls mean? > =================== > Well, they *should* mean one of two things: > 1. There is no applicable value > 2. There is a value but it is unknown > > Example 1: Imagine you have a customer table with name and sex fields. > If you get a new customer "ACME Widgets Ltd", the sex field is meaningless > since your customer is a company (case 1). > If you get a new customer "Jackie Smith" they might be male or female, but > you might not know (case 2). > > Example 2: You have an address table with (street,city,county,postalcode) > fields. > You might insert an address ("10 Downing Street","London",Null,"WC1 1AA") > since you don't have a valid county. > You might also insert an address ("1 Any Street","Maidstone","Kent",Null) > where there *must be* a valid postalcode, but you don't know what it is. > > It might be useful to be able to distinguish between these two cases - not > applicable and unknown, but there is only one option "Null" available to > us, so we can't. > > > How do nulls work? > ================== > There is one very important rule when dealing with nulls. The result of > any operation or comparison, when applied to a null is null. The only > exception is testing if a value is null. > > Example: with the customer table above you could run the following queries: > SELECT * FROM customer WHERE sex='M'; > SELECT * FROM customer WHERE sex<>'M'; > Now you might think this returns all customers, but it will miss those > where sex is null. You've asked for all rows where the value of sex is 'M' > and all those with values not equal to 'M' but not rows with *no value at > all* > > It might help to think of a database as a set of statements you *know* to > be true. A null indicates that you *cannot say anything at all* about that > field. You can't say what it is, you can't say what it isn't, you can only > say there is some information missing. > > So, to see all the customers with unknown or inapplicable sex you would need: > SELECT * FROM customer WHERE sex IS NULL; ** MOVE the above paragraph .... > > There are actually three possible results for a test in SQL - True (the > test passed), False (the test failed) and Null (you tested against a > null). A result of null usually gets treated as False, so testing against > nulls always fails. > > If you try to perform an operation on nulls, again the result is always > null. So the results of all of the following are null: > SELECT 'abc' || null; > SELECT 1 + null; > SELECT sqrt(null::numeric); > The first case can be especially confusing. Concatenating a null string to > a string value will return null, not the original value. ** ... down to here, and define the IS NULL and IS NOT NULL operators for the reader. > > Uniqueness and nulls > ==================== > If you define a unique index on a column it prevents you inserting two > values that are the same. It does not prevent you inserting as many nulls > as you like. How could it, you don't have a value so it can't be the same > as any other. > > Example: We create a table "ta" with a unique constraint on column "b" > CREATE TABLE ta ( > a int4, > b varchar(3), > PRIMARY KEY (a) > ); > CREATE UNIQUE INDEX ta_b_idx ON ta (b); > INSERT INTO ta VALUES (1,'aaa'); -- succeeds > INSERT INTO ta VALUES (2,'bbb'); -- succeeds > INSERT INTO ta VALUES (3,null); -- succeeds > INSERT INTO ta VALUES (4,'bbb'); -- fails > INSERT INTO ta VALUES (5,null); -- succeeds! > > Given the definition of what a null is, you only have two choices: allow > multiple nulls or allow no nulls. If you want no nulls, define the column > as NOT NULL when creating the table. > > > Keys and nulls > ============== > No column that is part of a primary key can be null. When you define a > PRIMARY KEY, none of the columns mentioned can take a null value. > Postgresql makes sure of this by defining the columns as NOT NULL for you. > > Example: With table "ta" we just created, \d ta will show column a as > being not null. Otherwise, we could insert rows with a set to null and > have no way to tell them apart. > > > Subqueries and nulls > ==================== > Since tests always fail when testing against nulls you can have unexpected > results with sub-queries. > > Example: Assume we have a companies table and a diary table. Diary entries > are usually related to a particular company but not always. > SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM > diary); > If any row in diary contains a null dy_company then you will get *no > results*. > We can expand the query like so: > WHERE co_id NOT IN (SELECT dy_company FROM diary) > WHERE co_id NOT IN (1, 2, null, 3...) > WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...) > WHERE NOT (... OR null OR ...) > WHERE NOT (null) > WHERE null I'd suggest that you do not want to give readers an example that contains NOT IN (subselect). There are almost no cricumstances under which a user would choose NOT IN (subselect) over WHERE NOT EXISTS (subselect), and giving it in an example will just confuse newbies. > You either need to explicitly check for null values, or define the column > in question as NOT NULL. > > > Further Information > =================== > Any good relational database book (try something written by Date) > Bruce's book (link LHS at http://techdocs.postgresql.org) > My Postgresql Notes (link at http://techdocs.postgresql.org) Add links to the two PostgreSQL book pages ... the one on Techdocs and the one on www.postgresql.org/users-lounge FWIW. -- -Josh BerkusAglio Database SolutionsSan Francisco
This should be added (From PostgreSQL integrated docs...): SQL uses a three-valued Boolean logic where the null value represents "unknown". Observe the following truth tables: a | b | a AND b | a OR b ------+-------+---------+-------- TRUE | TRUE | TRUE | TRUE TRUE | FALSE | FALSE | TRUE TRUE | NULL | NULL | TRUE FALSE | FALSE | FALSE | FALSE FALSE | NULL | FALSE | NULL NULL | NULL | NULL | NULL So, if there's any trues in the chain of ORs, the whole expression will be true, not null. This conforms to NULL representing unknown value. If you have "true or unknown", of course whole result is true regardless of the "unknown". Let's check this example: > Subqueries and nulls > ==================== > Since tests always fail when testing against nulls you can have unexpected > results with sub-queries. > > Example: Assume we have a companies table and a diary table. Diary entries > are usually related to a particular company but not always. > SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM > diary); > If any row in diary contains a null dy_company then you will get *no > results*. > We can expand the query like so: > WHERE co_id NOT IN (SELECT dy_company FROM diary) > WHERE co_id NOT IN (1, 2, null, 3...) > WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...) > WHERE NOT (... OR null OR ...) > WHERE NOT (null) (erm... actually not) template1=# select 5 in (1, 2, 9, null);?column? ---------- (1 row) template1=# select 2 in (1, 2, 9, null);?column? ----------t (1 row) WHERE NOT (null/true) -> evaluates to WHERE null/false So the result was the same - but only for this example. Suppose you take the NOT away: template1=# select * from a where a.i in (1, 2, 9, null);i ---12 (2 rows) I surely see two lines (not 0)... :) And this could be added too, for clarification: "SELECT clause lists lines for which the WHERE expression is certainly known to be true." ;) -- Antti Haapala
On Wed, Jan 15, 2003 at 17:23:09 -0000, dev@archonet.com wrote: > > How do nulls work? > ================== > There is one very important rule when dealing with nulls. The result of > any operation or comparison, when applied to a null is null. The only > exception is testing if a value is null. I suggest rephrasing this to make it clear you need to use is null to do this and that = null won't do what you expect. > If you try to perform an operation on nulls, again the result is always > null. So the results of all of the following are null: The above isn't true (i.e. null op something isn't ALWAYS null). For example: bruno=> select 't'::boolean or null;?column? ----------t (1 row)
A few more comments... > It might be useful to be able to distinguish between these two cases - not > applicable and unknown, but there is only one option "Null" available to > us, so we can't. If we really need to distinguish between these two cases, I think null shouldn't be used as a N/A value but some other like empty string or 0. (IMHO it's preferable not to use null as N/A at all). For example sex could be classified as'n' - not applicable'f' - female'm' - malenull - yet unknown > Example: with the customer table above you could run the following queries: > SELECT * FROM customer WHERE sex='M'; > SELECT * FROM customer WHERE sex<>'M'; > Now you might think this returns all customers, but it will miss those > where sex is null. You've asked for all rows where the value of sex is 'M' > and all those with values not equal to 'M' but not rows with *no value at > all* these could be explained asselect all customers who surely are menselect all customers who surely aren't men if customers sex is unknown - null, we can't decide whether they're men or not. > The first case can be especially confusing. Concatenating a null string to > a string value will return null, not the original value. Isn't it null, not null string? ;) > Keys and nulls > ============== > No column that is part of a primary key can be null. When you define a > PRIMARY KEY, none of the columns mentioned can take a null value. > Postgresql makes sure of this by defining the columns as NOT NULL for you. ... because primary keys are to uniquelly identify rows in a table, and how's an unknown values going to do that :) -- Antti Haapala +358 50 369 3535 ICQ: #177673735
dev@archonet.com writes: > A null is *not* a value. > A null is *not* a "special" value. > A null is the absence of a value. A quotation directly from the SQL standard: Every data type includes a special value, called the null value, This seems to directly contradict those three statements. -- Peter Eisentraut peter_e@gmx.net
--- dev@archonet.com wrote: > There have been a few posts recently where people > have had problems with > nulls. Anyone got comments on the below before I > submit it to techdocs? > > TIA > > - Richard Huxton > > A Brief Guide to NULLs > ====================== > > What is a null? > =============== > A null is *not* an empty string. > A null is *not* a value. > A null is *not* a "special" value. > A null is the absence of a value. > > > What do nulls mean? > =================== > Well, they *should* mean one of two things: > 1. There is no applicable value > 2. There is a value but it is unknown Good job!, it sure helps people who don't much background on formal database theory. What about adding this to the section "What does nulls mean"--> 3) No value has yet been assigned to that particular attribute (field). I think it would also be nice if you can add a section on functions that deals with NULL such as nullif() and coalesce(). These functions help users interpret NULL values. best regards, ludwig __________________________________________________ Do you Yahoo!? New DSL Internet Access from SBC & Yahoo! http://sbc.yahoo.com
Peter Eisentraut <peter_e@gmx.net> writes: > dev@archonet.com writes: >> A null is *not* a value. >> A null is *not* a "special" value. >> A null is the absence of a value. > A quotation directly from the SQL standard: > Every data type includes a special value, called the null value, > This seems to directly contradict those three statements. I think you can look at it either way. The traditional mathematical approach to this sort of thing has been to consider that every data type includes an "undefined" value (sometimes called "bottom", often written as an upside-down T). But the specific semantics assigned to this concept in SQL definitely correspond to the idea that there's a missing data entry. And those who like to think about the bits prefer to imagine a separate "its-null" flag bit, as someone else noted in this thread. The real bottom line is that the language provides you with a concept "NULL" that has very specific (and less than intuitive) semantics. To make use of this concept in your application, you have to interpret it in a way that is useful for your application --- and doesn't conflict with the SQL-defined semantics. regards, tom lane
> > --- dev@archonet.com wrote: >> There have been a few posts recently where people >> have had problems with >> nulls. Anyone got comments on the below before I >> submit it to techdocs? >> >> TIA >> >> - Richard Huxton >> >> A Brief Guide to NULLs >> ====================== >> >> What is a null? >> =============== >> A null is *not* an empty string. >> A null is *not* a value. >> A null is *not* a "special" value. >> A null is the absence of a value. >> >> >> What do nulls mean? >> =================== >> Well, they *should* mean one of two things: >> 1. There is no applicable value >> 2. There is a value but it is unknown > > Good job!, it sure helps people who don't much > background on formal database theory. > > What about adding this to the section "What does > nulls mean" > --> 3) No value has yet been assigned to that > particular attribute (field). > > > I think it would also be nice if you can add a > section on functions that deals with NULL such as > nullif() and coalesce(). These functions help users > interpret NULL values. > > > best regards, > > ludwig > > > > __________________________________________________ > Do you Yahoo!? > New DSL Internet Access from SBC & Yahoo! > http://sbc.yahoo.com > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > >
Apologies for previous post with no content - hit send by mistake. Thanks to everyone for the feedback, 2nd draft out later today/tomorrow. - Richard Huxton
You might add this as an example: mytable ------------ id value 1 1 2 2 3 3 4 <NULL> 5 4 6 5 -- Count ALL records SELECT count(*) FROM mytable; Result: 6 -- Count id records SELECT count(id) FROM mytable; Result: 6 -- Count value records SELECT count(value) FROM mytable; Result: 5 -- Sum of all values SELECT sum(value) FROM mytable; Result: 15 -- Average of the values SELECT sum(value)/count(value) FROM mytable; Result: 3 -- !!!WRONG!!! Method of the average SELECT sum(value)/count(*) FROM mytable; Result: 2.5 What I try to do and what I advice to the novice database designer is try to avoid NULLS. Why??? because it's differcult to think in tree based logic. As for a example in the case of the customer sex. You can use 'M' for male, 'F' for female, 'U' for unknown and 'N' for 'not applicapable'. In this way you can use the NOT NULL contsraint so the novice programmer can work with actual values instead of 'forgetting' about the NULLS. This idea can be helpfull for the novice database designer. Just a thought... Ries van Twisk > -----Oorspronkelijk bericht----- > Van: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]Namens dev@archonet.com > Verzonden: woensdag 15 januari 2003 18:23 > Aan: pgsql-sql@postgresql.org > Onderwerp: [SQL] RFC: A brief guide to nulls > > > There have been a few posts recently where people have had > problems with > nulls. Anyone got comments on the below before I submit it to > techdocs? > > TIA > > - Richard Huxton > > A Brief Guide to NULLs > ====================== > > What is a null? > =============== > A null is *not* an empty string. > A null is *not* a value. > A null is *not* a "special" value. > A null is the absence of a value. > > > What do nulls mean? > =================== > Well, they *should* mean one of two things: > 1. There is no applicable value > 2. There is a value but it is unknown > > Example 1: Imagine you have a customer table with name and sex fields. > If you get a new customer "ACME Widgets Ltd", the sex field > is meaningless > since your customer is a company (case 1). > If you get a new customer "Jackie Smith" they might be male > or female, but > you might not know (case 2). > > Example 2: You have an address table with > (street,city,county,postalcode) > fields. > You might insert an address ("10 Downing > Street","London",Null,"WC1 1AA") > since you don't have a valid county. > You might also insert an address ("1 Any > Street","Maidstone","Kent",Null) > where there *must be* a valid postalcode, but you don't know > what it is. > > It might be useful to be able to distinguish between these > two cases - not > applicable and unknown, but there is only one option "Null" > available to > us, so we can't. > > > How do nulls work? > ================== > There is one very important rule when dealing with nulls. The > result of > any operation or comparison, when applied to a null is null. The only > exception is testing if a value is null. > > Example: with the customer table above you could run the > following queries: > SELECT * FROM customer WHERE sex='M'; > SELECT * FROM customer WHERE sex<>'M'; > Now you might think this returns all customers, but it will miss those > where sex is null. You've asked for all rows where the value > of sex is 'M' > and all those with values not equal to 'M' but not rows with > *no value at > all* > > It might help to think of a database as a set of statements > you *know* to > be true. A null indicates that you *cannot say anything at > all* about that > field. You can't say what it is, you can't say what it isn't, > you can only > say there is some information missing. > > So, to see all the customers with unknown or inapplicable sex > you would need: > SELECT * FROM customer WHERE sex IS NULL; > > There are actually three possible results for a test in SQL - > True (the > test passed), False (the test failed) and Null (you tested against a > null). A result of null usually gets treated as False, so > testing against > nulls always fails. > > If you try to perform an operation on nulls, again the result > is always > null. So the results of all of the following are null: > SELECT 'abc' || null; > SELECT 1 + null; > SELECT sqrt(null::numeric); > The first case can be especially confusing. Concatenating a > null string to > a string value will return null, not the original value. > > > Uniqueness and nulls > ==================== > If you define a unique index on a column it prevents you inserting two > values that are the same. It does not prevent you inserting > as many nulls > as you like. How could it, you don't have a value so it can't > be the same > as any other. > > Example: We create a table "ta" with a unique constraint on column "b" > CREATE TABLE ta ( > a int4, > b varchar(3), > PRIMARY KEY (a) > ); > CREATE UNIQUE INDEX ta_b_idx ON ta (b); > INSERT INTO ta VALUES (1,'aaa'); -- succeeds > INSERT INTO ta VALUES (2,'bbb'); -- succeeds > INSERT INTO ta VALUES (3,null); -- succeeds > INSERT INTO ta VALUES (4,'bbb'); -- fails > INSERT INTO ta VALUES (5,null); -- succeeds! > > Given the definition of what a null is, you only have two > choices: allow > multiple nulls or allow no nulls. If you want no nulls, > define the column > as NOT NULL when creating the table. > > > Keys and nulls > ============== > No column that is part of a primary key can be null. When you define a > PRIMARY KEY, none of the columns mentioned can take a null value. > Postgresql makes sure of this by defining the columns as NOT > NULL for you. > > Example: With table "ta" we just created, \d ta will show column a as > being not null. Otherwise, we could insert rows with a set to null and > have no way to tell them apart. > > > Subqueries and nulls > ==================== > Since tests always fail when testing against nulls you can > have unexpected > results with sub-queries. > > Example: Assume we have a companies table and a diary table. > Diary entries > are usually related to a particular company but not always. > SELECT co_name FROM companies WHERE co_id NOT IN (SELECT > dy_company FROM > diary); > If any row in diary contains a null dy_company then you will get *no > results*. > We can expand the query like so: > WHERE co_id NOT IN (SELECT dy_company FROM diary) > WHERE co_id NOT IN (1, 2, null, 3...) > WHERE NOT (co_id=1 OR co_id=2 OR co_id=null OR co_id=3...) > WHERE NOT (... OR null OR ...) > WHERE NOT (null) > WHERE null > > You either need to explicitly check for null values, or > define the column > in question as NOT NULL. > > > Further Information > =================== > Any good relational database book (try something written by Date) > Bruce's book (link LHS at http://techdocs.postgresql.org) > My Postgresql Notes (link at http://techdocs.postgresql.org) > > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
I think that having this topic defined and available will be of great benefit... !!! Thanks Richard. Some additional thoughts based upon what other people have explicitly or implicitly implied. Peter quoted the sql definition as: >Every data type includes a special value, called the null value, and my earlier post implied that, but... I believe that one needs to be VERY explicit in the distinction between sementic meaning of some field and the values accessable. Using the example proposed in earlier posts, the storing of sex field for a customer. The example query was: > SELECT * FROM customer WHERE sex='M'; > SELECT * FROM customer WHERE sex<>'M'; One person suggested: >For example sex could be classified as > 'n' - not applicable > 'f' - female > 'm' - male > null - yet unknown >... >if customers sex is unknown - null, we can't decide >whether they're men or not. Which is straying way into the semantic information of the field. In reality we have: +Fieldname(sex)------------------+ |..............|value|{somevalue}| |..............|nullP|{1or0} | +--------------------------------+ Every field has a value portion and a nullP portion. Here I use nullP, which is either 1 or 0, indicating that the field is NULL or NOTNULL. NOTNULL means that the value field has a value. NULL means that the value field does not have a value. Period. Anything else begins to stray into the semantic range. Using the above... > 'n' - not applicable > 'f' - female > 'm' - male > null - yet unknown which we could code as.. .....................................meaning +Fieldname(sex)------------------+ |..............|value|{n} | not applicable |..............|nullP|{0} | +--------------------------------+ +Fieldname(sex)------------------+ |..............|value|{f} | female |..............|nullP|{0} | +--------------------------------+ +Fieldname(sex)------------------+ |..............|value|{m} | male |..............|nullP|{0} | +--------------------------------+ +Fieldname(sex)------------------+ |..............|value|{} | |..............|nullP|{1} | <no value> +--------------------------------+ If I was a clerk, looking at a person and could not tell their sex due to appearance, etc, that should probably be coded: 'u'-unknown The "concept" of null meaning unknown is SEMANTIC... in the case of sex. The nullP, ie testing via NULL/NOTNULL, means that one can either test or (should/can) access the value field. No other meaning should be implied. The application on up the ladder implies the meaning behind not having a value. The point I'm trying to make here is that one should not intermix the application meaning of having/not-having a value with the value itself. Especially since you can have a NOTNULL field that has a text value having no bytes in it, i.e. a string field that has no characters... Humm... don't know if I made my point clear as mud... Best regards, .. Otto