Re: NULL values or not? - Mailing list pgsql-sql

From Archibald Zimonyi
Subject Re: NULL values or not?
Date
Msg-id Pine.LNX.4.21.0112211807470.21359-100000@valdez.netg.se
Whole thread Raw
In response to Re: NULL values or not?  ("Josh Berkus" <josh@agliodbs.com>)
Responses Re: NULL values or not?
List pgsql-sql

Thank you, thank you, thank you, thank you....

Your post, Josh, was very informative for me. As it is now, my Vampire
schema, which I forgot to add as an attachment, doesn't allow a single
NULL value. I do have a small amount of data, which is why things might
still work as fast as they do.

The start of my problems is still the fact that when I do VACUUM ANALYZE
my main query takes almost a minute (from around a second), which has made
me perlexed by the whole thing, since the EXPLAIN shows relatively low
values (although reading that is still an art, I have a query which
results in more rows and has higher EXPLAIN values but still takes less
time then my main query).

I have read a book which I find immensly useful and good, called:

The Practical SQL Handbook 

published by Addison-Wesley. It is this book that has given me my main
view of SQL and database design. I think my database uses a lot of the
normalization rules which is why I try to avoid NULL values. An example
from my own Vampire database follows:

CREATE TABLE cards
( card_id int2 DEFAULT NEXTVAL('cards_seq'), cardname text NOT NULL, PRIMARY KEY(card_id)
);

CREATE TABLE cards_names
( card_id int2 NOT NULL, cardname text NOT NULL, FOREIGN KEY (card_id) REFERENCES cards
);


The table cards include the primary names of all cards. Each card is
unique, giving it a unique id. Some cards have two names, due to the fact
that there were misprints etc.

So the table cards_names includes those few cards which might have two
names.

CREATE TABLE cards_costs
( card_id int2 NOT NULL, cost int2 NOT NULL, pool bool NOT NULL, FOREIGN KEY (card_id) REFERENCES cards
);

Some cards, but not all, have a certain cost to play. Instead of having a
"cost" column in "cards" and then having NULL values where the cards have
no cost, I once again decided to create a new table connecting the cost to
a card with it's id. So cards_costs also includes less cards then do the
table cards.

It is this I based my questions on, I know why I do it the way I do, but I
would also like to hear what other people think of this solution, and
perhaps why I should use NULLS instead. As I said in an earlier post,
there is a cost of joining the tables, which is higher then having one
table with NULLS.

Once again, thanks for the replies folks,

Archie

On Fri, 21 Dec 2001, Josh Berkus wrote:

> Archibald,  
> 
> > So again, I have no problems with NULLS, I just would like someone to
> > share when they use them contra when they don't.
> 
> Thank you for bringing a theory question onto the list!  We spend much
> of our time on pgsql-sql focused on practical minutia.  Thus several of
> the respondants who siezed on your hypothetical example rather than
> answering the question.  I'd love to see some of the more experienced DB
> theorists weigh in on this issue (Tom?  Joe C.?).
> 
> Relationally speaking, NULLs are a necessary evil.  You are right to
> approach them with caution.  I cannot count the number of times one of
> my database rescue attempts has involved eliminating, sub-tabling, or
> consolidating NULLable columns.  
> 
> By coincidence, yesterday I posted this to PGSQL-NOVICE:
> 
> ============================================================
> > 3) Use of NULL
>  > 
> > It would facilitate entering data into TABLE clone if seq, qual... 
> > were defined as NULL even though values for these columns will/does 
> > exist. If I defined these columns as NULL I would not have to have 
> > ALL of the data together at one time, but could enter it in steps. 
> > Is this an ill conceived notion?
>  
> Well, this purpose is what NULL is for. Cases where data is unknown or
>  not yet available. That being said, allowing NULLs takes data integrity
>  out of the table design and moves it elsewhere in the software. If you
>  allow NULLs for seq, then you will need to create a data integrity
>  report that searches for old entries with NULL in the seq column.
>  Otherwise, you risk having some required data never filled in.
>  
> Also, remember that you can't JOIN on a NULL value. For example, if you
>  allow NULLs in, say clones.gb_id, then if you do a report on clones
> JOIN
>  gb_accessions, the clones who are missing GB will not show up with a
>  blank GB, instead they will not show up at all! You can get around this
>  with OUTER JOINS, but OUTER JOINs are not dependable for multi-column
>  joins. 
> 
>  Instead, I recommend that everywhere it is possible, you have an actual
>  value that indicates why the data has not been filled in. FOr example,
>  you could create a gb_accession with the ID of zero (0) (and accn_no,
>  gi_no of 0 as well) which would indicate (to you) "gb not run yet".
>  This gives you more information than NULL (which could indicate a
> number
>  of things: GB not run, GB lost, data error, program error, etc.), as
>  well as supporting JOINs cleanly. You could even have more than one
>  such value to indicate different reasons for missing info.
> =====================================================
> 
> For a more in-depth discussion of NULLs and their problems and
> workarounds, see Fabian Pascal's "Practical Issues in Database
> Management", which has the better part of a chapter on the topic.
> 
> -Josh
> 
> ______AGLIO DATABASE SOLUTIONS___________________________
>                                        Josh Berkus
>   Complete information technology      josh@agliodbs.com
>    and data management solutions       (415) 565-7293
>   for law firms, small businesses        fax 621-2533
>     and non-profit organizations.      San Francisco
> 



pgsql-sql by date:

Previous
From: "Josh Berkus"
Date:
Subject: Re: NULL values or not?
Next
From: "Josh Berkus"
Date:
Subject: Re: NULL values or not?