Re: NULL values or not? - Mailing list pgsql-sql
From | Archibald Zimonyi |
---|---|
Subject | Re: NULL values or not? |
Date | |
Msg-id | Pine.LNX.4.21.0112211807470.21359-100000@valdez.netg.se Whole thread Raw |
In response to | Re: NULL values or not? ("Josh Berkus" <josh@agliodbs.com>) |
Responses |
Re: NULL values or not?
|
List | pgsql-sql |
Thank you, thank you, thank you, thank you.... Your post, Josh, was very informative for me. As it is now, my Vampire schema, which I forgot to add as an attachment, doesn't allow a single NULL value. I do have a small amount of data, which is why things might still work as fast as they do. The start of my problems is still the fact that when I do VACUUM ANALYZE my main query takes almost a minute (from around a second), which has made me perlexed by the whole thing, since the EXPLAIN shows relatively low values (although reading that is still an art, I have a query which results in more rows and has higher EXPLAIN values but still takes less time then my main query). I have read a book which I find immensly useful and good, called: The Practical SQL Handbook published by Addison-Wesley. It is this book that has given me my main view of SQL and database design. I think my database uses a lot of the normalization rules which is why I try to avoid NULL values. An example from my own Vampire database follows: CREATE TABLE cards ( card_id int2 DEFAULT NEXTVAL('cards_seq'), cardname text NOT NULL, PRIMARY KEY(card_id) ); CREATE TABLE cards_names ( card_id int2 NOT NULL, cardname text NOT NULL, FOREIGN KEY (card_id) REFERENCES cards ); The table cards include the primary names of all cards. Each card is unique, giving it a unique id. Some cards have two names, due to the fact that there were misprints etc. So the table cards_names includes those few cards which might have two names. CREATE TABLE cards_costs ( card_id int2 NOT NULL, cost int2 NOT NULL, pool bool NOT NULL, FOREIGN KEY (card_id) REFERENCES cards ); Some cards, but not all, have a certain cost to play. Instead of having a "cost" column in "cards" and then having NULL values where the cards have no cost, I once again decided to create a new table connecting the cost to a card with it's id. So cards_costs also includes less cards then do the table cards. It is this I based my questions on, I know why I do it the way I do, but I would also like to hear what other people think of this solution, and perhaps why I should use NULLS instead. As I said in an earlier post, there is a cost of joining the tables, which is higher then having one table with NULLS. Once again, thanks for the replies folks, Archie On Fri, 21 Dec 2001, Josh Berkus wrote: > Archibald, > > > So again, I have no problems with NULLS, I just would like someone to > > share when they use them contra when they don't. > > Thank you for bringing a theory question onto the list! We spend much > of our time on pgsql-sql focused on practical minutia. Thus several of > the respondants who siezed on your hypothetical example rather than > answering the question. I'd love to see some of the more experienced DB > theorists weigh in on this issue (Tom? Joe C.?). > > Relationally speaking, NULLs are a necessary evil. You are right to > approach them with caution. I cannot count the number of times one of > my database rescue attempts has involved eliminating, sub-tabling, or > consolidating NULLable columns. > > By coincidence, yesterday I posted this to PGSQL-NOVICE: > > ============================================================ > > 3) Use of NULL > > > > It would facilitate entering data into TABLE clone if seq, qual... > > were defined as NULL even though values for these columns will/does > > exist. If I defined these columns as NULL I would not have to have > > ALL of the data together at one time, but could enter it in steps. > > Is this an ill conceived notion? > > Well, this purpose is what NULL is for. Cases where data is unknown or > not yet available. That being said, allowing NULLs takes data integrity > out of the table design and moves it elsewhere in the software. If you > allow NULLs for seq, then you will need to create a data integrity > report that searches for old entries with NULL in the seq column. > Otherwise, you risk having some required data never filled in. > > Also, remember that you can't JOIN on a NULL value. For example, if you > allow NULLs in, say clones.gb_id, then if you do a report on clones > JOIN > gb_accessions, the clones who are missing GB will not show up with a > blank GB, instead they will not show up at all! You can get around this > with OUTER JOINS, but OUTER JOINs are not dependable for multi-column > joins. > > Instead, I recommend that everywhere it is possible, you have an actual > value that indicates why the data has not been filled in. FOr example, > you could create a gb_accession with the ID of zero (0) (and accn_no, > gi_no of 0 as well) which would indicate (to you) "gb not run yet". > This gives you more information than NULL (which could indicate a > number > of things: GB not run, GB lost, data error, program error, etc.), as > well as supporting JOINs cleanly. You could even have more than one > such value to indicate different reasons for missing info. > ===================================================== > > For a more in-depth discussion of NULLs and their problems and > workarounds, see Fabian Pascal's "Practical Issues in Database > Management", which has the better part of a chapter on the topic. > > -Josh > > ______AGLIO DATABASE SOLUTIONS___________________________ > Josh Berkus > Complete information technology josh@agliodbs.com > and data management solutions (415) 565-7293 > for law firms, small businesses fax 621-2533 > and non-profit organizations. San Francisco >