Re: Database migration and redesign - Mailing list pgsql-general
From | Brandon Aiken |
---|---|
Subject | Re: Database migration and redesign |
Date | |
Msg-id | F8E84F0F56445B4CB39E019EF67DACBA2F1D3E@exchsrvr.winemantech.com Whole thread Raw |
In response to | Re: Database migration and redesign ("Merlin Moncure" <mmoncure@gmail.com>) |
List | pgsql-general |
Excellent, I managed to find one of the major sticking points all by myself! This is exactly what I was looking for. Thanks! There seems to be a lot of that in the DB world. Practical vs theoretical. Or pragmatic vs strict. It seems to be whether you came from a math background -- in which case you're concerned with logical data sets in the most effective theory possible -- or from a software engineering background -- in which case you're concerned about the usability and performance of the database software, particularly in how other applications reference and access the database. There are advantages to both schools. A highly theoretical database can also be highly normalized and therefore have very agile data models built on top of them. However, the practical limitation of computerized relational databases means that they can't implement all features of the relational database model very well or very efficiently. Additionally, while the logical models built on top of the theoretical DB structure can be very agile, designing those models is very complex due to the complex nature of the highly normalized relational structure involved. If you have to do a multiple nested join of 10 tables just to run what will be a query of the most basic object in the final app, you've probably normalized way too far. I think I can draw some lines of distinction now, though. There are some places where the index is just a means to relate the otherwise unrelated tables (if that makes sense), and sometimes where using natural keys will save several table joins on many queries. This was what I was seeing, and I was wondering if it was acceptable to mix it up and do both where it seems to make the most sense. My next task is a bit more difficult simply to define what the relationship needs to be. The app is business management software that manages, among other things, Jobs and Quotes. The relationship between Jobs and Quotes is a little odd. Quotes are sent to customers, and if those customers accept the Quote then a Job is created and the Quote is assigned to it. So Quotes exist without Jobs. Additionally, customers will often request additional services after the original Quote and these services get Quoted in the same way, so multiple Quotes can be assigned to a single Job. It gets worse. We use Internal Jobs for cost tracking, and these Jobs do not have Quotes at all. Also, there are some Jobs that get 'verbal' quotes (that is, quotes given outside the normal quoting system). So Jobs exist without Quotes. So, one Job can be associated with many Quotes, so Quote is the child of the relationship even though it gets created first. There can be 0 or 1 parents, and 0, 1, or many children in any combination. I can simply define foreign keys normally and use NULL values where no relationship exists, but isn't there a better way to do this? Allowing NULLs is one of the problems with many of these tables. Should I create another table with two fields, one for the Quote number and one for the Job number, and then have those two fields comprise a compound primary key (and Quote having a unique constraint for itself as well)? That way I don't think I can get logically orphaned children since both tables are parents to this third table. Is that how it's supposed to be done? There are several places across the DB where this style relationship occurs, and I'd like to try to conform to best practices (especially since the last guy managed to miss just about every single one). -- Brandon Aiken CS/IT Systems Engineer -----Original Message----- From: Merlin Moncure [mailto:mmoncure@gmail.com] Sent: Monday, September 11, 2006 1:08 PM To: Brandon Aiken Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Database migration and redesign On 9/11/06, Brandon Aiken <BAiken@winemantech.com> wrote: > My question relates to primary keys. The vast majority of tables have a > primary key on a single char or varchar field. Is it considered better > practice to create a serial type id key to use as the primary key for the > table, and then create a unique index on the char and varchar fields? > Should foreign keys reference the new primary id or the old unique key? > What about compound [primary] keys? This is a somewhat controversial topic. I personally am not a big fan of Ambler or his approach to database design. This is more or less the surrogate/natural key debate which is a perennial flamewar in the database world. I tend to think natural keys are generally superior but surrogates are useful in some situations. Automatic use of ID column in every table leads to lazy thinking and overcomplicated designs. However at times they can really optimize your database. So I will lay down some extremes and leave it up to you to find the middle ground. A trivial case of when not to use a serial key is like this: create table sex(id serial, sex char(1)); insert into sex(sex) select 'M' union all select 'F'; The reason is obvious, adding a join for no reason whenever you need to know the sex, albeit a simple one, and expanding the 1 character type to an int type. A somewhat more interesting case is: create table email(id serial, email text); In this case, while it may seem like a waste to store the full email in every table that references the email, you are optimizing the join out in such cases, which can be a big win and since there is no other properties of the email the email table only serves the purpose of maintaining relational integrity, iow no duplicates. In the event the email changes, we allow RI to cleanup the other tables...an integer proxy would be (at least to me) an meaningless abstraction of the email. A case of when not to use a natural key for relating is a bit more complex, some times you just get sick and tired of writing the key fields over and over, or you have measured and determined the natural key to be wasetful in terms of index performance. Another reason is if the p-key data changes frequently and the RI mechism is too expensive. In this case I would advise you to strictly key on the natural and make a candidate, serial key. create table foo ( nat_key1 text, nat_key2 int, foo_id serial, [...] primary key(nat_key1, nat_key2), unique(foo_id) -- candidate unique key ); This is my middle ground: there highly situational cases where an id column is a win on pracital reasons. however, a strong design around natural keys tends to make you think the problem through much more carefully and lead to a tighter database. > Also, any suggestions for good DB design books would be appreciated. I no > longer have any good DB design books, and I don't know what's good anymore. http://www.amazon.com/exec/obidos/ASIN/0201485559/databasede095-20?creat ive=327641&camp=14573&adid=07TEH0J3FS9SYN309QMS&link_code=as1 merlin
pgsql-general by date: