Thread: Database Design Question
A Question for those of you who consider yourself crack Database Designers. I am currently moving a large database(100+Tables) into pgSQL... with the intention of deploying against 'any' SQL databasein the future. The development side will be rigorously using Standard SQL constructs with no unique/proprietary extensions. My question concerns establishing the relationships. Currently Relationships between tables are established via a Unique Integer ID like this: *=APrimaryKey PartTypes Customer Parts--------- -------- -----PartTypeID CustomerID PartID *PartType *Customer PartTypeID Address CustomerID *PartNumber(2FieldPrimaryKey) *PartRevision(2FieldPrimaryKey) PartName HOWEVER; I have read lots of texts describing the Relational Design should be instead like this: *=APrimaryKey PartTypes Customer Parts--------- -------- ----- *PartType *Customer PartType Address *PartNumber(2FieldPrimaryKey) *PartRevison(2FieldPrimaryKey) PartName Customer Both Techniques have a unique foreign key back to the parent tables but one uses No.Meaningful.Info.Integer.Data for theForeignKey while the second uses Human.Understandable.ForeignKeys Is one recommended over the other??? Sure appreciate the commentary before I get in too deep with all these tables. Thanks!
I'm by no means a crack database designer, but I do have my ideas on this subject. I prefer that every table has a unique integer (auto-incrementing) primary key. Why? Consistency. With the alternative, some tables may not have a clear-cut candidate for a unique id. In your example, you had "customer". How many "John Smith"s would it take before we decide that is not a good identifier. On the other hand, some tables would have perfectly logical identifiers. Part numbers, SSNs.... So, you would need to create some tables with integer primary keys, and others would have some other natural identifier. That to me is inconsistent. Every table should be, IMHO, predictable in it's definition of a primary key. I don't even have to guess what the names of my primary keys are either because the are all named <tablename>_ID. Always. I've only come up with these thoughts on my own, and have not extensively tried the other way, so I'd be interested in hearing other's ideas for the other side. Thanks, Jimmie Fulton Systems Administrator Environmental Health & Safety Office Emory University School Of Medicine -----Original Message----- From: Gonzo Rock [mailto:GonzoRock@Excite.com] Sent: Friday, July 27, 2001 2:03 PM To: pgsql-sql@postgresql.org Subject: [SQL] Database Design Question A Question for those of you who consider yourself crack Database Designers. I am currently moving a large database(100+Tables) into pgSQL... with the intention of deploying against 'any' SQL database in the future. The development side will be rigorously using Standard SQL constructs with no unique/proprietary extensions. My question concerns establishing the relationships. Currently Relationships between tables are established via a Unique Integer ID like this: *=APrimaryKey PartTypes Customer Parts--------- -------- -----PartTypeID CustomerID PartID *PartType *Customer PartTypeID Address CustomerID *PartNumber(2FieldPrimaryKey) *PartRevision(2FieldPrimaryKey) PartName HOWEVER; I have read lots of texts describing the Relational Design should be instead like this: *=APrimaryKey PartTypes Customer Parts--------- -------- ----- *PartType *Customer PartType Address *PartNumber(2FieldPrimaryKey) *PartRevison(2FieldPrimaryKey) PartName Customer Both Techniques have a unique foreign key back to the parent tables but one uses No.Meaningful.Info.Integer.Data for the ForeignKey while the second uses Human.Understandable.ForeignKeys Is one recommended over the other??? Sure appreciate the commentary before I get in too deep with all these tables. Thanks! ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
OK... Fair Enough... Good Points indeed y'all. Well... What about the problem of users trying to Query the Database?? You know... like when using Crystal Reports or something?. SELECT * from HistoryTable WHERE PartID = SomeInteger vs SELECT * from HistoryTable WHERE PartNum = 12345636 AND PartRev = C How are they supposed to know What the PartID is ?? Anyway, that I why I was considering changing... current users always have trouble peering into the database... They don'tquite get it. At 02:31 PM 7/27/01 -0400, Mike Mascari wrote: >I prefer using unique integer ids generated from sequences rather than >keys composed of meaningful values. > >Advantages: > >Client side applications can store/handle the unique integer ids more >readily than having to deal with composite primary keys composed of >varying data types. For example, I can stuff the id associated with a >particular record easily in list boxes, combo boxes, edit controls, etc. >via SetItemData() or some other appropriate method. Its a bit more >complicated to track database records via composite keys of something >like: part no, vendor no, vendor group. > >Updating the data doesn't require cascading updates. If you use keys >with meaning, the referential integrity constraints must support >cascading updates so if the key changes in the primary table the change >is cascaded to all referencing tables as well. Earlier versions of most >databases (Access, Oracle, etc.) only provided cascading deletes under >the assumption you would be using sequence generated keys. > >Downside: > >Many queries might require more joins against the primary table to fetch >the relevant information associated with the numerical id, whereas keys >composed of solely the values with which they are associated might not >require the joins, which will speed some applications. I now have some >queries with 20-way joins. But PostgreSQL provides a way to explicitly >set the path the planner will choose and so the execution of the query >is instantaneous. I'm not sure about other databases. In earlier >versions, I had to denormalize a bit solely for performance reasons. > >In the past, I used to use composite keys and switched to the purely >sequence generated path and don't regret it at all. Of course, you'll >still have a unique constraint on the what-would-have-been meaningful >primary key. > >Hope that helps, > >Mike Mascari >mascarm@mascari.com > >Gonzo Rock wrote: >> >> A Question for those of you who consider yourself crack Database Designers. >> >> I am currently moving a large database(100+Tables) into pgSQL... with the intention of deploying against 'any' SQL databasein the future. The development side will be rigorously using Standard SQL constructs with no unique/proprietary extensions. >> >> My question concerns establishing the relationships. >> >> Currently Relationships between tables are established via a Unique Integer ID like this: >> >> *=APrimaryKey >> >> PartTypes Customer Parts >> --------- -------- ----- >> PartTypeID CustomerID PartID >> *PartType *Customer PartTypeID >> Address CustomerID >> *PartNumber(2FieldPrimaryKey) >> *PartRevision(2FieldPrimaryKey) >> PartName >> >> >> HOWEVER; I have read lots of texts describing the Relational Design should be instead like this: >> >> *=APrimaryKey >> >> PartTypes Customer Parts >> --------- -------- ----- >> *PartType *Customer PartType >> Address *PartNumber(2FieldPrimaryKey) >> *PartRevison(2FieldPrimaryKey) >> PartName >> Customer >> >> Both Techniques have a unique foreign key back to the parent tables but one uses No.Meaningful.Info.Integer.Data for theForeignKey while the second uses Human.Understandable.ForeignKeys >> >> Is one recommended over the other??? Sure appreciate the commentary before I get in too deep with all these tables. >> >> Thanks! >
----- Original Message ----- From: "Gonzo Rock" <GonzoRock@Excite.com> To: <pgsql-general@postgresql.org> Cc: <pgsql-sql@postgresql.org> Sent: Friday, July 27, 2001 4:21 PM Subject: RE: [SQL] Database Design Question > OK... Fair Enough... Good Points indeed y'all. > > Well... What about the problem of users trying to Query the Database?? > > You know... like when using Crystal Reports or something?. > > SELECT * from HistoryTable > WHERE PartID = SomeInteger > > vs > > SELECT * from HistoryTable > WHERE PartNum = 12345636 AND PartRev = C > > How are they supposed to know What the PartID is ?? > > Anyway, that I why I was considering changing... current users always have trouble peering into the database... They don't quite get it. Search conditions don't HAVE to be indexes. And you can have more than one index. So you could have your primary index on PartID, which would be used by your applications and another index on PartNum and PartRev if those are frequently searched fields for crystal reports etc.
You would not necessarily want clients searching by the integer-based key. That is used internally by the database or application. Clients wouldn't know what the keys are anyway because they are not used for anything except referential integrity/relation of data. You would have Part_ID, and Part_Number. Part_ID is defining the relationship and Part_Number is human readable for searching. If you don't want your clients to see the Part_ID, create a view that excludes it and make them use the views for queries. As stated already by James, you can still make Part_Number unique even if a primary key already exists. Hope this makes sense/is helpful, Jimmie Fulton Systems Administrator Environmental Health & Safety Office Emory University School Of Medicine -----Original Message----- From: Gonzo Rock [mailto:GonzoRock@Excite.com] Sent: Friday, July 27, 2001 4:22 PM To: pgsql-general@postgresql.org Cc: pgsql-sql@postgresql.org Subject: RE: [SQL] Database Design Question OK... Fair Enough... Good Points indeed y'all. Well... What about the problem of users trying to Query the Database?? You know... like when using Crystal Reports or something?. SELECT * from HistoryTable WHERE PartID = SomeInteger vs SELECT * from HistoryTable WHERE PartNum = 12345636 AND PartRev = C How are they supposed to know What the PartID is ?? Anyway, that I why I was considering changing... current users always have trouble peering into the database... They don't quite get it. At 02:31 PM 7/27/01 -0400, Mike Mascari wrote: >I prefer using unique integer ids generated from sequences rather than >keys composed of meaningful values. > >Advantages: > >Client side applications can store/handle the unique integer ids more >readily than having to deal with composite primary keys composed of >varying data types. For example, I can stuff the id associated with a >particular record easily in list boxes, combo boxes, edit controls, etc. >via SetItemData() or some other appropriate method. Its a bit more >complicated to track database records via composite keys of something >like: part no, vendor no, vendor group. > >Updating the data doesn't require cascading updates. If you use keys >with meaning, the referential integrity constraints must support >cascading updates so if the key changes in the primary table the change >is cascaded to all referencing tables as well. Earlier versions of most >databases (Access, Oracle, etc.) only provided cascading deletes under >the assumption you would be using sequence generated keys. > >Downside: > >Many queries might require more joins against the primary table to fetch >the relevant information associated with the numerical id, whereas keys >composed of solely the values with which they are associated might not >require the joins, which will speed some applications. I now have some >queries with 20-way joins. But PostgreSQL provides a way to explicitly >set the path the planner will choose and so the execution of the query >is instantaneous. I'm not sure about other databases. In earlier >versions, I had to denormalize a bit solely for performance reasons. > >In the past, I used to use composite keys and switched to the purely >sequence generated path and don't regret it at all. Of course, you'll >still have a unique constraint on the what-would-have-been meaningful >primary key. > >Hope that helps, > >Mike Mascari >mascarm@mascari.com > >Gonzo Rock wrote: >> >> A Question for those of you who consider yourself crack Database Designers. >> >> I am currently moving a large database(100+Tables) into pgSQL... with the intention of deploying against 'any' SQL database in the future. The development side will be rigorously using Standard SQL constructs with no unique/proprietary extensions. >> >> My question concerns establishing the relationships. >> >> Currently Relationships between tables are established via a Unique Integer ID like this: >> >> *=APrimaryKey >> >> PartTypes Customer Parts >> --------- -------- ----- >> PartTypeID CustomerID PartID >> *PartType *Customer PartTypeID >> Address CustomerID >> *PartNumber(2FieldPrimaryKey) >> *PartRevision(2FieldPrimaryKey) >> PartName >> >> >> HOWEVER; I have read lots of texts describing the Relational Design should be instead like this: >> >> *=APrimaryKey >> >> PartTypes Customer Parts >> --------- -------- ----- >> *PartType *Customer PartType >> Address *PartNumber(2FieldPrimaryKey) >> *PartRevison(2FieldPrimaryKey) >> PartName >> Customer >> >> Both Techniques have a unique foreign key back to the parent tables but one uses No.Meaningful.Info.Integer.Data for the ForeignKey while the second uses Human.Understandable.ForeignKeys >> >> Is one recommended over the other??? Sure appreciate the commentary before I get in too deep with all these tables. >> >> Thanks! > ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
On Fri, 27 Jul 2001, Jimmie Fulton wrote: > I'm by no means a crack database designer, but I do have my ideas on this > subject. I prefer that every table has a unique integer (auto-incrementing) > primary key. Why? Consistency. With the alternative, some tables may not > have a clear-cut candidate for a unique id. In your example, you had > "customer". How many "John Smith"s would it take before we decide that is > not a good identifier. On the other hand, some tables would have perfectly > logical identifiers. Part numbers, SSNs.... So, you would need to create > some tables with integer primary keys, and others would have some other > natural identifier. That to me is inconsistent. Every table should be, > IMHO, predictable in it's definition of a primary key. I don't even have to > guess what the names of my primary keys are either because the are all named > <tablename>_ID. Always. I've only come up with these thoughts on my own, > and have not extensively tried the other way, so I'd be interested in > hearing other's ideas for the other side. For large scale databases, there are theories aplenty about proper naming, etc., and these should be be investigated. For small/simple databases, this might be overkill in complexity and learning curve. I teach a series of classes on small-scale database design for nonprofit organizations, and recommend some simple rules: * for tables where there is no candidate key (ie a Person table where there is no SSN, etc.), use the table name + "id", and use a SERIAL-type. * for tables where there is a candidate key, and that candidate key meets all the usual criteria (always present, always unique, changes very rarely), use the table name + "code", and use the appropriate type (text/int/whatever), UNLESS * there exists a very common name for this piece of info. Rather than calling a SSN a "personcode" (in the above example), call it the SSN * always put the primary key first in the table Why not always use a SERIAL integer? It's a pain when a perfectly good value exists. For example, in a table keeping track of US states, their capitals, and governors, does it really make sense to code Maryland as "45", when a memorable, unique, unlikely-to-change code already exists ("md")? Using a random number when a real-world code could do only forces your user to do that lookup themselves. [apologies to the international readers: Maryland is a state in the USA, and "MD" is the postal code abbreviation for it] I think that you could make some basic rules that would give you a system that is intuitive and easy-to-rememeber, _without_ sacrificing clarity for rigidity. hth, -- Joel Burton <jburton@scw.org> Director of Information Systems, Support Center of Washington