Thread: Database Design Question

Database Design Question

From
Gonzo Rock
Date:
A Question for those of you who consider yourself crack Database Designers.

I am currently moving a large database(100+Tables) into pgSQL... with the intention of deploying against 'any' SQL
databasein the future. The development side will be rigorously using Standard SQL constructs with no unique/proprietary
extensions.

My question concerns establishing the relationships.

Currently Relationships between tables are established via a Unique Integer ID like this:

*=APrimaryKey
PartTypes    Customer     Parts---------    --------     -----PartTypeID   CustomerID   PartID
*PartType    *Customer     PartTypeID             Address      CustomerID
*PartNumber(2FieldPrimaryKey)                        *PartRevision(2FieldPrimaryKey)                          PartName
                  
 

HOWEVER; I have read lots of texts describing the Relational Design should be instead like this:

*=APrimaryKey
PartTypes    Customer     Parts---------    --------     -----
*PartType    *Customer     PartType               Address     *PartNumber(2FieldPrimaryKey)
*PartRevison(2FieldPrimaryKey)                         PartName                              Customer
                     
 
Both Techniques have a unique foreign key back to the parent tables but one uses No.Meaningful.Info.Integer.Data for
theForeignKey while the second uses Human.Understandable.ForeignKeys
 

Is one recommended over the other??? Sure appreciate the commentary before I get in too deep with all these tables.

Thanks!




RE: Database Design Question

From
Jimmie Fulton
Date:
I'm by no means a crack database designer, but I do have my ideas on this
subject.  I prefer that every table has a unique integer (auto-incrementing)
primary key.  Why?  Consistency.  With the alternative, some tables may not
have a clear-cut candidate for a unique id.  In your example, you had
"customer".  How many "John Smith"s would it take before we decide that is
not a good identifier.  On the other hand, some tables would have perfectly
logical identifiers.  Part numbers, SSNs....  So, you would need to create
some tables with integer primary keys, and others would have some other
natural identifier.  That to me is inconsistent.  Every table should be,
IMHO, predictable in it's definition of a primary key.  I don't even have to
guess what the names of my primary keys are either because the are all named
<tablename>_ID.  Always.  I've only come up with these thoughts on my own,
and have not extensively tried the other way, so I'd be interested in
hearing other's ideas for the other side.

Thanks,

Jimmie Fulton
Systems Administrator
Environmental Health & Safety Office
Emory University School Of Medicine


-----Original Message-----
From: Gonzo Rock [mailto:GonzoRock@Excite.com]
Sent: Friday, July 27, 2001 2:03 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] Database Design Question


A Question for those of you who consider yourself crack Database Designers.

I am currently moving a large database(100+Tables) into pgSQL... with the
intention of deploying against 'any' SQL database in the future. The
development side will be rigorously using Standard SQL constructs with no
unique/proprietary extensions.

My question concerns establishing the relationships.

Currently Relationships between tables are established via a Unique Integer
ID like this:

*=APrimaryKey
PartTypes    Customer     Parts---------    --------     -----PartTypeID   CustomerID   PartID
*PartType    *Customer     PartTypeID             Address      CustomerID
*PartNumber(2FieldPrimaryKey)                        *PartRevision(2FieldPrimaryKey)                          PartName
                  
 

HOWEVER; I have read lots of texts describing the Relational Design should
be instead like this:

*=APrimaryKey
PartTypes    Customer     Parts---------    --------     -----
*PartType    *Customer     PartType               Address     *PartNumber(2FieldPrimaryKey)
*PartRevison(2FieldPrimaryKey)                         PartName                              Customer
                     
 
Both Techniques have a unique foreign key back to the parent tables but one
uses No.Meaningful.Info.Integer.Data for the ForeignKey while the second
uses Human.Understandable.ForeignKeys

Is one recommended over the other??? Sure appreciate the commentary before I
get in too deep with all these tables.

Thanks!



---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly


RE: Database Design Question

From
Gonzo Rock
Date:
OK... Fair Enough... Good Points indeed y'all.

Well... What about the problem of users trying to Query the Database??

You know... like when using Crystal Reports or something?.

SELECT * from HistoryTable
WHERE PartID = SomeInteger

vs

SELECT * from HistoryTable
WHERE PartNum = 12345636 AND PartRev = C

How are they supposed to know What the PartID is ??

Anyway, that I why I was considering changing... current users always have trouble peering into the database... They
don'tquite get it. 




At 02:31 PM 7/27/01 -0400, Mike Mascari wrote:
>I prefer using unique integer ids generated from sequences rather than
>keys composed of meaningful values.
>
>Advantages:
>
>Client side applications can store/handle the unique integer ids more
>readily than having to deal with composite primary keys composed of
>varying data types. For example, I can stuff the id associated with a
>particular record easily in list boxes, combo boxes, edit controls, etc.
>via SetItemData() or some other appropriate method. Its a bit more
>complicated to track database records via composite keys of something
>like: part no, vendor no, vendor group.
>
>Updating the data doesn't require cascading updates. If you use keys
>with meaning, the referential integrity constraints must support
>cascading updates so if the key changes in the primary table the change
>is cascaded to all referencing tables as well. Earlier versions of most
>databases (Access, Oracle, etc.) only provided cascading deletes under
>the assumption you would be using sequence generated keys.
>
>Downside:
>
>Many queries might require more joins against the primary table to fetch
>the relevant information associated with the numerical id, whereas keys
>composed of solely the values with which they are associated might not
>require the joins, which will speed some applications. I now have some
>queries with 20-way joins. But PostgreSQL provides a way to explicitly
>set the path the planner will choose and so the execution of the query
>is instantaneous. I'm not sure about other databases. In earlier
>versions, I had to denormalize a bit solely for performance reasons.
>
>In the past, I used to use composite keys and switched to the purely
>sequence generated path and don't regret it at all. Of course, you'll
>still have a unique constraint on the what-would-have-been meaningful
>primary key.
>
>Hope that helps,
>
>Mike Mascari
>mascarm@mascari.com
>
>Gonzo Rock wrote:
>>
>> A Question for those of you who consider yourself crack Database Designers.
>>
>> I am currently moving a large database(100+Tables) into pgSQL... with the intention of deploying against 'any' SQL
databasein the future. The development side will be rigorously using Standard SQL constructs with no unique/proprietary
extensions.
>>
>> My question concerns establishing the relationships.
>>
>> Currently Relationships between tables are established via a Unique Integer ID like this:
>>
>> *=APrimaryKey
>>
>>  PartTypes    Customer     Parts
>>  ---------    --------     -----
>>  PartTypeID   CustomerID   PartID
>> *PartType    *Customer     PartTypeID
>>               Address      CustomerID
>>                           *PartNumber(2FieldPrimaryKey)
>>                           *PartRevision(2FieldPrimaryKey)
>>                            PartName
>>
>>
>> HOWEVER; I have read lots of texts describing the Relational Design should be instead like this:
>>
>> *=APrimaryKey
>>
>>  PartTypes    Customer     Parts
>>  ---------    --------     -----
>> *PartType    *Customer     PartType
>>               Address     *PartNumber(2FieldPrimaryKey)
>>                           *PartRevison(2FieldPrimaryKey)
>>                            PartName
>>                            Customer
>>
>> Both Techniques have a unique foreign key back to the parent tables but one uses No.Meaningful.Info.Integer.Data for
theForeignKey while the second uses Human.Understandable.ForeignKeys 
>>
>> Is one recommended over the other??? Sure appreciate the commentary before I get in too deep with all these tables.
>>
>> Thanks!
>

Re: Database Design Question

From
"James Orr"
Date:
----- Original Message -----
From: "Gonzo Rock" <GonzoRock@Excite.com>
To: <pgsql-general@postgresql.org>
Cc: <pgsql-sql@postgresql.org>
Sent: Friday, July 27, 2001 4:21 PM
Subject: RE: [SQL] Database Design Question


> OK... Fair Enough... Good Points indeed y'all.
>
> Well... What about the problem of users trying to Query the Database??
>
> You know... like when using Crystal Reports or something?.
>
> SELECT * from HistoryTable
> WHERE PartID = SomeInteger
>
> vs
>
> SELECT * from HistoryTable
> WHERE PartNum = 12345636 AND PartRev = C
>
> How are they supposed to know What the PartID is ??
>
> Anyway, that I why I was considering changing... current users always have
trouble peering into the database... They don't quite get it.

Search conditions don't HAVE to be indexes.  And you can have more than one
index.  So you could have your primary index on PartID, which would be used
by your applications and another index on PartNum and PartRev if those are
frequently searched fields for crystal reports etc.



RE: Database Design Question

From
Jimmie Fulton
Date:
You would not necessarily want clients searching by the integer-based key.
That is used internally by the database or application.  Clients wouldn't
know what the keys are anyway because they are not used for anything except
referential integrity/relation of data.  You would have Part_ID, and
Part_Number.  Part_ID is defining the relationship and Part_Number is  human
readable for searching.  If you don't want your clients to see the Part_ID,
create a view that excludes it and make them use the views for queries.  As
stated already by James, you can still make Part_Number unique even if a
primary key already exists.

Hope this makes sense/is helpful,

Jimmie Fulton
Systems Administrator
Environmental Health & Safety Office
Emory University School Of Medicine


-----Original Message-----
From: Gonzo Rock [mailto:GonzoRock@Excite.com]
Sent: Friday, July 27, 2001 4:22 PM
To: pgsql-general@postgresql.org
Cc: pgsql-sql@postgresql.org
Subject: RE: [SQL] Database Design Question


OK... Fair Enough... Good Points indeed y'all.

Well... What about the problem of users trying to Query the Database??

You know... like when using Crystal Reports or something?.

SELECT * from HistoryTable
WHERE PartID = SomeInteger

vs

SELECT * from HistoryTable
WHERE PartNum = 12345636 AND PartRev = C

How are they supposed to know What the PartID is ??

Anyway, that I why I was considering changing... current users always have
trouble peering into the database... They don't quite get it.




At 02:31 PM 7/27/01 -0400, Mike Mascari wrote:
>I prefer using unique integer ids generated from sequences rather than
>keys composed of meaningful values.
>
>Advantages:
>
>Client side applications can store/handle the unique integer ids more
>readily than having to deal with composite primary keys composed of
>varying data types. For example, I can stuff the id associated with a
>particular record easily in list boxes, combo boxes, edit controls, etc.
>via SetItemData() or some other appropriate method. Its a bit more
>complicated to track database records via composite keys of something
>like: part no, vendor no, vendor group.
>
>Updating the data doesn't require cascading updates. If you use keys
>with meaning, the referential integrity constraints must support
>cascading updates so if the key changes in the primary table the change
>is cascaded to all referencing tables as well. Earlier versions of most
>databases (Access, Oracle, etc.) only provided cascading deletes under
>the assumption you would be using sequence generated keys.
>
>Downside:
>
>Many queries might require more joins against the primary table to fetch
>the relevant information associated with the numerical id, whereas keys
>composed of solely the values with which they are associated might not
>require the joins, which will speed some applications. I now have some
>queries with 20-way joins. But PostgreSQL provides a way to explicitly
>set the path the planner will choose and so the execution of the query
>is instantaneous. I'm not sure about other databases. In earlier
>versions, I had to denormalize a bit solely for performance reasons.
>
>In the past, I used to use composite keys and switched to the purely
>sequence generated path and don't regret it at all. Of course, you'll
>still have a unique constraint on the what-would-have-been meaningful
>primary key.
>
>Hope that helps,
>
>Mike Mascari
>mascarm@mascari.com
>
>Gonzo Rock wrote:
>> 
>> A Question for those of you who consider yourself crack Database
Designers.
>> 
>> I am currently moving a large database(100+Tables) into pgSQL... with the
intention of deploying against 'any' SQL database in the future. The
development side will be rigorously using Standard SQL constructs with no
unique/proprietary extensions.
>> 
>> My question concerns establishing the relationships.
>> 
>> Currently Relationships between tables are established via a Unique
Integer ID like this:
>> 
>> *=APrimaryKey
>> 
>>  PartTypes    Customer     Parts
>>  ---------    --------     -----
>>  PartTypeID   CustomerID   PartID
>> *PartType    *Customer     PartTypeID
>>               Address      CustomerID
>>                           *PartNumber(2FieldPrimaryKey)
>>                           *PartRevision(2FieldPrimaryKey)
>>                            PartName
>> 
>> 
>> HOWEVER; I have read lots of texts describing the Relational Design
should be instead like this:
>> 
>> *=APrimaryKey
>> 
>>  PartTypes    Customer     Parts
>>  ---------    --------     -----
>> *PartType    *Customer     PartType
>>               Address     *PartNumber(2FieldPrimaryKey)
>>                           *PartRevison(2FieldPrimaryKey)
>>                            PartName
>>                            Customer
>> 
>> Both Techniques have a unique foreign key back to the parent tables but
one uses No.Meaningful.Info.Integer.Data for the ForeignKey while the second
uses Human.Understandable.ForeignKeys
>> 
>> Is one recommended over the other??? Sure appreciate the commentary
before I get in too deep with all these tables.
>> 
>> Thanks!
>

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html


RE: Database Design Question

From
Joel Burton
Date:
On Fri, 27 Jul 2001, Jimmie Fulton wrote:

> I'm by no means a crack database designer, but I do have my ideas on this
> subject.  I prefer that every table has a unique integer (auto-incrementing)
> primary key.  Why?  Consistency.  With the alternative, some tables may not
> have a clear-cut candidate for a unique id.  In your example, you had
> "customer".  How many "John Smith"s would it take before we decide that is
> not a good identifier.  On the other hand, some tables would have perfectly
> logical identifiers.  Part numbers, SSNs....  So, you would need to create
> some tables with integer primary keys, and others would have some other
> natural identifier.  That to me is inconsistent.  Every table should be,
> IMHO, predictable in it's definition of a primary key.  I don't even have to
> guess what the names of my primary keys are either because the are all named
> <tablename>_ID.  Always.  I've only come up with these thoughts on my own,
> and have not extensively tried the other way, so I'd be interested in
> hearing other's ideas for the other side.

For large scale databases, there are theories aplenty about proper naming,
etc., and these should be be investigated. For small/simple databases,
this might be overkill in complexity and learning curve.

I teach a series of classes on small-scale database design for nonprofit
organizations, and recommend some simple rules:
* for tables where there is no candidate key (ie a Person table where
there is no SSN, etc.), use the table name + "id", and use a SERIAL-type.
* for tables where there is a candidate key, and that candidate key meets
all the usual criteria (always present, always unique, changes very
rarely), use the table name + "code", and use the appropriate type
(text/int/whatever), UNLESS
* there exists a very common name for this piece of info. Rather than
calling a SSN a "personcode" (in the above example), call it the SSN
* always put the primary key first in the table

Why not always use a SERIAL integer? It's a pain when a perfectly good
value exists. For example, in a table keeping track of US states, their
capitals, and governors, does it really make sense to code Maryland as
"45", when a memorable, unique, unlikely-to-change code already exists
("md")? Using a random number when a real-world code could do only forces
your user to do that lookup themselves.

[apologies to the international readers: Maryland is a state in the USA,
and "MD" is the postal code abbreviation for it]

I think that you could make some basic rules that would give you a system
that is intuitive and easy-to-rememeber, _without_ sacrificing clarity for
rigidity.

hth,
-- 
Joel Burton   <jburton@scw.org>
Director of Information Systems, Support Center of Washington