Thread: Re: Database normalization

Re: Database normalization

From
"Sebastian Ritter"
Date:
Hello,

I have a fairly basic question about database design where im not sure which approach is considered correct.

I have two different entities: Clients and Services. Both allow users to add progressive updates about the two entities.

The update/message format is exactly the same for both. Should I make two different tables:

  client_updates and service_updates
  or
  one table with extra columns : is_client, client_id, service_id, where either client_id or service_id would be null depending on the is_client boolean?

The major problem is simply relating the foreign key in the updates table back to the correct entity, client or service.

Regards,
Sebastian


Re: Database normalization

From
"Asko Oja"
Date:
Hi

I would create one table to log updates with type field and one id filed that contains either client id or service id according to type. On such a table i would forget about foreign keys (thay are better to be avoided anyway if you have millions of records in tables).
That way you can share code that displays update history or even create some generic components.
In my experience normalized databases have ended up with hundreds of tables and code and screen generation that is very far from what users actually need. Denormalizing and refactoring these databases reduces number of tables by magnitude.

Regards,
Asko


On 8/28/07, Sebastian Ritter <ritter.sebastian@gmail.com> wrote:
Hello,

I have a fairly basic question about database design where im not sure which approach is considered correct.

I have two different entities: Clients and Services. Both allow users to add progressive updates about the two entities.

The update/message format is exactly the same for both. Should I make two different tables:

  client_updates and service_updates
  or
  one table with extra columns : is_client, client_id, service_id, where either client_id or service_id would be null depending on the is_client boolean?

The major problem is simply relating the foreign key in the updates table back to the correct entity, client or service.

Regards,
Sebastian



Re: Database normalization

From
Andrew Sullivan
Date:
On Tue, Aug 28, 2007 at 12:47:45PM +0100, Sebastian Ritter wrote:
> > The update/message format is exactly the same for both. Should I make two
> > different tables:

> >   one table with extra columns : is_client, client_id, service_id, where
> > either client_id or service_id would be null depending on the is_client
> > boolean?

Is the rest of the data the same?  If so, then one table is right. 
If not, then more than one table.  In either case, I really hate the
idea of two columns, one of which is always null.  But if you're
going to do that, make sure you add a CHECK constraint where !(col1
IS NULL and col2 IS NULL).

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.    --J.D. Baldwin


Re: Database normalization

From
"Sebastian Ritter"
Date:
Thanks for the information.

Both tables would be exactly sames apart from the  foreign key relation to clients or services. I agree that it seems strange to have one column that is always null. Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as : CHECK constraint where !(col1
IS NULL and col2 IS NULL).

Another factor ive been considering is that one of the fields in this table(s) definition(s) is free flowing text which could potentially become very large. Should I take this in to
consideration when deciding whether to split the tables? In terms of searching speed that is.

Kindest regards.
Sebastian

On 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Tue, Aug 28, 2007 at 12:47:45PM +0100, Sebastian Ritter wrote:
> > The update/message format is exactly the same for both. Should I make two
> > different tables:

> >   one table with extra columns : is_client, client_id, service_id, where
> > either client_id or service_id would be null depending on the is_client
> > boolean?

Is the rest of the data the same?  If so, then one table is right.
If not, then more than one table.  In either case, I really hate the
idea of two columns, one of which is always null.  But if you're
going to do that, make sure you add a CHECK constraint where !(col1
IS NULL and col2 IS NULL).

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to.  That actually seems sort of quaint now.
                --J.D. Baldwin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: Database normalization

From
Erik Jones
Date:
On Aug 28, 2007, at 6:47 AM, Sebastian Ritter wrote:

> Hello,
>
> I have a fairly basic question about database design where im not
> sure which approach is considered correct.
>
> I have two different entities: Clients and Services. Both allow
> users to add progressive updates about the two entities.
>
> The update/message format is exactly the same for both. Should I
> make two different tables:
>
>   client_updates and service_updates
>   or
>   one table with extra columns : is_client, client_id, service_id,
> where either client_id or service_id would be null depending on the
> is_client boolean?
>
> The major problem is simply relating the foreign key in the updates
> table back to the correct entity, client or service.

Are client_id and service_id artificial ids?  If so, is it an actual
requirement that they have distinct id sequences?  I.e. is it
necessary that there can be both client_id=1 and service_id=1?  If
not, you can use one table, say Entities, lose client_id, service_id,
and is_clent and replace them with entity_id and entity_type.  Then
your foreign key in your updates table just needs to reference
entity_id, or possibly (entity_id, entity_type) if you want to be
really strict about things.  If you want to make querying the table
simple for either case create Clients and Services views on the
table.  This also gives you the ability to add other entity types
where you may to track whatever kind of updates these are.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: Database normalization

From
Andrew Sullivan
Date:
On Tue, Aug 28, 2007 at 03:37:22PM +0100, Sebastian Ritter wrote:
> Thanks for the information.
> 
> Both tables would be exactly sames apart from the  foreign key relation to
> clients or services.

Hmm.  Are the services or clients tables different?  A useful rule of
thumb is that, to the extent you can sort things into "kinds of
data", then you should have exactly one space for each one.  (I hope
that's clear.)

> Another factor ive been considering is that one of the fields in this
> table(s) definition(s) is free flowing text which could potentially become
> very large. Should I take this in to
> consideration when deciding whether to split the tables? In terms of
> searching speed that is.

I'd put it in its own table, probably, unless you're going to use it
frequently.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
Everything that happens in the world happens at some place.    --Jane Jacobs 


Re: Database normalization

From
"Sebastian Ritter"
Date:
Hi,

The views option sounds best in my opinion but sadly i cant play with things like inheritance or views as Im using Django as my ORM and since its relatively new, it doesnt yet support such database operations.

Id like the ids for each "entity" to be serial and hence there would be overlap between client_ids and service_ids.

Cheers,
Sebastian

On 8/28/07, Erik Jones <erik@myemma.com> wrote:
On Aug 28, 2007, at 6:47 AM, Sebastian Ritter wrote:

> Hello,
>
> I have a fairly basic question about database design where im not
> sure which approach is considered correct.
>
> I have two different entities: Clients and Services. Both allow
> users to add progressive updates about the two entities.
>
> The update/message format is exactly the same for both. Should I
> make two different tables:
>
>   client_updates and service_updates
>   or
>   one table with extra columns : is_client, client_id, service_id,
> where either client_id or service_id would be null depending on the
> is_client boolean?
>
> The major problem is simply relating the foreign key in the updates
> table back to the correct entity, client or service.

Are client_id and service_id artificial ids?  If so, is it an actual
requirement that they have distinct id sequences?  I.e. is it
necessary that there can be both client_id=1 and service_id=1?  If
not, you can use one table, say Entities, lose client_id, service_id,
and is_clent and replace them with entity_id and entity_type.  Then
your foreign key in your updates table just needs to reference
entity_id, or possibly (entity_id, entity_type) if you want to be
really strict about things.  If you want to make querying the table
simple for either case create Clients and Services views on the
table.  This also gives you the ability to add other entity types
where you may to track whatever kind of updates these are.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Database normalization

From
"Sebastian Ritter"
Date:
Hi,

On 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Tue, Aug 28, 2007 at 03:37:22PM +0100, Sebastian Ritter wrote:
> Thanks for the information.
>
> Both tables would be exactly sames apart from the  foreign key relation to
> clients or services.

Hmm.  Are the services or clients tables different?  A useful rule of
thumb is that, to the extent you can sort things into "kinds of
data", then you should have exactly one space for each one.  (I hope
that's clear.)

  The table definition is exactly the same. The only difference is whether the
  row refers to a client or service.
 

> Another factor ive been considering is that one of the fields in this
> table(s) definition(s) is free flowing text which could potentially become
> very large. Should I take this in to
> consideration when deciding whether to split the tables? In terms of
> searching speed that is.

I'd put it in its own table, probably, unless you're going to use it
frequently.

  Why would frequency of use change whether or not I use one or two tables?
 
Sebastian

Re: Database normalization

From
"Bart Degryse"
Date:
<quote>Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as ...</quote>
Then you should seriously consider changing your mapper.

>>> "Sebastian Ritter" <ritter.sebastian@gmail.com> 2007-08-28 16:37 >>>
Thanks for the information.

Both tables would be exactly sames apart from the  foreign key relation to clients or services. I agree that it seems strange to have one column that is always null. Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as : CHECK constraint where !(col1
IS NULL and col2 IS NULL).

Another factor ive been considering is that one of the fields in this table(s) definition(s) is free flowing text which could potentially become very large. Should I take this in to
consideration when deciding whether to split the tables? In terms of searching speed that is.

Kindest regards.
Sebastian

On 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Tue, Aug 28, 2007 at 12:47:45PM +0100, Sebastian Ritter wrote:
> > The update/message format is exactly the same for both. Should I make two
> > different tables:

> >   one table with extra columns : is_client, client_id, service_id, where
> > either client_id or service_id would be null depending on the is_client
> > boolean?

Is the rest of the data the same?  If so, then one table is right.
If not, then more than one table.  In either case, I really hate the
idea of two columns, one of which is always null.  But if you're
going to do that, make sure you add a CHECK constraint where !(col1
IS NULL and col2 IS NULL).

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to.  That actually seems sort of quaint now.
                --J.D. Baldwin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: Database normalization

From
Andrew Sullivan
Date:
On Tue, Aug 28, 2007 at 04:15:01PM +0100, Sebastian Ritter wrote:
> > I'd put it in its own table, probably, unless you're going to use it
> > frequently.
> 
>   Why would frequency of use change whether or not I use one or two tables?

If you have a possibly-large field that does not get used very much,
you have to pay the I/O for it every time you look at that row, even
if it's not used.  Also, it sounds like it might not be used by every
row?  In that case, normalization calls for it to be pushed out too.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
However important originality may be in some fields, restraint and 
adherence to procedure emerge as the more significant virtues in a 
great many others.   --Alain de Botton


Re: Database normalization

From
"Sebastian Ritter"
Date:
Thanks guys,

Sebastian

On 8/28/07, Bart Degryse <Bart.Degryse@indicator.be> wrote:
<quote>Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as ...</quote>
Then you should seriously consider changing your mapper.

>>> "Sebastian Ritter" < ritter.sebastian@gmail.com> 2007-08-28 16:37 >>>

Thanks for the information.

Both tables would be exactly sames apart from the  foreign key relation to clients or services. I agree that it seems strange to have one column that is always null. Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as : CHECK constraint where !(col1
IS NULL and col2 IS NULL).

Another factor ive been considering is that one of the fields in this table(s) definition(s) is free flowing text which could potentially become very large. Should I take this in to
consideration when deciding whether to split the tables? In terms of searching speed that is.

Kindest regards.
Sebastian

On 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca > wrote:
On Tue, Aug 28, 2007 at 12:47:45PM +0100, Sebastian Ritter wrote:
> > The update/message format is exactly the same for both. Should I make two
> > different tables:

> >   one table with extra columns : is_client, client_id, service_id, where
> > either client_id or service_id would be null depending on the is_client
> > boolean?

Is the rest of the data the same?  If so, then one table is right.
If not, then more than one table.  In either case, I really hate the
idea of two columns, one of which is always null.  But if you're
going to do that, make sure you add a CHECK constraint where !(col1
IS NULL and col2 IS NULL).

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to.  That actually seems sort of quaint now.
                --J.D. Baldwin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq