Thread: Re: Database normalization
Hello,
I have a fairly basic question about database design where im not sure which approach is considered correct.
I have two different entities: Clients and Services. Both allow users to add progressive updates about the two entities.
The update/message format is exactly the same for both. Should I make two different tables:
client_updates and service_updates
or
one table with extra columns : is_client, client_id, service_id, where either client_id or service_id would be null depending on the is_client boolean?
The major problem is simply relating the foreign key in the updates table back to the correct entity, client or service.
Regards,
Sebastian
Hi
I would create one table to log updates with type field and one id filed that contains either client id or service id according to type. On such a table i would forget about foreign keys (thay are better to be avoided anyway if you have millions of records in tables).
That way you can share code that displays update history or even create some generic components.
In my experience normalized databases have ended up with hundreds of tables and code and screen generation that is very far from what users actually need. Denormalizing and refactoring these databases reduces number of tables by magnitude.
Regards,
Asko
I would create one table to log updates with type field and one id filed that contains either client id or service id according to type. On such a table i would forget about foreign keys (thay are better to be avoided anyway if you have millions of records in tables).
That way you can share code that displays update history or even create some generic components.
In my experience normalized databases have ended up with hundreds of tables and code and screen generation that is very far from what users actually need. Denormalizing and refactoring these databases reduces number of tables by magnitude.
Regards,
Asko
On 8/28/07, Sebastian Ritter <ritter.sebastian@gmail.com> wrote:
Hello,
I have a fairly basic question about database design where im not sure which approach is considered correct.
I have two different entities: Clients and Services. Both allow users to add progressive updates about the two entities.
The update/message format is exactly the same for both. Should I make two different tables:
client_updates and service_updates
or
one table with extra columns : is_client, client_id, service_id, where either client_id or service_id would be null depending on the is_client boolean?
The major problem is simply relating the foreign key in the updates table back to the correct entity, client or service.
Regards,
Sebastian
On Tue, Aug 28, 2007 at 12:47:45PM +0100, Sebastian Ritter wrote: > > The update/message format is exactly the same for both. Should I make two > > different tables: > > one table with extra columns : is_client, client_id, service_id, where > > either client_id or service_id would be null depending on the is_client > > boolean? Is the rest of the data the same? If so, then one table is right. If not, then more than one table. In either case, I really hate the idea of two columns, one of which is always null. But if you're going to do that, make sure you add a CHECK constraint where !(col1 IS NULL and col2 IS NULL). A -- Andrew Sullivan | ajs@crankycanuck.ca I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now. --J.D. Baldwin
Thanks for the information.
Both tables would be exactly sames apart from the foreign key relation to clients or services. I agree that it seems strange to have one column that is always null. Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as : CHECK constraint where !(col1
IS NULL and col2 IS NULL).
Another factor ive been considering is that one of the fields in this table(s) definition(s) is free flowing text which could potentially become very large. Should I take this in to
consideration when deciding whether to split the tables? In terms of searching speed that is.
Kindest regards.
Sebastian
Both tables would be exactly sames apart from the foreign key relation to clients or services. I agree that it seems strange to have one column that is always null. Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as : CHECK constraint where !(col1
IS NULL and col2 IS NULL).
Another factor ive been considering is that one of the fields in this table(s) definition(s) is free flowing text which could potentially become very large. Should I take this in to
consideration when deciding whether to split the tables? In terms of searching speed that is.
Kindest regards.
Sebastian
On 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Tue, Aug 28, 2007 at 12:47:45PM +0100, Sebastian Ritter wrote:
> > The update/message format is exactly the same for both. Should I make two
> > different tables:
> > one table with extra columns : is_client, client_id, service_id, where
> > either client_id or service_id would be null depending on the is_client
> > boolean?
Is the rest of the data the same? If so, then one table is right.
If not, then more than one table. In either case, I really hate the
idea of two columns, one of which is always null. But if you're
going to do that, make sure you add a CHECK constraint where !(col1
IS NULL and col2 IS NULL).
A
--
Andrew Sullivan | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to. That actually seems sort of quaint now.
--J.D. Baldwin
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
On Aug 28, 2007, at 6:47 AM, Sebastian Ritter wrote: > Hello, > > I have a fairly basic question about database design where im not > sure which approach is considered correct. > > I have two different entities: Clients and Services. Both allow > users to add progressive updates about the two entities. > > The update/message format is exactly the same for both. Should I > make two different tables: > > client_updates and service_updates > or > one table with extra columns : is_client, client_id, service_id, > where either client_id or service_id would be null depending on the > is_client boolean? > > The major problem is simply relating the foreign key in the updates > table back to the correct entity, client or service. Are client_id and service_id artificial ids? If so, is it an actual requirement that they have distinct id sequences? I.e. is it necessary that there can be both client_id=1 and service_id=1? If not, you can use one table, say Entities, lose client_id, service_id, and is_clent and replace them with entity_id and entity_type. Then your foreign key in your updates table just needs to reference entity_id, or possibly (entity_id, entity_type) if you want to be really strict about things. If you want to make querying the table simple for either case create Clients and Services views on the table. This also gives you the ability to add other entity types where you may to track whatever kind of updates these are. Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
On Tue, Aug 28, 2007 at 03:37:22PM +0100, Sebastian Ritter wrote: > Thanks for the information. > > Both tables would be exactly sames apart from the foreign key relation to > clients or services. Hmm. Are the services or clients tables different? A useful rule of thumb is that, to the extent you can sort things into "kinds of data", then you should have exactly one space for each one. (I hope that's clear.) > Another factor ive been considering is that one of the fields in this > table(s) definition(s) is free flowing text which could potentially become > very large. Should I take this in to > consideration when deciding whether to split the tables? In terms of > searching speed that is. I'd put it in its own table, probably, unless you're going to use it frequently. A -- Andrew Sullivan | ajs@crankycanuck.ca Everything that happens in the world happens at some place. --Jane Jacobs
Hi,
The views option sounds best in my opinion but sadly i cant play with things like inheritance or views as Im using Django as my ORM and since its relatively new, it doesnt yet support such database operations.
Id like the ids for each "entity" to be serial and hence there would be overlap between client_ids and service_ids.
Cheers,
Sebastian
The views option sounds best in my opinion but sadly i cant play with things like inheritance or views as Im using Django as my ORM and since its relatively new, it doesnt yet support such database operations.
Id like the ids for each "entity" to be serial and hence there would be overlap between client_ids and service_ids.
Cheers,
Sebastian
On 8/28/07, Erik Jones <erik@myemma.com> wrote:
On Aug 28, 2007, at 6:47 AM, Sebastian Ritter wrote:
> Hello,
>
> I have a fairly basic question about database design where im not
> sure which approach is considered correct.
>
> I have two different entities: Clients and Services. Both allow
> users to add progressive updates about the two entities.
>
> The update/message format is exactly the same for both. Should I
> make two different tables:
>
> client_updates and service_updates
> or
> one table with extra columns : is_client, client_id, service_id,
> where either client_id or service_id would be null depending on the
> is_client boolean?
>
> The major problem is simply relating the foreign key in the updates
> table back to the correct entity, client or service.
Are client_id and service_id artificial ids? If so, is it an actual
requirement that they have distinct id sequences? I.e. is it
necessary that there can be both client_id=1 and service_id=1? If
not, you can use one table, say Entities, lose client_id, service_id,
and is_clent and replace them with entity_id and entity_type. Then
your foreign key in your updates table just needs to reference
entity_id, or possibly (entity_id, entity_type) if you want to be
really strict about things. If you want to make querying the table
simple for either case create Clients and Services views on the
table. This also gives you the ability to add other entity types
where you may to track whatever kind of updates these are.
Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
Hi,
On 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
The table definition is exactly the same. The only difference is whether the
row refers to a client or service.
Why would frequency of use change whether or not I use one or two tables?
Sebastian
On Tue, Aug 28, 2007 at 03:37:22PM +0100, Sebastian Ritter wrote:
> Thanks for the information.
>
> Both tables would be exactly sames apart from the foreign key relation to
> clients or services.
Hmm. Are the services or clients tables different? A useful rule of
thumb is that, to the extent you can sort things into "kinds of
data", then you should have exactly one space for each one. (I hope
that's clear.)
The table definition is exactly the same. The only difference is whether the
row refers to a client or service.
> Another factor ive been considering is that one of the fields in this
> table(s) definition(s) is free flowing text which could potentially become
> very large. Should I take this in to
> consideration when deciding whether to split the tables? In terms of
> searching speed that is.
I'd put it in its own table, probably, unless you're going to use it
frequently.
Why would frequency of use change whether or not I use one or two tables?
<quote>Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as ...</quote>
Then you should seriously consider changing your mapper.
>>> "Sebastian Ritter" <ritter.sebastian@gmail.com> 2007-08-28 16:37 >>>
Thanks for the information.
Both tables would be exactly sames apart from the foreign key relation to clients or services. I agree that it seems strange to have one column that is always null. Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as : CHECK constraint where !(col1
IS NULL and col2 IS NULL).
Another factor ive been considering is that one of the fields in this table(s) definition(s) is free flowing text which could potentially become very large. Should I take this in to
consideration when deciding whether to split the tables? In terms of searching speed that is.
Kindest regards.
Sebastian
>>> "Sebastian Ritter" <ritter.sebastian@gmail.com> 2007-08-28 16:37 >>>
Thanks for the information.
Both tables would be exactly sames apart from the foreign key relation to clients or services. I agree that it seems strange to have one column that is always null. Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as : CHECK constraint where !(col1
IS NULL and col2 IS NULL).
Another factor ive been considering is that one of the fields in this table(s) definition(s) is free flowing text which could potentially become very large. Should I take this in to
consideration when deciding whether to split the tables? In terms of searching speed that is.
Kindest regards.
Sebastian
On 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Tue, Aug 28, 2007 at 12:47:45PM +0100, Sebastian Ritter wrote:
> > The update/message format is exactly the same for both. Should I make two
> > different tables:
> > one table with extra columns : is_client, client_id, service_id, where
> > either client_id or service_id would be null depending on the is_client
> > boolean?
Is the rest of the data the same? If so, then one table is right.
If not, then more than one table. In either case, I really hate the
idea of two columns, one of which is always null. But if you're
going to do that, make sure you add a CHECK constraint where !(col1
IS NULL and col2 IS NULL).
A
--
Andrew Sullivan | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to. That actually seems sort of quaint now.
--J.D. Baldwin
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
On Tue, Aug 28, 2007 at 04:15:01PM +0100, Sebastian Ritter wrote: > > I'd put it in its own table, probably, unless you're going to use it > > frequently. > > Why would frequency of use change whether or not I use one or two tables? If you have a possibly-large field that does not get used very much, you have to pay the I/O for it every time you look at that row, even if it's not used. Also, it sounds like it might not be used by every row? In that case, normalization calls for it to be pushed out too. A -- Andrew Sullivan | ajs@crankycanuck.ca However important originality may be in some fields, restraint and adherence to procedure emerge as the more significant virtues in a great many others. --Alain de Botton
Thanks guys,
Sebastian
Sebastian
On 8/28/07, Bart Degryse <Bart.Degryse@indicator.be> wrote:
<quote>Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as ...</quote>Then you should seriously consider changing your mapper.
>>> "Sebastian Ritter" < ritter.sebastian@gmail.com> 2007-08-28 16:37 >>>
Thanks for the information.
Both tables would be exactly sames apart from the foreign key relation to clients or services. I agree that it seems strange to have one column that is always null. Im using Django as my Object relational Mapper so im pretty sure I can not add a constraint such as : CHECK constraint where !(col1
IS NULL and col2 IS NULL).
Another factor ive been considering is that one of the fields in this table(s) definition(s) is free flowing text which could potentially become very large. Should I take this in to
consideration when deciding whether to split the tables? In terms of searching speed that is.
Kindest regards.
SebastianOn 8/28/07, Andrew Sullivan <ajs@crankycanuck.ca > wrote:On Tue, Aug 28, 2007 at 12:47:45PM +0100, Sebastian Ritter wrote:
> > The update/message format is exactly the same for both. Should I make two
> > different tables:
> > one table with extra columns : is_client, client_id, service_id, where
> > either client_id or service_id would be null depending on the is_client
> > boolean?
Is the rest of the data the same? If so, then one table is right.
If not, then more than one table. In either case, I really hate the
idea of two columns, one of which is always null. But if you're
going to do that, make sure you add a CHECK constraint where !(col1
IS NULL and col2 IS NULL).
A
--
Andrew Sullivan | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to. That actually seems sort of quaint now.
--J.D. Baldwin
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq