Re: A general database question! - Mailing list pgsql-general

From Dmitry Tkach
Subject Re: A general database question!
Date
Msg-id 3C9B64EA.5010103@openratings.com
Whole thread Raw
In response to A general database question!  (Dmitry Tkach <dmitry@openratings.com>)
List pgsql-general
It is simple so far - I'd just create an an index on=
 both location andname...

    I am not quite understanding why a multi-colu=
mn index won't work. Can you explain a little more?Here is what=
 I was thinking: create table entity(    type int,    location int,<b=
r>    name varchar(100) not null, ); create unique index my_index on e=
ntity(location,name);It seems as though an "and" query would run qu=
ickly. It also seems as though if you needed to get a few records that =
were one of several names, a query such as the following would run quic=
kly:select * from entity where location =3D 5 and (name =3D 'joe' or na=
me=3D'joseph');or are the aliases a seperate attribute? I am not cl=
ear on exactly where the alias comes in.

Well... Yes, that's the problem - aliases ARE a separate attribute. The SAME
entry could be known as both 'joe' and 'joseph', so, as far as I can see,
I have to choices to do this, as I explained earlier - either create two
tables, one having just entry id and location, and the other one, having
id, name and type (or 'nametype' if you will), liked to the first one, or,
I could have one table with multiple rows, corresponding to the same entry.=


The problem is that I don't =9Alike any of these solutions :-(
The first one isn't good enough because I can't create an index accross two
tables, so the search by location and name would be problematic...

The second one sucks, because it duplicates all those locations, and,=9A be=
cause
it is not normalized.

Your suggestion looks like the second of these solutions... First of all,
the table you suggest, would need another column, say, entityID, telling
which entity this name entry belongs to. This column can't even be a serial
or a primary key, because there will have to be multiple rows with the same
id (to hold aliases)... Now, suppose, a paritcular entity has 10 different
names - this would create 10 different rows in the table, all of which are
supposed to have the same id and location...
First of all, I'll waste 9*(sizeof(location)) bytes per each such entity
(10 identical location values, instead of only one really needed)... And
secondly, how am I going to enforce the consistency when an entity 'moves'
(location gets changed)? All those ten rows have to be updated... In other
words, I would need to have some means of ensuring that for every given pair
of rows in that table if row1.id=3Drow2.id then row1.location=3Drow2.locati=
on...
This calls for another complication of the schema - putting a trigger on
the table to ensure the consistency....

While this all seems to be doable, please keep in mind that what we are dis=
cussing
here is just a simple example... This gets more complicated, if you take
into consideration, for example, the fact, that name and location are likely
to be not the only two attributes of an entity - there could be many more
of them... Also, consider that you have LOTS of those entitites (tens of
millions), each having, say, 20 attributes - imagine how much space would
be wasted on duplication, and how much time would be wasted executing that
trigger... :-(

Does this clarify things?

Thanks!

Dima

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: How to perform an identical insert?
Next
From: Jochem van Dieten
Date:
Subject: Re: Newbie question - Which Linux?