A general database question! - Mailing list pgsql-sql

From Dmitry Tkach
Subject A general database question!
Date
Msg-id 3C9A4759.6000607@openratings.com
Whole thread Raw
List pgsql-sql
Hi, everybody!

I have a general database design question for you...
Suppose, I have a database of entities, that have two attributes, say,
location and name. Location is numerical and name is a varchar.
I need to be able to search by both location and name (e.g. get all the
entries whose location is 1 and name is John etc...).
It is simple so far - I'd just create an an index on both location and
name...

Now, suppose, my entities may (or may not) have several names (aliases),
and I need my query to return the entries with any of the aliases,
matching the parameter...

Normally, I would just split the table into two - like:

create table location (id serial primary key, loc int);
create table names (id references location,
                     type int,
                     name varchar(100) not null,
                     unique (id,type,name)
                     );
Now I could do
select * from location l, names n where l.id=n.id and loc=1 and name =
'John';

...
But there are two problems with it:
The small problem is that it may return multiple rows for the same
entity (fine, suppose, I can deal with that).
The much bigger problem though is that I can't now create an index on
both location and name, and creating two separate indexes won't do me
much good as far as I understand - still only one of them will be used...

So, if there are, say 10 million entries in location 1, and none of them
is John, and there are 10 million Jons in other locations, my query
(that should be almost instantenious) would take forever :-(

The only way to get around it I can imagine is to stick location column
into the names table (and get rid of the other table) - so that I'll be
able to build an index on it...

But this is not normalized, wastes a lot of space and makes even a
simple query like select * ... where location = 1 return multiple rows
for the same entry, so that I'll have to scan the result set to remove
duplicates every time ...

I am wonderring if any of you database gurus out there have any better
ideas?

If not right away, any plans to implement multi-table indexes in
postgres? Or, perhaps, 'index intersection' (they say, SQL server has
this feature - you keep both tables clustered by the id, and the server
can then fetch two result sets from separate indexes, and merge them,
based on the fact that they are sorted by the id)...

Any input is greatly appreciated...

Dima


pgsql-sql by date:

Previous
From: Oliver Elphick
Date:
Subject: Re: Sequence not updated after a copy ?
Next
From: Dmitry Tkach
Date:
Subject: A bug in gistPageAddItem()/gist_tuple_replacekey() ???