Thread: Advice on Database Schema

Advice on Database Schema

From

"Jennifer Lee"

Date:

25 March 2003, 07:04:29

Hello,

We have been proposed two database schema and are faced with
making a decision between them. I am fairly new to Postgresql and am
looking for advice on which would work best. Both would be done in
postgresql.

Our database will hold lots of different data types and will be queried
frequently and have new data inserted rarely (in the long run). So we
would like to have it be most efficient with queries. Only a few users
would have permission to add data, most will only be allowed to query
the database.

One schema has on the order of 100 tables and the different types of
data will be stored in different tables. Most tables will reference at
least one other table. The second schema has on the order of 10 tables
with more data stored in each and multiple types of data in a single
table. In this case tables will reference others, but also then
themselves in the case where different types of data is stored in a
single table.

The person designing the database schema has recommended we use
the simpler design with more data per table. My initial reaction was
that the schema with more tables would be faster to query. But I'm
now wondering if we could use indexes on the schema with few table
to search it quickly. Can anyone speculate as to which database
schema might be more efficient with queries. Our queries will most
often require searching through different types of data in a single
query.

Any advice would be appreciated.

Thanks so much,
Jennifer

Re: Advice on Database Schema

From

Tony Grant

Date:

25 March 2003, 07:48:58

On Tue, 2003-03-25 at 13:07, Jennifer Lee wrote:

> We have been proposed two database schema and are faced with
> making a decision between them. I am fairly new to Postgresql and am
> looking for advice on which would work best. Both would be done in
> postgresql.
>
> Our database will hold lots of different data types and will be queried
> frequently and have new data inserted rarely (in the long run). So we
> would like to have it be most efficient with queries. Only a few users
> would have permission to add data, most will only be allowed to query
> the database.

This is more a question of data modeling than PostgreSQL.

If your database is queried morst often, 3NF normalisation will take
care of speed. A nice clean logical structure will be fast.

Another influence is your front end - what will you be using for the
requests?

Cheers

Tony Grant

--
www.tgds.net Library management software toolkit,
redhat linux on Sony Vaio C1XD,
Dreamweaver MX with Tomcat and PostgreSQL

Re: Advice on Database Schema

From

"Jennifer Lee"

Date:

25 March 2003, 09:02:45

> On Tue, 2003-03-25 at 13:07, Jennifer Lee wrote:
>
> > We have been proposed two database schema and are faced with
> > making a decision between them. I am fairly new to Postgresql and am
> > looking for advice on which would work best. Both would be done in
> > postgresql.
> >
> > Our database will hold lots of different data types and will be queried
> > frequently and have new data inserted rarely (in the long run). So we
> > would like to have it be most efficient with queries. Only a few users
> > would have permission to add data, most will only be allowed to query
> > the database.
>
> This is more a question of data modeling than PostgreSQL.
>
> If your database is queried morst often, 3NF normalisation will take
> care of speed. A nice clean logical structure will be fast.

Okay great, thanks for the suggestion, I'll look into this.
>
> Another influence is your front end - what will you be using for the
> requests?
>
That's great to know. We haven't developed that part yet, but have
talked about using Perl and SVG to make a web interface to the
database. Some other people here use this approach to their MySQL
database so we may try to adapt that. Do you have suggestions as to
what types of front ends might help speed things up? Nothing is set
in stone here yet.

We've just been proposed with the alternate schema and have to
decide which one we want to proceed with development. This is all
still in the very early stages and I'm hoping to get on the right path to
setting up this database. I've recently moved from the research side of
bioinformatics into database development and realize I still have lots
to learn about all this, your suggestions are greatly appreaciated.

Thanks so much for the response.
Jennifer

Re: Advice on Database Schema

From

James Gregory

Date:

25 March 2003, 19:01:12

On Tue, 2003-03-25 at 12:07, Jennifer Lee wrote:
> Hello,
>
> We have been proposed two database schema and are faced with
> making a decision between them. I am fairly new to Postgresql and am
> looking for advice on which would work best. Both would be done in
> postgresql.
>
> Our database will hold lots of different data types and will be queried
> frequently and have new data inserted rarely (in the long run). So we
> would like to have it be most efficient with queries. Only a few users
> would have permission to add data, most will only be allowed to query
> the database.
>
> One schema has on the order of 100 tables and the different types of
> data will be stored in different tables. Most tables will reference at
> least one other table. The second schema has on the order of 10 tables
> with more data stored in each and multiple types of data in a single
> table. In this case tables will reference others, but also then
> themselves in the case where different types of data is stored in a
> single table.

Hi Jennifer,

I'm not sure I have the answer as to which one is fastest. I suppose if
you perform less joins it is theoretically faster, but I don't know if
speed should be your primary concern. I would go for a schema that
logically splits up different entities into different tables. I'm
currently cleaning up a database that sounds a lot like your "simple"
system. What I've found ends up happening is that you end up with a lot
of data that is wasted - ie data which is replicated or tables where
thousands of rows have just empty space. Empty fields take up space as
well.

The other concern is that if you keep your data logically arranged into
as many tables as it takes, then you'll find it's easier to "grow" your
schema; that is you won't find yourself needing to make ad hoc design
decisions to implement a new feature. I find that in general,
referential integrity is easier to maintain like this, since you have
finer grained control over what links to what.

Probably the best of both worlds is to index the data you're going to
search, split it up into as many bits as you need and use views to
present the data to your software in a way which is less onerous than
dealing with 100 tables all at once. Views have the advantage over
standard SQL queries that they are compiled the first time they are used
so that future calls to them are faster.

HTH,

James.