Thread: Advice on Database Schema
Hello, We have been proposed two database schema and are faced with making a decision between them. I am fairly new to Postgresql and am looking for advice on which would work best. Both would be done in postgresql. Our database will hold lots of different data types and will be queried frequently and have new data inserted rarely (in the long run). So we would like to have it be most efficient with queries. Only a few users would have permission to add data, most will only be allowed to query the database. One schema has on the order of 100 tables and the different types of data will be stored in different tables. Most tables will reference at least one other table. The second schema has on the order of 10 tables with more data stored in each and multiple types of data in a single table. In this case tables will reference others, but also then themselves in the case where different types of data is stored in a single table. The person designing the database schema has recommended we use the simpler design with more data per table. My initial reaction was that the schema with more tables would be faster to query. But I'm now wondering if we could use indexes on the schema with few table to search it quickly. Can anyone speculate as to which database schema might be more efficient with queries. Our queries will most often require searching through different types of data in a single query. Any advice would be appreciated. Thanks so much, Jennifer
On Tue, 2003-03-25 at 13:07, Jennifer Lee wrote: > We have been proposed two database schema and are faced with > making a decision between them. I am fairly new to Postgresql and am > looking for advice on which would work best. Both would be done in > postgresql. > > Our database will hold lots of different data types and will be queried > frequently and have new data inserted rarely (in the long run). So we > would like to have it be most efficient with queries. Only a few users > would have permission to add data, most will only be allowed to query > the database. This is more a question of data modeling than PostgreSQL. If your database is queried morst often, 3NF normalisation will take care of speed. A nice clean logical structure will be fast. Another influence is your front end - what will you be using for the requests? Cheers Tony Grant -- www.tgds.net Library management software toolkit, redhat linux on Sony Vaio C1XD, Dreamweaver MX with Tomcat and PostgreSQL
> On Tue, 2003-03-25 at 13:07, Jennifer Lee wrote: > > > We have been proposed two database schema and are faced with > > making a decision between them. I am fairly new to Postgresql and am > > looking for advice on which would work best. Both would be done in > > postgresql. > > > > Our database will hold lots of different data types and will be queried > > frequently and have new data inserted rarely (in the long run). So we > > would like to have it be most efficient with queries. Only a few users > > would have permission to add data, most will only be allowed to query > > the database. > > This is more a question of data modeling than PostgreSQL. > > If your database is queried morst often, 3NF normalisation will take > care of speed. A nice clean logical structure will be fast. Okay great, thanks for the suggestion, I'll look into this. > > Another influence is your front end - what will you be using for the > requests? > That's great to know. We haven't developed that part yet, but have talked about using Perl and SVG to make a web interface to the database. Some other people here use this approach to their MySQL database so we may try to adapt that. Do you have suggestions as to what types of front ends might help speed things up? Nothing is set in stone here yet. We've just been proposed with the alternate schema and have to decide which one we want to proceed with development. This is all still in the very early stages and I'm hoping to get on the right path to setting up this database. I've recently moved from the research side of bioinformatics into database development and realize I still have lots to learn about all this, your suggestions are greatly appreaciated. Thanks so much for the response. Jennifer
On Tue, 2003-03-25 at 12:07, Jennifer Lee wrote: > Hello, > > We have been proposed two database schema and are faced with > making a decision between them. I am fairly new to Postgresql and am > looking for advice on which would work best. Both would be done in > postgresql. > > Our database will hold lots of different data types and will be queried > frequently and have new data inserted rarely (in the long run). So we > would like to have it be most efficient with queries. Only a few users > would have permission to add data, most will only be allowed to query > the database. > > One schema has on the order of 100 tables and the different types of > data will be stored in different tables. Most tables will reference at > least one other table. The second schema has on the order of 10 tables > with more data stored in each and multiple types of data in a single > table. In this case tables will reference others, but also then > themselves in the case where different types of data is stored in a > single table. Hi Jennifer, I'm not sure I have the answer as to which one is fastest. I suppose if you perform less joins it is theoretically faster, but I don't know if speed should be your primary concern. I would go for a schema that logically splits up different entities into different tables. I'm currently cleaning up a database that sounds a lot like your "simple" system. What I've found ends up happening is that you end up with a lot of data that is wasted - ie data which is replicated or tables where thousands of rows have just empty space. Empty fields take up space as well. The other concern is that if you keep your data logically arranged into as many tables as it takes, then you'll find it's easier to "grow" your schema; that is you won't find yourself needing to make ad hoc design decisions to implement a new feature. I find that in general, referential integrity is easier to maintain like this, since you have finer grained control over what links to what. Probably the best of both worlds is to index the data you're going to search, split it up into as many bits as you need and use views to present the data to your software in a way which is less onerous than dealing with 100 tables all at once. Views have the advantage over standard SQL queries that they are compiled the first time they are used so that future calls to them are faster. HTH, James.