Re: a newbie question on table design - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: a newbie question on table design |
Date | |
Msg-id | Pine.LNX.4.64.0802151749060.23796@sn.sai.msu.ru Whole thread Raw |
In response to | Re: a newbie question on table design ("Leif B. Kristensen" <leif@solumslekt.org>) |
List | pgsql-general |
If you can select stable structure (common columns) and additional columns than you can : 1. Use base class (table) with common columns + inherited tables with their own additional columns 2. We use contrib/hstore as a storage for semistructured data - we store additional columns as a key->value pairs in hstore data type. This is very flexible design. On Fri, 15 Feb 2008, Leif B. Kristensen wrote: > On Friday 15. February 2008, fdu.xiaojf@gmail.com wrote: >> Hi all, >> >> I have a large sdf file with many records of molecules and associated >> data items and I want to save them in a PostgreSQL database. There are >> about less than 40 data items for every molecule(the names of the data >> items fore each molecule are the same, but values differ). The number >> of molecules may exceed 20 million. >> >> Now I have come up with two ways to construct the table: >> >> 1) a table with about 40 columns, every data item has its >> corresponding column, and one molecule corresponds to one row in the >> table. >> >> This is direct and simple. The drawbacks is if I want to add more data >> types to the database, I have to modify the structure of the table. >> >> 2) a table with just 3 columns: >> >> CREATE TABLE mytable( >> id serial, >> data_name text, >> data_value text >> ); >> >> Then a single molecule will corresonds to about 40 rows in the >> database. > > This is a sound concept, but I'd rather store the data_name in a > separate table with an integer key, and replace data_name in mytable > with a data_name_fk REFERENCES data_names (data_name_id). That's just > Occam's Razor applied to database design, aka first normal form. > > You'd probably store the name of the molecule in a third table. Then you > have a model very similar to the classic 'book database' where a book > can have multiple authors, and an author can have multiple books. There > are examples for this design all over the place. > >> If I need to add more data types to the table, I just need to add new >> rows with new "data_name" column values. The drawback of this table is >> it has too many rows(40 times of the former one) and waste a lot >> space. >> >> Which one is better, or there are some other smart ways ? >> >> I have another question. Since the data I need to save is huge, is it >> appropriate that I save the data value in compressed format ? > > That sounds a lot like premature optimization. Postgres is actually > quite good at compacting data natively. > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-general by date: