Re: a newbie question on table design - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: a newbie question on table design
Date
Msg-id Pine.LNX.4.64.0802151749060.23796@sn.sai.msu.ru
Whole thread Raw
In response to Re: a newbie question on table design  ("Leif B. Kristensen" <leif@solumslekt.org>)
List pgsql-general
If you can select stable structure (common columns) and additional columns
than you can :

1. Use base class (table) with common columns + inherited tables with their
    own additional columns
2. We use contrib/hstore as a storage for semistructured data - we store
    additional columns as a key->value pairs in hstore data type. This is
    very flexible design.

On Fri, 15 Feb 2008, Leif B. Kristensen wrote:

> On Friday 15. February 2008, fdu.xiaojf@gmail.com wrote:
>> Hi all,
>>
>> I have a large sdf file with many records of molecules and associated
>> data items and I want to save them in a PostgreSQL database. There are
>> about less than 40 data items for every molecule(the names of the data
>> items fore each molecule are the same, but values differ).  The number
>> of molecules may exceed 20 million.
>>
>> Now I have come up with two ways to construct the table:
>>
>> 1) a table with about 40 columns, every data item has its
>> corresponding column, and one molecule corresponds to one row in the
>> table.
>>
>> This is direct and simple. The drawbacks is if I want to add more data
>> types to the database, I have to modify the structure of the table.
>>
>> 2) a table with just 3 columns:
>>
>>    CREATE TABLE mytable(
>>    id              serial,
>>    data_name       text,
>>    data_value      text
>>    );
>>
>> Then a single molecule will corresonds to about 40 rows in the
>> database.
>
> This is a sound concept, but I'd rather store the data_name in a
> separate table with an integer key, and replace data_name in mytable
> with a data_name_fk REFERENCES data_names (data_name_id). That's just
> Occam's Razor applied to database design, aka first normal form.
>
> You'd probably store the name of the molecule in a third table. Then you
> have a model very similar to the classic 'book database' where a book
> can have multiple authors, and an author can have multiple books. There
> are examples for this design all over the place.
>
>> If I need to add more data types to the table, I just need to add new
>> rows with new "data_name" column values. The drawback of this table is
>> it has too many rows(40 times of the former one) and waste a lot
>> space.
>>
>> Which one is better, or there are some other smart ways ?
>>
>> I have another question. Since the data I need to save is huge, is it
>> appropriate that I save the data value in compressed format ?
>
> That sounds a lot like premature optimization. Postgres is actually
> quite good at compacting data natively.
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

pgsql-general by date:

Previous
From: "Leif B. Kristensen"
Date:
Subject: Re: a newbie question on table design
Next
From: Joe Conway
Date:
Subject: Re: dynamic crosstab