Re: a newbie question on table design - Mailing list pgsql-general

From Leif B. Kristensen
Subject Re: a newbie question on table design
Date
Msg-id 200802151543.57443.leif@solumslekt.org
Whole thread Raw
In response to a newbie question on table design  ("fdu.xiaojf@gmail.com" <fdu.xiaojf@gmail.com>)
Responses Re: a newbie question on table design
List pgsql-general
On Friday 15. February 2008, fdu.xiaojf@gmail.com wrote:
>Hi all,
>
>I have a large sdf file with many records of molecules and associated
>data items and I want to save them in a PostgreSQL database. There are
>about less than 40 data items for every molecule(the names of the data
>items fore each molecule are the same, but values differ).  The number
>of molecules may exceed 20 million.
>
>Now I have come up with two ways to construct the table:
>
>1) a table with about 40 columns, every data item has its
> corresponding column, and one molecule corresponds to one row in the
> table.
>
>This is direct and simple. The drawbacks is if I want to add more data
>types to the database, I have to modify the structure of the table.
>
>2) a table with just 3 columns:
>
>    CREATE TABLE mytable(
>    id              serial,
>    data_name       text,
>    data_value      text
>    );
>
>Then a single molecule will corresonds to about 40 rows in the
> database.

This is a sound concept, but I'd rather store the data_name in a
separate table with an integer key, and replace data_name in mytable
with a data_name_fk REFERENCES data_names (data_name_id). That's just
Occam's Razor applied to database design, aka first normal form.

You'd probably store the name of the molecule in a third table. Then you
have a model very similar to the classic 'book database' where a book
can have multiple authors, and an author can have multiple books. There
are examples for this design all over the place.

>If I need to add more data types to the table, I just need to add new
>rows with new "data_name" column values. The drawback of this table is
>it has too many rows(40 times of the former one) and waste a lot
> space.
>
>Which one is better, or there are some other smart ways ?
>
>I have another question. Since the data I need to save is huge, is it
>appropriate that I save the data value in compressed format ?

That sounds a lot like premature optimization. Postgres is actually
quite good at compacting data natively.
--
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/

pgsql-general by date:

Previous
From: "Douglas McNaught"
Date:
Subject: Re: Are indexes blown?
Next
From: Oleg Bartunov
Date:
Subject: Re: a newbie question on table design