Home > mailing lists

Re: a newbie question on table design - Mailing list pgsql-general

From	Steve Atkins
Subject	Re: a newbie question on table design
Date	February 15, 2008 11:02:37
Msg-id	8DC8F40D-1D22-4304-841B-B6969F3CE0C0@blighty.com Whole thread Raw
In response to	a newbie question on table design ("fdu.xiaojf@gmail.com" <fdu.xiaojf@gmail.com>)
List	pgsql-general

Tree view

On Feb 15, 2008, at 4:49 AM, fdu.xiaojf@gmail.com wrote:

> Hi all,
>
> I have a large sdf file with many records of molecules and associated
> data items and I want to save them in a PostgreSQL database. There are
> about less than 40 data items for every molecule(the names of the data
> items fore each molecule are the same, but values differ).  The number
> of molecules may exceed 20 million.
>
> Now I have come up with two ways to construct the table:
>
> 1) a table with about 40 columns, every data item has its
> corresponding
> column, and one molecule corresponds to one row in the table.
>
> This is direct and simple. The drawbacks is if I want to add more data
> types to the database, I have to modify the structure of the table.
>
> 2) a table with just 3 columns:
>
>   CREATE TABLE mytable(
>   id              serial,
>   data_name       text,
>   data_value      text
>   );

That looks kinda like an entity-attribute-value format. I'm guessing
that either there'll be another column to define the entity, or the id
isn't
really a serial.

> Then a single molecule will corresonds to about 40 rows in the
> database.
>
> If I need to add more data types to the table, I just need to add new
> rows with new "data_name" column values. The drawback of this table is
> it has too many rows(40 times of the former one) and waste a lot
> space.
>
> Which one is better, or there are some other smart ways ?

Somebody will, shortly, leap out and explain at great length why EAV
is evil, evil, evil and would never be used by any right-thinking
person.

Don't take them too seriously. EAV is sometimes appropriate. This is
probably not one of those times, though. You're likely to get much more
benefit from the power of SQL by putting one molecule per row of the
table.

40 columns isn't excessive, and modifying the structure of the table
to add
or modify columns isn't really a problem, especially in postgresql,
where
you can take advantage of DDL being transactional.

> I have another question. Since the data I need to save is huge, is it
> appropriate that I save the data value in compressed format ?

Not on your first implementation. Postgresql does some basic compression
and out-of-line storage of data automatically. If you have huge blobs of
opaque data that you don't plan on querying from within the database you
might consider doing some client-side compression of them. Maybe. By
the third or fourth iteration, if benchmarks suggest it'd be
worthwhile. Right
now, though, don't even consider it.

Cheers,
   Steve

pgsql-general by date:

From: Joe Conway
Date: 15 February 2008, 10:59:49
Subject: Re: dynamic crosstab

From: Erik Jones
Date: 15 February 2008, 11:25:16
Subject: Re: dynamic crosstab

Re: a newbie question on table design - Mailing list pgsql-general

Previous

Next