Re: spectral datasets in postgresql - Mailing list pgsql-general

From Antonio Fiol Bonnín
Subject Re: spectral datasets in postgresql
Date
Msg-id 3BF4CD65.B198A3D1@w3ping.com
Whole thread Raw
In response to Re: spectral datasets in postgresql  ("Nick Fankhauser" <nickf@ontko.com>)
Responses Re: spectral datasets in postgresql
List pgsql-general
Nick,

I am afraid that Glenn is not as worried about that as he is aabout the fact
that there may be up to 32 million entries per spectrum times 1000+ spectra,
which makes a total of about 3200+ million entries, supposedly on the same
relation.

Maybe I'm off-base, but I am thinking of something like:

CREATE TABLE spectra ( id SERIAL, element VARCHAR(30), ...other data about the
spectra... );
CREATE TABLE data ( id int, value float );

Table spectra would hold 1000+ entries (quite small table, after all).
Table data would hold all the "value"s for all spectra, i.e. 3200+ million
entries, and not even id is unique.

Structure seems fine to me for modest purposes, but not for this one. Of
course, you did not talk about any particular structure, Nick, but I can't
think of another one at the moment.

Of course, what here applies por PostgreSQL seems to me that may also apply for
any other RDBMS I can think of.

Nick, you are certainly a more experienced programmer than I am. Could you
please tell me what I am missing? And maybe give your opinion on the following
proposal? Thank you!!

(After re-reading my e-mail, I have come to the following point: It may be
feasible, as the rows are not very large. I am not sure if the influence of
that is very important or not. In any case, supposing 10byte/row, which is
probably a really low estimate, that makes a table with 32 Terabytes of data.
IIRC, PostgreSQL needs all the data on a particular table to fit on a
filesystem. I guess you will need an enormous partition...)

However, ...

Glenn,

Will you need to search by your "values"? A search like "I would like to know
which spectra have a spectrum "value" between X and Y", I mean...

If you will not need that, you may think of BLOBS to store your spectrum data.
That will avoid you 32 million inserts per spectrum ;-) Even if you issue them
in the same transaction, I guess the performance will be a big issue there.

(Again, re-reading and recalculating, that makes 32 million 32 bits (?)
floating point numbers per BLOB, times 1000. That is, at least, about 1
Terabyte, if I did my calculations properly).

A question has just finally come to my mind: Is compression possible, either on
the original data (transformation to avoid redundant "spectral" information) or
on the float values (gzip compression or something like that)? In that case,
you may save quite a lot of disk space.

Good luck!

Antonio Fiol

P.S. Of course, you need to add any overhead data generated by the RDBMS in
both cases.


Nick Fankhauser wrote:

> Glenn
>
> If I understand correctly, you're asking if it is possible to represent the
> one to many relationship between the spectra & the values they consist of.
> The answer is yes- that is one of the things relational databases like
> PostgreSQL do well, and what you describe is a fairly simple data structure.
>
> -Nick
>
> --------------------------------------------------------------------------
> Nick Fankhauser  nickf@ontko.com  Phone 1.765.935.4283  Fax 1.765.962.9788
> Ray Ontko & Co.     Software Consulting Services     http://www.ontko.com/
>
> > -----Original Message-----
> > From: pgsql-general-owner@postgresql.org
> > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Glenn Sullivan
> > Sent: Thursday, November 15, 2001 5:32 PM
> > To: pgsql-general@postgresql.org
> > Subject: [GENERAL] spectral datasets in postgresql
> >
> >
> > Hi,
> >
> > Is it possible/feasible/desirable to use postgresql to hold large spectral
> > datasets.  The dataset would consist of a number of attributes and a
> > spectrum.  The spectrum is simply a list of floating point
> > values.  However,
> > there can be from 128 points to 32 million points for one spectrum.
> >
> > There would then of course be thousands of these spectra.
> >
> > I don't know how you would save those 32 million points. If you can do
> > so, I don't know if it is a feasible way to use postgresql.
> >
> >
> > I welcome your comments.
> >
> > Glenn
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Compiling v7.1.3 w/OpenSSL and Krb5 support...
Next
From: "Sunit Bhatia"
Date:
Subject: Re: Database server crash ! URGENT !