Re: Design Question (Time Series Data) - Mailing list pgsql-general

From Pavel Stehule
Subject Re: Design Question (Time Series Data)
Date
Msg-id 162867790710040220j3297053bo1b41f12af000273f@mail.gmail.com
Whole thread Raw
In response to Design Question (Time Series Data)  (Andreas Strasser <kontakt@andreas-strasser.com>)
Responses Re: Design Question (Time Series Data)  (Jorge Godoy <jgodoy@gmail.com>)
List pgsql-general
2007/10/4, Andreas Strasser <kontakt@andreas-strasser.com>:
> Hello,
>
> i'm currently designing an application that will retrieve economic data
> (mainly time series)from different sources and distribute it to clients.
> It is supposed to manage around 20.000 different series with differing
> numbers of observations (some have only a few dozen observations, others
> several thousand) and i'm now faced with the decision where and how to
> store the data.
>
> So far, i've come up with 3 possible solutions
>
> 1) Storing the observations in one big table with fields for the series,
> position within the series and the value (float)
> 2) Storing the observations in an array (either in the same table as the
> series or in an extra data-table)
> 3) Storing the observations in CSV-files on the hard disk and only
> putting a reference to it in the database
>

I did good experience with 2 variant. PostgreSQL needs 24bytes for
head of every row, so isn't too much efective store one field to one
row. You can simply do transformation between array and table now.

Pavel

> I expect that around 50 series will be updated daily - which would mean
> that for solution nr. 1 around 50.000 rows would be deleted and appended
> (again) every day.
>
> I personally prefer solution 1, because it is the easiest to implement
> (i need to make different calculations and be able to transform the data
> easily), but i'm concerned about perfomance and overhead. It effectively
> triples the space needed (over solutions nr. 2) and will result in huge
> index files.
>
> Are there any other storage methods which are better suited for this
> kind of data? How can i avoid trouble resulting from the daily updates
> (high number of deleted rows)? Which method would you prefer?
>
> Thanks in advance!
>
> Andreas
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

pgsql-general by date:

Previous
From: "Albe Laurenz"
Date:
Subject: Re: About selected data dump
Next
From: Jorge Godoy
Date:
Subject: Re: Design Question (Time Series Data)