Home > mailing lists

Design Question (Time Series Data) - Mailing list pgsql-general

From	Andreas Strasser
Subject	Design Question (Time Series Data)
Date	October 3, 2007 22:35:47
Msg-id	47043CC5.6090205@andreas-strasser.com Whole thread Raw
Responses	Re: Design Question (Time Series Data) Re: Design Question (Time Series Data)
List	pgsql-general

Tree view

Hello,

i'm currently designing an application that will retrieve economic data
(mainly time series)from different sources and distribute it to clients.
It is supposed to manage around 20.000 different series with differing
numbers of observations (some have only a few dozen observations, others
several thousand) and i'm now faced with the decision where and how to
store the data.

So far, i've come up with 3 possible solutions

1) Storing the observations in one big table with fields for the series,
position within the series and the value (float)
2) Storing the observations in an array (either in the same table as the
series or in an extra data-table)
3) Storing the observations in CSV-files on the hard disk and only
putting a reference to it in the database

I expect that around 50 series will be updated daily - which would mean
that for solution nr. 1 around 50.000 rows would be deleted and appended
(again) every day.

I personally prefer solution 1, because it is the easiest to implement
(i need to make different calculations and be able to transform the data
easily), but i'm concerned about perfomance and overhead. It effectively
triples the space needed (over solutions nr. 2) and will result in huge
index files.

Are there any other storage methods which are better suited for this
kind of data? How can i avoid trouble resulting from the daily updates
(high number of deleted rows)? Which method would you prefer?

Thanks in advance!

Andreas

pgsql-general by date:

From: Tom Lane
Date: 03 October 2007, 20:15:35
Subject: Re: Unexpected behavior with CASE statement

From: "Merlin Moncure"
Date: 03 October 2007, 22:48:52
Subject: Re: Easier string concat in PL funcs?

Design Question (Time Series Data) - Mailing list pgsql-general

Previous

Next