Design Question (Time Series Data) - Mailing list pgsql-general

From Andreas Strasser
Subject Design Question (Time Series Data)
Date
Msg-id 47043CC5.6090205@andreas-strasser.com
Whole thread Raw
Responses Re: Design Question (Time Series Data)  ("Pavel Stehule" <pavel.stehule@gmail.com>)
Re: Design Question (Time Series Data)  (Ted Byers <r.ted.byers@rogers.com>)
List pgsql-general
Hello,

i'm currently designing an application that will retrieve economic data
(mainly time series)from different sources and distribute it to clients.
It is supposed to manage around 20.000 different series with differing
numbers of observations (some have only a few dozen observations, others
several thousand) and i'm now faced with the decision where and how to
store the data.

So far, i've come up with 3 possible solutions

1) Storing the observations in one big table with fields for the series,
position within the series and the value (float)
2) Storing the observations in an array (either in the same table as the
series or in an extra data-table)
3) Storing the observations in CSV-files on the hard disk and only
putting a reference to it in the database

I expect that around 50 series will be updated daily - which would mean
that for solution nr. 1 around 50.000 rows would be deleted and appended
(again) every day.

I personally prefer solution 1, because it is the easiest to implement
(i need to make different calculations and be able to transform the data
easily), but i'm concerned about perfomance and overhead. It effectively
triples the space needed (over solutions nr. 2) and will result in huge
index files.

Are there any other storage methods which are better suited for this
kind of data? How can i avoid trouble resulting from the daily updates
(high number of deleted rows)? Which method would you prefer?

Thanks in advance!

Andreas

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Unexpected behavior with CASE statement
Next
From: "Merlin Moncure"
Date:
Subject: Re: Easier string concat in PL funcs?