Question regarding modelling for time series - Mailing list pgsql-sql
From | Alex Grund |
---|---|
Subject | Question regarding modelling for time series |
Date | |
Msg-id | CACjun4+niDs1FfZYc4PQ7hH+XepSVu_oLHBdXdkOhjgsnOeW-A@mail.gmail.com Whole thread Raw |
Responses |
Re: Question regarding modelling for time series
|
List | pgsql-sql |
Hi there, I want to use a database for storing economic time series. An economic time series can be thought of as something like this: NAME | RELEASE_DATE | REPORTING_DATE | VALUE ----------------+--------------+----------------+------- Unemployment US | 2011/01/01 | 2010/12/01 | xxx Unemployment US | 2011/02/01 | 2011/01/01 | xxx Unemployment US | 2011/03/01 | 2011/02/01 | xxx The release date is the date on which the data provider published the value and the reporting date is the date to which the value refers (read: In Dec, 2010 the unemployment was X but this has not been known until 2011/01/01). However, that's not the whole story. On each "release date" not only ONE value is released but in some cases the values for previous reporting_dates are changed. So, the table could read like this: NAME | RELEASE_DATE | REPORTING_DATE | VALUE ----------------+--------------+----------------+------- Unemployment US | 2011/01/01 | 2010/12/01 | xxx Unemployment US | 2011/01/01 | 2010/11/01 | xxx Unemployment US | 2011/01/01 | 2010/10/01 | xxx Unemployment US | 2011/02/01 | 2010/10/01 | xxx Unemployment US | 2011/02/01 | 2010/11/01 | xxx Unemployment US | 2011/02/01 | 2010/12/01 | xxx Unemployment US | 2011/02/01 | 2011/01/01 | xxx [...] So, there are now mainly three questions to be answered: 1) "get me the time series [reporting_date, value] of unemployment as it is now seen", so give all reporting_date,value tuples with the most recent release_date. 2) "get me the time series [reporting_date, value] as it was published/known to the market", so that means, in this case, give this list: Unemployment US | 2011/01/01 | 2010/12/01 | xxx Unemployment US | 2011/02/01 | 2011/01/01 | xxx Unemployment US | 2011/03/01 | 2011/02/01 | xxx 3) the same as (1) but with one enhancement: if the most recent release has a history of N month, but all releases has a history of N+X month, the time series from the most recent release should be delivered plus the older values (in terms of reporting_dates) from the second most recent release plus the more older values from the third most recent release and so on. So, I thought of a relational data base model like that: TABLE 'ts' (TimeSeries) PK:id | name TABLE 'rs' (ReleaseStages) PK:id | FK:ts_id | release_date TABLE 'r' (Releases) PK:id | FK:rs_id | reporting_date | value Is this an appropriate model? If yes, how could I answer the three questions above in terms of SQL/Stored Procedures? If no, what would you suggest? And: If the datasets grow further, will be an RDBMS the right model for time series storage? Any ideas on what else I could use? Thank you very much! --Alex