Thread: Question regarding modelling for time series
Hi there, I want to use a database for storing economic time series. An economic time series can be thought of as something like this: NAME | RELEASE_DATE | REPORTING_DATE | VALUE ----------------+--------------+----------------+------- Unemployment US | 2011/01/01 | 2010/12/01 | xxx Unemployment US | 2011/02/01 | 2011/01/01 | xxx Unemployment US | 2011/03/01 | 2011/02/01 | xxx The release date is the date on which the data provider published the value and the reporting date is the date to which the value refers (read: In Dec, 2010 the unemployment was X but this has not been known until 2011/01/01). However, that's not the whole story. On each "release date" not only ONE value is released but in some cases the values for previous reporting_dates are changed. So, the table could read like this: NAME | RELEASE_DATE | REPORTING_DATE | VALUE ----------------+--------------+----------------+------- Unemployment US | 2011/01/01 | 2010/12/01 | xxx Unemployment US | 2011/01/01 | 2010/11/01 | xxx Unemployment US | 2011/01/01 | 2010/10/01 | xxx Unemployment US | 2011/02/01 | 2010/10/01 | xxx Unemployment US | 2011/02/01 | 2010/11/01 | xxx Unemployment US | 2011/02/01 | 2010/12/01 | xxx Unemployment US | 2011/02/01 | 2011/01/01 | xxx [...] So, there are now mainly three questions to be answered: 1) "get me the time series [reporting_date, value] of unemployment as it is now seen", so give all reporting_date,value tuples with the most recent release_date. 2) "get me the time series [reporting_date, value] as it was published/known to the market", so that means, in this case, give this list: Unemployment US | 2011/01/01 | 2010/12/01 | xxx Unemployment US | 2011/02/01 | 2011/01/01 | xxx Unemployment US | 2011/03/01 | 2011/02/01 | xxx 3) the same as (1) but with one enhancement: if the most recent release has a history of N month, but all releases has a history of N+X month, the time series from the most recent release should be delivered plus the older values (in terms of reporting_dates) from the second most recent release plus the more older values from the third most recent release and so on. So, I thought of a relational data base model like that: TABLE 'ts' (TimeSeries) PK:id | name TABLE 'rs' (ReleaseStages) PK:id | FK:ts_id | release_date TABLE 'r' (Releases) PK:id | FK:rs_id | reporting_date | value Is this an appropriate model? If yes, how could I answer the three questions above in terms of SQL/Stored Procedures? If no, what would you suggest? And: If the datasets grow further, will be an RDBMS the right model for time series storage? Any ideas on what else I could use? Thank you very much! --Alex
On Wed, Sep 5, 2012 at 12:16 AM, Alex Grund <st.helldiver@googlemail.com> wrote: > So, I thought of a relational data base model like that: It is worth to make like this TABLE 'ts' (TimeSeries) PK:id | name TABLE 'r' (Releases) PK:id | FK:ts_id | release_date | reporting_date | value It is a little more redundant but easier to work with data. > 1) "get me the time series [reporting_date, value] of unemployment as > it is now seen", so give all reporting_date,value tuples with the most > recent release_date. SELECT * FROM r WHERE ts_id = 123 AND release_date = (SELECT max(release_date) FROM r) ORDER BY reporting_date; > 2) "get me the time series [reporting_date, value] as it was > published/known to the market", so that means, in this case, give this > list: > Unemployment US | 2011/01/01 | 2010/12/01 | xxx > Unemployment US | 2011/02/01 | 2011/01/01 | xxx > Unemployment US | 2011/03/01 | 2011/02/01 | xxx If I understand it correct it will look like SELECT DISTINCT ON (release_date) * FROM r WHERE ts_id = 123 ORDER BY release_date, reporting_date DESC; > 3) the same as (1) but with one enhancement: if the most recent > release has a history of N month, but all releases has a history of > N+X month, the time series from the most recent release should be > delivered plus the older values (in terms of reporting_dates) from the > second most recent release plus the more older values from the third > most recent release and so on. I am not quite understand what is meant here. Could you please provide more explanation and some examples. > And: If the datasets grow further, will be an RDBMS the right model > for time series storage? Any ideas on what else I could use? It will be. You will probably need some more optimization/partitioning depending on the data distribution however it can be done later. > > > > Thank you very much! > > > --Alex > > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql -- Sergey Konoplev a database and software architect http://www.linkedin.com/in/grayhemp Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +79160686204
Sergey, thank you very much for your hints, I will play a bit with that and maybe come back to the list. Just for clarification, I attached some more explanation and examples below. 2012/9/5 Sergey Konoplev <gray.ru@gmail.com>: > I am not quite understand what is meant here. Could you please provide > more explanation and some examples. Imagine, the time series is published monthly, at the first day of a month with the value for the previous month, such like: Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1 Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2 Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3 Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4 Now, imagine, that on every release, the value for the previous month is revised, such like: Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1 Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5 Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2 Unemployment; release: 2011/11/01; reporting: 2011/09/01; value: 3.5 Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3 Unemployment; release: 2011/10/01; reporting: 2011/08/01; value: 4.5 Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4 Unemployment; release: 2011/09/01; reporting: 2011/07/01; value: 5.5 So, what I have now is a time series from 2011/07/01 to 2011/11/01. The most recent observation (release) ex-post is: [1] Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1 Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5 Since the data is not revised further than one month behind, the whole series ex-post would look like that: [3] Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1 Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5 Unemployment; release: 2011/11/01; reporting: 2011/09/01; value: 3.5 Unemployment; release: 2011/10/01; reporting: 2011/08/01; value: 4.5 Unemployment; release: 2011/09/01; reporting: 2011/07/01; value: 5.5 Whereas, the "known-to-market"-series would look like that: [2] Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1 Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2 Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3 Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4 That are the series I want to get from the db. --Alex
On Wed, Sep 5, 2012 at 11:39 AM, Alex Grund <st.helldiver@googlemail.com> wrote: > Since the data is not revised further than one month behind, the whole > series ex-post would look like that: [3] > Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1 > Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5 > Unemployment; release: 2011/11/01; reporting: 2011/09/01; value: 3.5 > Unemployment; release: 2011/10/01; reporting: 2011/08/01; value: 4.5 > Unemployment; release: 2011/09/01; reporting: 2011/07/01; value: 5.5 As I understand 3 it is just a UNION of 1 and 2. > > Whereas, the "known-to-market"-series would look like that: [2] > > Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1 > Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2 > Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3 > Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4 > > That are the series I want to get from the db. > > > --Alex -- Sergey Konoplev a database and software architect http://www.linkedin.com/in/grayhemp Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +79160686204