Thread: graphing time series data

graphing time series data

From
Louis-David Mitterrand
Date:
Hi,

I have time series data in a 'price' table:

price(id_price, price, id_product, created, modified)

Prices are polled daily and a new 'price' row is created only if the
price of id_product changes, else modified is updated to now().
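
Roughly, with illustrative types (the real column types may differ):

CREATE TABLE price (
    id_price   serial PRIMARY KEY,
    price      numeric NOT NULL,
    id_product integer NOT NULL,                     -- the product being priced
    created    timestamptz NOT NULL DEFAULT now(),   -- first time this price was seen
    modified   timestamptz NOT NULL DEFAULT now()    -- last poll at which this price was still current
);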

Now, I'd like to make a graph of average prices per week, per
id_product. As some prices don't vary much, distribution would not be
ideal if I simply 'group by extract(week from p.modified)'.

Ideally I'd generate_series() a list of weeks between min(p.created) and
max(p.modified) and then average prices 'group by p.modified < week'.
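
Something like this rough, untested sketch:

-- one row per week between the first and the last observation
SELECT generate_series(min(created), max(modified), '1 week') AS week
  FROM price;

which I'd then join against 'price' to pick the rows current in each week and average them.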

What would be the best way to tackle this?

Thanks,


Re: graphing time series data

From
Richard Broersma
Date:
On Wed, Apr 14, 2010 at 7:54 AM, Louis-David Mitterrand
<vindex+lists-pgsql-sql@apartia.org> wrote:

> Now, I'd like to make a graph of average prices per week, per
> id_product. As some prices don't vary much, distribution would not be
> ideal if I simply 'group by extract(week from p.modified)'.

I created a view for a similar problem that I had, only I was
calculating the counts per day. This query could be adapted to work
for you.

CREATE OR REPLACE VIEW opendiscrepencydailycounts AS
  WITH opendays(day) AS (
          SELECT gs.day::date AS day
            FROM generate_series(
                   ((SELECT min(discrepencylist.discstartdt) FROM discrepencylist))::timestamp without time zone,
                   'now'::text::date::timestamp without time zone,
                   '1 day'::interval) gs(day)
       )
  SELECT opendays.day, ds.resolvingparty, count(opendays.day) AS opendiscrepancies
    FROM discrepencylist ds, opendays
   WHERE opendays.day >= ds.discstartdt
     AND opendays.day <= LEAST('now'::text::date, ds.resolutiondate)
   GROUP BY opendays.day, ds.resolvingparty
   ORDER BY opendays.day, ds.resolvingparty;
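
You can then query the view like any table, for example:

-- open discrepancies per resolving party over the last 30 days
SELECT day, resolvingparty, opendiscrepancies
  FROM opendiscrepencydailycounts
 WHERE day >= current_date - 30
 ORDER BY day, resolvingparty;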


-- 
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug


Re: graphing time series data

From
Louis-David Mitterrand
Date:
On Wed, Apr 14, 2010 at 08:46:13AM -0700, Richard Broersma wrote:
> On Wed, Apr 14, 2010 at 7:54 AM, Louis-David Mitterrand
> <vindex+lists-pgsql-sql@apartia.org> wrote:
> 
> > Now, I'd like to make a graph of average prices per week, per
> > id_product. As some prices don't vary much, distribution would not be
> > ideal if I simply 'group by extract(week from p.modified)'.
> 
> I created a view for a similar problem that I had.  Only I was
> calculating the counts per day.  this query could be crafted to work
> for you.
> 
> CREATE OR REPLACE VIEW opendiscrepencydailycounts AS
>  WITH opendays(day) AS (
>          SELECT gs.day::date AS day
>            FROM generate_series((( SELECT
> min(discrepencylist.discstartdt) AS min
>                    FROM discrepencylist))::timestamp without time
> zone, 'now'::text::date::timestamp without time zone, '1
> day'::interval) gs(day)
>         )
>  SELECT opendays.day, ds.resolvingparty, count(opendays.day) AS
> opendiscrepancies
>    FROM discrepencylist ds, opendays
>   WHERE opendays.day >= ds.discstartdt AND opendays.day <=
> LEAST('now'::text::date, ds.resolutiondate)
>   GROUP BY opendays.day, ds.resolvingparty
>   ORDER BY opendays.day, ds.resolvingparty;

You confirm my modus operandi. I tried the following which seems to give
me optimal price distribution:

select w.week, count(p.id_price)
  from (select generate_series(min(p.created_on), max(p.modified_on), '1 week') as week
          from price p) as w
  join price p on (p.created_on < w.week + '7 days'
               and p.modified_on > w.week + '7 days')
 group by w.week
 order by w.week;

          week          | count
------------------------+--------
 2010-02-10 15:32:18+01 | 125369
 2010-02-17 15:32:18+01 | 126882
 2010-02-24 15:32:18+01 | 128307
 2010-03-03 15:32:18+01 | 126742
 2010-03-10 15:32:18+01 | 133596
 2010-03-17 15:32:18+01 | 149019
 2010-03-24 15:32:18+01 | 149908
 2010-03-31 15:32:18+02 | 147617

The rest should be easy from there!
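
Probably something along these lines (untested), swapping the count for an average and grouping per product:

SELECT w.week, p.id_product, avg(p.price) AS avg_price
  FROM (SELECT generate_series(min(p.created_on), max(p.modified_on), '1 week') AS week
          FROM price p) AS w
  JOIN price p ON (p.created_on < w.week + '7 days'
               AND p.modified_on > w.week + '7 days')
 GROUP BY w.week, p.id_product
 ORDER BY w.week, p.id_product;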

Thanks for your input,


Re: graphing time series data

From
Pavel Stehule
Date:
2010/4/14 Louis-David Mitterrand <vindex+lists-pgsql-sql@apartia.org>:
> On Wed, Apr 14, 2010 at 08:46:13AM -0700, Richard Broersma wrote:
>> On Wed, Apr 14, 2010 at 7:54 AM, Louis-David Mitterrand
>> <vindex+lists-pgsql-sql@apartia.org> wrote:
>>
>> > Now, I'd like to make a graph of average prices per week, per
>> > id_product. As some prices don't vary much, distribution would not be
>> > ideal if I simply 'group by extract(week from p.modified)'.
>>
>> I created a view for a similar problem that I had.  Only I was
>> calculating the counts per day.  this query could be crafted to work
>> for you.
>>
>> CREATE OR REPLACE VIEW opendiscrepencydailycounts AS
>>  WITH opendays(day) AS (
>>          SELECT gs.day::date AS day
>>            FROM generate_series((( SELECT
>> min(discrepencylist.discstartdt) AS min
>>                    FROM discrepencylist))::timestamp without time
>> zone, 'now'::text::date::timestamp without time zone, '1
>> day'::interval) gs(day)
>>         )
>>  SELECT opendays.day, ds.resolvingparty, count(opendays.day) AS
>> opendiscrepancies
>>    FROM discrepencylist ds, opendays
>>   WHERE opendays.day >= ds.discstartdt AND opendays.day <=
>> LEAST('now'::text::date, ds.resolutiondate)
>>   GROUP BY opendays.day, ds.resolvingparty
>>   ORDER BY opendays.day, ds.resolvingparty;
>
> You confirm my modus operandi. I tried the following which seems to give
> me optimal price distribution:
>
> select w.week,count( p.id_price) from
>                (select generate_series(min(p.created_on),max(p.modified_on),'1
>                week') as week from price p) as w join price p on (p.created_on <
>                w.week + '7 days' and p.modified_on > w.week + '7 days') group by
>                w.week order by w.week
>
>          week          | count
> ------------------------+--------
>  2010-02-10 15:32:18+01 | 125369
>  2010-02-17 15:32:18+01 | 126882
>  2010-02-24 15:32:18+01 | 128307
>  2010-03-03 15:32:18+01 | 126742
>  2010-03-10 15:32:18+01 | 133596
>  2010-03-17 15:32:18+01 | 149019
>  2010-03-24 15:32:18+01 | 149908
>  2010-03-31 15:32:18+02 | 147617
>
> The rest should be easy from there!

I don't quite understand. Why don't you use the date_trunc() function, e.g.:

select date_trunc('week', created), count(*)
from price
group by date_trunc('week', created)

Regards
Pavel Stehule

>
> Thanks for your input,
>


Re: graphing time series data

From
Louis-David Mitterrand
Date:
On Wed, Apr 14, 2010 at 06:06:59PM +0200, Pavel Stehule wrote:
> I don't understand well. Why you don't use a function date_trunc(),
> 
> select date_trunc('week', created), count(*)
> from price
> group by date_trunc('week', created)

Because if a price doesn't change for more than a week, then some weeks
will have a bad statistical distribution (i.e. they won't include prices
whose rows only had their 'modified' updated). So I (think I) need to
(1) generate the weeks separately and (2) average the prices that are
current for each week.

But I could be missing something obvious.



Re: graphing time series data

From
Pavel Stehule
Date:
2010/4/14 Louis-David Mitterrand <vindex+lists-pgsql-sql@apartia.org>:
> On Wed, Apr 14, 2010 at 06:06:59PM +0200, Pavel Stehule wrote:
>> I don't understand well. Why you don't use a function date_trunc(),
>>
>> select date_trunc('week', created), count(*)
>> from price
>> group by date_trunc('week', created)
>
> Because if a price doesn't change for more than a week, then some weeks
> will have bad statistical distribution (i.e not including prices which
> only have their 'modified' updated). So I (think I) need to (1) generate
> the weeks separately and (2) average prices that are current for each
> week.
>
> But I could be missing something obvious.

ok
Pavel