Thread: graphing time series data
Hi,

I have time series data in a 'price' table:

    price(id_price, price, id_product, created, modified)

Prices are polled daily, and a new 'price' row is created only if the
price of id_product changes; otherwise 'modified' is updated to now().

Now, I'd like to make a graph of average prices per week, per
id_product. As some prices don't vary much, the distribution would not
be ideal if I simply 'group by extract(week from p.modified)'. Ideally
I'd generate_series() a list of weeks between min(p.created) and
max(p.modified) and then average prices with something like 'group by
p.modified < week'.

What would be the best way to tackle this?

Thanks,
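For reference, the week list described in the question could be generated along these lines (a sketch only, using the column names from the schema above; not tested against the poster's data):

```sql
-- One row per week, from the earliest 'created' to the latest 'modified':
SELECT generate_series(min(p.created),
                       max(p.modified),
                       interval '1 week') AS week
  FROM price p;
```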
On Wed, Apr 14, 2010 at 7:54 AM, Louis-David Mitterrand
<vindex+lists-pgsql-sql@apartia.org> wrote:
> Now, I'd like to make a graph of average prices per week, per
> id_product. As some prices don't vary much, distribution would not be
> ideal if I simply 'group by extract(week from p.modified)'.

I created a view for a similar problem that I had. Only I was
calculating the counts per day. This query could be crafted to work
for you.

CREATE OR REPLACE VIEW opendiscrepencydailycounts AS
WITH opendays(day) AS (
        SELECT gs.day::date AS day
          FROM generate_series(
                 (( SELECT min(discrepencylist.discstartdt) AS min
                      FROM discrepencylist))::timestamp without time zone,
                 'now'::text::date::timestamp without time zone,
                 '1 day'::interval) gs(day)
)
SELECT opendays.day, ds.resolvingparty, count(opendays.day) AS opendiscrepancies
  FROM discrepencylist ds, opendays
 WHERE opendays.day >= ds.discstartdt
   AND opendays.day <= LEAST('now'::text::date, ds.resolutiondate)
 GROUP BY opendays.day, ds.resolvingparty
 ORDER BY opendays.day, ds.resolvingparty;

-- 
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
On Wed, Apr 14, 2010 at 08:46:13AM -0700, Richard Broersma wrote:
> On Wed, Apr 14, 2010 at 7:54 AM, Louis-David Mitterrand
> <vindex+lists-pgsql-sql@apartia.org> wrote:
>
> > Now, I'd like to make a graph of average prices per week, per
> > id_product. As some prices don't vary much, distribution would not be
> > ideal if I simply 'group by extract(week from p.modified)'.
>
> I created a view for a similar problem that I had. Only I was
> calculating the counts per day. This query could be crafted to work
> for you.
>
> CREATE OR REPLACE VIEW opendiscrepencydailycounts AS
> WITH opendays(day) AS (
>         SELECT gs.day::date AS day
>           FROM generate_series(
>                  (( SELECT min(discrepencylist.discstartdt) AS min
>                       FROM discrepencylist))::timestamp without time zone,
>                  'now'::text::date::timestamp without time zone,
>                  '1 day'::interval) gs(day)
> )
> SELECT opendays.day, ds.resolvingparty, count(opendays.day) AS opendiscrepancies
>   FROM discrepencylist ds, opendays
>  WHERE opendays.day >= ds.discstartdt
>    AND opendays.day <= LEAST('now'::text::date, ds.resolutiondate)
>  GROUP BY opendays.day, ds.resolvingparty
>  ORDER BY opendays.day, ds.resolvingparty;

You confirm my modus operandi. I tried the following, which seems to
give me an optimal price distribution:

select w.week, count(p.id_price)
  from (select generate_series(min(p.created_on), max(p.modified_on),
               '1 week') as week
          from price p) as w
  join price p on (p.created_on < w.week + '7 days'
               and p.modified_on > w.week + '7 days')
 group by w.week
 order by w.week

          week          | count
------------------------+--------
 2010-02-10 15:32:18+01 | 125369
 2010-02-17 15:32:18+01 | 126882
 2010-02-24 15:32:18+01 | 128307
 2010-03-03 15:32:18+01 | 126742
 2010-03-10 15:32:18+01 | 133596
 2010-03-17 15:32:18+01 | 149019
 2010-03-24 15:32:18+01 | 149908
 2010-03-31 15:32:18+02 | 147617

The rest should be easy from there!

Thanks for your input,
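The query above only counts rows per week; the original goal was the average price per id_product per week. A sketch of that extension (same columns as in the thread, untested): note the overlap test here, "created before week end and still current after week start", differs slightly from the count query above, which kept only prices still current at the end of the week.

```sql
-- Average price per product per week; a price row contributes to every
-- week that its [created_on, modified_on] validity interval overlaps.
SELECT w.week, p.id_product, avg(p.price) AS avg_price
  FROM (SELECT generate_series(min(p.created_on),
                               max(p.modified_on),
                               interval '1 week') AS week
          FROM price p) AS w
  JOIN price p
    ON p.created_on  <  w.week + interval '7 days'
   AND p.modified_on >= w.week
 GROUP BY w.week, p.id_product
 ORDER BY w.week, p.id_product;
```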
2010/4/14 Louis-David Mitterrand <vindex+lists-pgsql-sql@apartia.org>:
> On Wed, Apr 14, 2010 at 08:46:13AM -0700, Richard Broersma wrote:
>> I created a view for a similar problem that I had. Only I was
>> calculating the counts per day. This query could be crafted to work
>> for you.
>> [...]
>
> You confirm my modus operandi.
> I tried the following which seems to give
> me optimal price distribution:
>
> select w.week,count( p.id_price) from
> (select generate_series(min(p.created_on),max(p.modified_on),'1
> week') as week from price p) as w join price p on (p.created_on <
> w.week + '7 days' and p.modified_on > w.week + '7 days') group by
> w.week order by w.week
>
>          week          | count
> ------------------------+--------
> 2010-02-10 15:32:18+01 | 125369
> 2010-02-17 15:32:18+01 | 126882
> 2010-02-24 15:32:18+01 | 128307
> 2010-03-03 15:32:18+01 | 126742
> 2010-03-10 15:32:18+01 | 133596
> 2010-03-17 15:32:18+01 | 149019
> 2010-03-24 15:32:18+01 | 149908
> 2010-03-31 15:32:18+02 | 147617
>
> The rest should be easy from there!

I don't understand well. Why don't you use the function date_trunc()?

select date_trunc('week', created), count(*)
  from price
 group by date_trunc('week', created)

Regards
Pavel Stehule

> Thanks for your input,
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
On Wed, Apr 14, 2010 at 06:06:59PM +0200, Pavel Stehule wrote:
> I don't understand well. Why don't you use the function date_trunc()?
>
> select date_trunc('week', created), count(*)
> from price
> group by date_trunc('week', created)

Because if a price doesn't change for more than a week, then some weeks
will have a bad statistical distribution (i.e. not including prices which
only have their 'modified' updated). So I (think I) need to (1) generate
the weeks separately and (2) average the prices that are current for each
week.

But I could be missing something obvious.
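To make the distribution problem concrete, consider a single price row that stays current for a month (the dates below are made up for illustration): date_trunc('week', created) would put it in exactly one weekly bucket, while the generate_series approach counts it in every week it was current.

```sql
-- A row valid from 2010-01-01 to 2010-02-01 matches every weekly bucket
-- it overlaps (five of them), not just the week it was created in:
SELECT w.week
  FROM generate_series('2010-01-01'::timestamptz,
                       '2010-02-01'::timestamptz,
                       interval '1 week') AS w(week)
 WHERE '2010-01-01'::timestamptz <  w.week + interval '7 days'
   AND '2010-02-01'::timestamptz >= w.week;
```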
2010/4/14 Louis-David Mitterrand <vindex+lists-pgsql-sql@apartia.org>:
> On Wed, Apr 14, 2010 at 06:06:59PM +0200, Pavel Stehule wrote:
>> [...]
>
> Because if a price doesn't change for more than a week, then some weeks
> will have bad statistical distribution (i.e. not including prices which
> only have their 'modified' updated). So I (think I) need to (1) generate
> the weeks separately and (2) average prices that are current for each
> week.
>
> But I could be missing something obvious.

ok

Pavel