Thread: window function to sort times series data?

window function to sort times series data?

From
Louis-David Mitterrand
Date:
Hi,

I have time series data: price(id_price int, price int, created_on timestamp)

I'd like to select the latest price before, say, 2010-03-10 and the
latest price after that date.

Using "group by" and self-joins I was able to build a (quite large :)
working query. 

But I wonder if there is a cleaner, shorter solution with a window
function.

I tried something like:

select * from (select first_value(p.id_price) over w as first_id_price,first_value(p.price) over w as
first_price,first_value(p.created_on::date)over w as first_date,nth_value(p.id_price,2) over w as
second_id_price,nth_value(p.price,2)over w as second_price,nth_value(p.created_on::date,2) over w as second_date,
p.id_pricefromprice p   window w as (order by p.created_on > '2010-03-10, p.id_price desc   rows between unbounded
precedingand unbounded following))    as t where first_id_price=id_price;
 

But this doesn't return correct results.

Thanks for any suggestions,


Re: window function to sort times series data?

From
"A. Kretschmer"
Date:
In response to Louis-David Mitterrand :
> Hi,
> 
> I have time series data: price(id_price int, price int, created_on timestamp)
> 
> I'd like to select the latest price before, say, 2010-03-10 and the
> latest price after that date.

test=*# select * from price ;id_price | price |     created_on
----------+-------+---------------------       1 |    10 | 2010-01-01 00:00:00       1 |    12 | 2010-02-01 00:00:00
  1 |     8 | 2010-03-01 00:00:00       1 |    15 | 2010-03-10 00:00:00       1 |    13 | 2010-03-20 00:00:00
 
(5 rows)

test=*# select * from ( select distinct on(id_price) id_price, price, created_on from price where created_on <
'2010-02-20'::dateorder by id_price, created_on desc
 
) foo union all select * from ( select distinct on(id_price) id_price, price, created_on from price where created_on >
'2010-02-20'::dateorder by id_price, created_on asc
 
) bar order by id_price,created_on ;id_price | price |     created_on
----------+-------+---------------------       1 |    12 | 2010-02-01 00:00:00       1 |     8 | 2010-03-01 00:00:00
(2 rows)


That's okay for you?


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99


Re: window function to sort times series data?

From
Louis-David Mitterrand
Date:
On Wed, Mar 24, 2010 at 03:29:36PM +0100, A. Kretschmer wrote:
> In response to Louis-David Mitterrand :
> > Hi,
> > 
> > I have time series data: price(id_price int, price int, created_on timestamp)
> > 
> > I'd like to select the latest price before, say, 2010-03-10 and the
> > latest price after that date.
> 
> test=*# select * from price ;
>  id_price | price |     created_on
> ----------+-------+---------------------
>         1 |    10 | 2010-01-01 00:00:00
>         1 |    12 | 2010-02-01 00:00:00
>         1 |     8 | 2010-03-01 00:00:00
>         1 |    15 | 2010-03-10 00:00:00
>         1 |    13 | 2010-03-20 00:00:00
> (5 rows)
> 
> test=*# select * from (
>   select distinct on(id_price) id_price, price, created_on from price where created_on < '2010-02-20'::date order by
id_price,created_on desc
 
> ) foo union all select * from (
>   select distinct on(id_price) id_price, price, created_on from price where created_on > '2010-02-20'::date order by
id_price,created_on asc
 
> ) bar order by id_price,created_on ;
>  id_price | price |     created_on
> ----------+-------+---------------------
>         1 |    12 | 2010-02-01 00:00:00
>         1 |     8 | 2010-03-01 00:00:00
> (2 rows)
> 
> That's okay for you?

Yes, that works, but I forgot in my specs (!) that I'd like the two
prices (pre and post 2010-03-10) to be returned on the same row and only
if a post-2010-03-10 price exists.

Thanks,


Re: window function to sort times series data?

From
"A. Kretschmer"
Date:
In response to Louis-David Mitterrand :
> On Wed, Mar 24, 2010 at 03:29:36PM +0100, A. Kretschmer wrote:
> > In response to Louis-David Mitterrand :
> > > Hi,
> > > 
> > > I have time series data: price(id_price int, price int, created_on timestamp)
> > > 
> > > I'd like to select the latest price before, say, 2010-03-10 and the
> > > latest price after that date.
> > 
> > test=*# select * from price ;
> >  id_price | price |     created_on
> > ----------+-------+---------------------
> >         1 |    10 | 2010-01-01 00:00:00
> >         1 |    12 | 2010-02-01 00:00:00
> >         1 |     8 | 2010-03-01 00:00:00
> >         1 |    15 | 2010-03-10 00:00:00
> >         1 |    13 | 2010-03-20 00:00:00
> > (5 rows)
> > 
> > test=*# select * from (
> >   select distinct on(id_price) id_price, price, created_on from price where created_on < '2010-02-20'::date order
byid_price, created_on desc
 
> > ) foo union all select * from (
> >   select distinct on(id_price) id_price, price, created_on from price where created_on > '2010-02-20'::date order
byid_price, created_on asc
 
> > ) bar order by id_price,created_on ;
> >  id_price | price |     created_on
> > ----------+-------+---------------------
> >         1 |    12 | 2010-02-01 00:00:00
> >         1 |     8 | 2010-03-01 00:00:00
> > (2 rows)
> > 
> > That's okay for you?
> 
> Yes, that works, but I forgot in my specs (!) that I'd like the two
> prices (pre and post 2010-03-10) to be returned on the same row and only
> if a post-2010-03-10 price exists.

Well:

test=*# select * from price ;id_price | price |     created_on
----------+-------+---------------------       1 |    10 | 2010-01-01 00:00:00       1 |    12 | 2010-02-01 00:00:00
  1 |     8 | 2010-03-01 00:00:00       1 |    15 | 2010-03-10 00:00:00       1 |    13 | 2010-03-20 00:00:00
 
(5 rows)

test=*# select id_price, sum(case when pre_post='pre' then price else 0
end) as pre, sum(case when pre_post='post' then price else 0 end) as
post, max(case when pre_post='pre' then created_on else null end) as
date_pre, max(case when pre_post='post' then created_on else null end)
as date_post from (select * from (select distinct on(id_price)
'pre'::text as pre_post, id_price, price, created_on from price where
created_on < '2010-02-20'::date order by id_price, created_on desc) foo
union all select * from (select distinct on(id_price) 'post'::text,
id_price, price, created_on from price where created_on >
'2010-02-20'::date order by id_price,created_on asc) bar order by
id_price, created_on) foo group by id_price;id_price | pre | post |      date_pre       |      date_post
----------+-----+------+---------------------+---------------------       1 |  12 |    8 | 2010-02-01 00:00:00 |
2010-03-0100:00:00
 
(1 row)


You can filter that result for rows where date_post is not null.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99


Re: window function to sort times series data?

From
Andreas Kretschmer
Date:
A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:

Well, and now i'm using 8.4 windowing-functions:

test=*# select * from price order by price_id, d;price_id | price |     d
----------+-------+------------       1 |    10 | 2010-03-12       1 |    11 | 2010-03-19       1 |    12 | 2010-03-26
    1 |    13 | 2010-04-02       1 |    14 | 2010-04-09       1 |    15 | 2010-04-16       1 |    16 | 2010-04-23
1|    17 | 2010-04-30       2 |    20 | 2010-03-12       2 |    21 | 2010-03-19       2 |    22 | 2010-03-26       2 |
 23 | 2010-04-02
 
(12 Zeilen)

-- now i'm searching for 2010-03-20:

Zeit: 0,319 ms
test=*# select price_id, sum(case when d < '2010-03-20'::date then price
else 0 end) as price_old, sum(case when d > '2010-03-20'::date then
price else 0 end) as price_new, max(case when d < '2010-03-20'::date
then d else null end) as date_old, max(case when d > '2010-03-20'::date
then d else null end) as date_new from (select price_id, price, d,
lag(d) over(partition by price_id order by d), lead(d) over(partition by
price_id order by d) from price) foo where '2010-03-20'::date between
lag and lead group by price_id;price_id | price_old | price_new |  date_old  |  date_new
----------+-----------+-----------+------------+------------       1 |        11 |        12 | 2010-03-19 | 2010-03-26
    2 |        21 |        22 | 2010-03-19 | 2010-03-26
 
(2 Zeilen)



Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: window function to sort times series data?

From
Louis-David Mitterrand
Date:
On Wed, Mar 24, 2010 at 05:29:46PM +0100, Andreas Kretschmer wrote:
> A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:
> 
> Well, and now i'm using 8.4 windowing-functions:
> 
> test=*# select * from price order by price_id, d;
>  price_id | price |     d
> ----------+-------+------------
>         1 |    10 | 2010-03-12
>         1 |    11 | 2010-03-19
>         1 |    12 | 2010-03-26
>         1 |    13 | 2010-04-02
>         1 |    14 | 2010-04-09
>         1 |    15 | 2010-04-16
>         1 |    16 | 2010-04-23
>         1 |    17 | 2010-04-30
>         2 |    20 | 2010-03-12
>         2 |    21 | 2010-03-19
>         2 |    22 | 2010-03-26
>         2 |    23 | 2010-04-02
> (12 Zeilen)
> 
> -- now i'm searching for 2010-03-20:
> 
> Zeit: 0,319 ms
> test=*# select price_id, sum(case when d < '2010-03-20'::date then price
> else 0 end) as price_old, sum(case when d > '2010-03-20'::date then
> price else 0 end) as price_new, max(case when d < '2010-03-20'::date
> then d else null end) as date_old, max(case when d > '2010-03-20'::date
> then d else null end) as date_new from (select price_id, price, d,
> lag(d) over(partition by price_id order by d), lead(d) over(partition by
> price_id order by d) from price) foo where '2010-03-20'::date between
> lag and lead group by price_id;
>  price_id | price_old | price_new |  date_old  |  date_new
> ----------+-----------+-----------+------------+------------
>         1 |        11 |        12 | 2010-03-19 | 2010-03-26
>         2 |        21 |        22 | 2010-03-19 | 2010-03-26
> (2 Zeilen)

Nice use of lag() and lead() functions. 

In my db id_price is a serial so it's easy to use in an aggregate to
determine the latest.

I also looked at window functions and did the following:

select p3.price as first_price,   p4.price as second_price   from (select            first_value(max(p.id_price)) over
was first_id_price,            nth_value(max(p.id_price),2) over w as second_id_price,            p.created_on >
'2010-03-20'as is_new_price           from price p           group by p.created_on > '2010-03-20'           window w as
(orderby p.created_on > '2010-03-20'           desc rows between unbounded preceding and unbounded following)    ) as t
  join price p3 on (t.first_id_price=p3.id_price)    left join price p4 on (t.second_id_price=p4.id_price)    where
t.is_new_priceis true
 

test=# \efirst_price | second_price 
-------------+--------------         17 |           11
(1 row)

Is there some potential optimizations or flaws?




Here is the test database:

--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET client_encoding = 'SQL_ASCII';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: price; Type: TABLE; Schema: public; Owner: ldm; Tablespace: 
--

CREATE TABLE price (   id_price integer NOT NULL,   price integer,   created_on timestamp without time zone
);


ALTER TABLE public.price OWNER TO ldm;

--
-- Name: price_id_price_seq; Type: SEQUENCE; Schema: public; Owner: ldm
--

CREATE SEQUENCE price_id_price_seq   START WITH 1   INCREMENT BY 1   NO MAXVALUE   NO MINVALUE   CACHE 1;


ALTER TABLE public.price_id_price_seq OWNER TO ldm;

--
-- Name: price_id_price_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: ldm
--

ALTER SEQUENCE price_id_price_seq OWNED BY price.id_price;


--
-- Name: price_id_price_seq; Type: SEQUENCE SET; Schema: public; Owner: ldm
--

SELECT pg_catalog.setval('price_id_price_seq', 8, true);


--
-- Name: id_price; Type: DEFAULT; Schema: public; Owner: ldm
--

ALTER TABLE price ALTER COLUMN id_price SET DEFAULT nextval('price_id_price_seq'::regclass);


--
-- Data for Name: price; Type: TABLE DATA; Schema: public; Owner: ldm
--

COPY price (id_price, price, created_on) FROM stdin;
1    10    2010-03-12 00:00:00
2    11    2010-03-19 00:00:00
3    12    2010-03-26 00:00:00
4    13    2010-04-02 00:00:00
5    14    2010-04-09 00:00:00
6    15    2010-04-16 00:00:00
7    16    2010-04-23 00:00:00
8    17    2010-04-30 00:00:00
\.


--
-- Name: public; Type: ACL; Schema: -; Owner: postgres
--

REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM postgres;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO PUBLIC;


--
-- PostgreSQL database dump complete
--



Re: window function to sort times series data?

From
John Gage
Date:
In going through the arcana of string functions, I have come across the following series of selects that contain, for
me,a mysterious "$re$".<br /><br />-- return all matches from regexp<br />SELECT regexp_matches('foobarbequebaz',
$re$(bar)(beque)$re$);<br/><br />-- test case insensitive<br />SELECT regexp_matches('foObARbEqUEbAz',
$re$(bar)(beque)$re$,'i');<br /><br />-- global option - more than one match<br />SELECT
regexp_matches('foobarbequebazilbarfbonk',$re$(b[^b]+)(b[^b]+)$re$, 'g');<br /><br />When I run this I get:<br /><br
/>regexp_matches<br/>--------------------------<br /><Unknown(2,003)><br /><br />I have not been able to find out
whatit all means.  Forgive me for my blindness.<br /><br />John<br /><br />P.S. The author of the statements is "alexk"
atCommand Prompt.  They are test statements against Postgres' string functions.<br /> 

Re: window function to sort times series data?

From
Josh Kupershmidt
Date:
On Wed, Mar 24, 2010 at 2:38 PM, John Gage <jsmgage@gmail.com> wrote:
> In going through the arcana of string functions, I have come across the
> following series of selects that contain, for me, a mysterious "$re$".
>
> -- return all matches from regexp
> SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);

The $re$ is just an arbitrary identifier for a dollar-quoted string
constant. See:

http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html

Maybe you're on an old version which doesn't support dollar-quoted strings?

On 8.3 for the above query, I get:

SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);regexp_matches
----------------{bar,beque}
(1 row)

Josh


Dollar quoted strings

From
John Gage
Date:
Thanks very much for this.  I am using 8.4.2.

This query works as you describe in pgAdmin.

I had tried it in the SquirrelSQL client, which is where it produced the

<Unknown(2,003)>

result.

Obviously, this is a SquirrelSQL problem at least in part.

What does <Unknown(2,003)> mean?

Thanks again for directing me to the doc's on dollar quoting.

John


On Mar 24, 2010, at 8:13 PM, Josh Kupershmidt wrote:

> On Wed, Mar 24, 2010 at 2:38 PM, John Gage <jsmgage@gmail.com> wrote:
>> In going through the arcana of string functions, I have come across  
>> the
>> following series of selects that contain, for me, a mysterious "$re 
>> $".
>>
>> -- return all matches from regexp
>> SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);
>
> The $re$ is just an arbitrary identifier for a dollar-quoted string
> constant. See:
>
> http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html
>
> Maybe you're on an old version which doesn't support dollar-quoted  
> strings?
>
> On 8.3 for the above query, I get:
>
> SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);
> regexp_matches
> ----------------
> {bar,beque}
> (1 row)
>
> Josh



Re: Dollar quoted strings

From
Robert Manning
Date:
Hi John,

The type code 2003 indicates an SQL99 column type of ARRAY, which is
not currently supported for rendering/editing.  Here is the feature
request for SQuirreL SQL Client to track this feature:

https://sourceforge.net/tracker/?func=detail&aid=2972937&group_id=28383&atid=393417

Rob

On Wed, Mar 24, 2010 at 4:38 PM, John Gage <jsmgage@numericable.fr> wrote:
> Thanks very much for this.  I am using 8.4.2.
>
> This query works as you describe in pgAdmin.
>
> I had tried it in the SquirrelSQL client, which is where it produced the
>
> <Unknown(2,003)>
>
> result.
>
> Obviously, this is a SquirrelSQL problem at least in part.
>
> What does <Unknown(2,003)> mean?
>
> Thanks again for directing me to the doc's on dollar quoting.
>
> John
>
>
> On Mar 24, 2010, at 8:13 PM, Josh Kupershmidt wrote:
>
>> On Wed, Mar 24, 2010 at 2:38 PM, John Gage <jsmgage@gmail.com> wrote:
>>>
>>> In going through the arcana of string functions, I have come across the
>>> following series of selects that contain, for me, a mysterious "$re$".
>>>
>>> -- return all matches from regexp
>>> SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);
>>
>> The $re$ is just an arbitrary identifier for a dollar-quoted string
>> constant. See:
>>
>> http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html
>>
>> Maybe you're on an old version which doesn't support dollar-quoted
>> strings?
>>
>> On 8.3 for the above query, I get:
>>
>> SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);
>> regexp_matches
>> ----------------
>> {bar,beque}
>> (1 row)
>>
>> Josh
>
>