Thread: window function to sort times series data?
Hi, I have time series data: price(id_price int, price int, created_on timestamp) I'd like to select the latest price before, say, 2010-03-10 and the latest price after that date. Using "group by" and self-joins I was able to build a (quite large :) working query. But I wonder if there is a cleaner, shorter solution with a window function. I tried something like: select * from (select first_value(p.id_price) over w as first_id_price,first_value(p.price) over w as first_price,first_value(p.created_on::date)over w as first_date,nth_value(p.id_price,2) over w as second_id_price,nth_value(p.price,2)over w as second_price,nth_value(p.created_on::date,2) over w as second_date, p.id_pricefromprice p window w as (order by p.created_on > '2010-03-10, p.id_price desc rows between unbounded precedingand unbounded following)) as t where first_id_price=id_price; But this doesn't return correct results. Thanks for any suggestions,
In response to Louis-David Mitterrand : > Hi, > > I have time series data: price(id_price int, price int, created_on timestamp) > > I'd like to select the latest price before, say, 2010-03-10 and the > latest price after that date. test=*# select * from price ;id_price | price | created_on ----------+-------+--------------------- 1 | 10 | 2010-01-01 00:00:00 1 | 12 | 2010-02-01 00:00:00 1 | 8 | 2010-03-01 00:00:00 1 | 15 | 2010-03-10 00:00:00 1 | 13 | 2010-03-20 00:00:00 (5 rows) test=*# select * from ( select distinct on(id_price) id_price, price, created_on from price where created_on < '2010-02-20'::dateorder by id_price, created_on desc ) foo union all select * from ( select distinct on(id_price) id_price, price, created_on from price where created_on > '2010-02-20'::dateorder by id_price, created_on asc ) bar order by id_price,created_on ;id_price | price | created_on ----------+-------+--------------------- 1 | 12 | 2010-02-01 00:00:00 1 | 8 | 2010-03-01 00:00:00 (2 rows) That's okay for you? Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
On Wed, Mar 24, 2010 at 03:29:36PM +0100, A. Kretschmer wrote: > In response to Louis-David Mitterrand : > > Hi, > > > > I have time series data: price(id_price int, price int, created_on timestamp) > > > > I'd like to select the latest price before, say, 2010-03-10 and the > > latest price after that date. > > test=*# select * from price ; > id_price | price | created_on > ----------+-------+--------------------- > 1 | 10 | 2010-01-01 00:00:00 > 1 | 12 | 2010-02-01 00:00:00 > 1 | 8 | 2010-03-01 00:00:00 > 1 | 15 | 2010-03-10 00:00:00 > 1 | 13 | 2010-03-20 00:00:00 > (5 rows) > > test=*# select * from ( > select distinct on(id_price) id_price, price, created_on from price where created_on < '2010-02-20'::date order by id_price,created_on desc > ) foo union all select * from ( > select distinct on(id_price) id_price, price, created_on from price where created_on > '2010-02-20'::date order by id_price,created_on asc > ) bar order by id_price,created_on ; > id_price | price | created_on > ----------+-------+--------------------- > 1 | 12 | 2010-02-01 00:00:00 > 1 | 8 | 2010-03-01 00:00:00 > (2 rows) > > That's okay for you? Yes, that works, but I forgot in my specs (!) that I'd like the two prices (pre and post 2010-03-10) to be returned on the same row and only if a post-2010-03-10 price exists. Thanks,
In response to Louis-David Mitterrand : > On Wed, Mar 24, 2010 at 03:29:36PM +0100, A. Kretschmer wrote: > > In response to Louis-David Mitterrand : > > > Hi, > > > > > > I have time series data: price(id_price int, price int, created_on timestamp) > > > > > > I'd like to select the latest price before, say, 2010-03-10 and the > > > latest price after that date. > > > > test=*# select * from price ; > > id_price | price | created_on > > ----------+-------+--------------------- > > 1 | 10 | 2010-01-01 00:00:00 > > 1 | 12 | 2010-02-01 00:00:00 > > 1 | 8 | 2010-03-01 00:00:00 > > 1 | 15 | 2010-03-10 00:00:00 > > 1 | 13 | 2010-03-20 00:00:00 > > (5 rows) > > > > test=*# select * from ( > > select distinct on(id_price) id_price, price, created_on from price where created_on < '2010-02-20'::date order byid_price, created_on desc > > ) foo union all select * from ( > > select distinct on(id_price) id_price, price, created_on from price where created_on > '2010-02-20'::date order byid_price, created_on asc > > ) bar order by id_price,created_on ; > > id_price | price | created_on > > ----------+-------+--------------------- > > 1 | 12 | 2010-02-01 00:00:00 > > 1 | 8 | 2010-03-01 00:00:00 > > (2 rows) > > > > That's okay for you? > > Yes, that works, but I forgot in my specs (!) that I'd like the two > prices (pre and post 2010-03-10) to be returned on the same row and only > if a post-2010-03-10 price exists. Well: test=*# select * from price ;id_price | price | created_on ----------+-------+--------------------- 1 | 10 | 2010-01-01 00:00:00 1 | 12 | 2010-02-01 00:00:00 1 | 8 | 2010-03-01 00:00:00 1 | 15 | 2010-03-10 00:00:00 1 | 13 | 2010-03-20 00:00:00 (5 rows) test=*# select id_price, sum(case when pre_post='pre' then price else 0 end) as pre, sum(case when pre_post='post' then price else 0 end) as post, max(case when pre_post='pre' then created_on else null end) as date_pre, max(case when pre_post='post' then created_on else null end) as date_post from (select * from (select distinct on(id_price) 'pre'::text as pre_post, id_price, price, created_on from price where created_on < '2010-02-20'::date order by id_price, created_on desc) foo union all select * from (select distinct on(id_price) 'post'::text, id_price, price, created_on from price where created_on > '2010-02-20'::date order by id_price,created_on asc) bar order by id_price, created_on) foo group by id_price;id_price | pre | post | date_pre | date_post ----------+-----+------+---------------------+--------------------- 1 | 12 | 8 | 2010-02-01 00:00:00 | 2010-03-0100:00:00 (1 row) You can filter that result for rows where date_post is not null. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
A. Kretschmer <andreas.kretschmer@schollglas.com> wrote: Well, and now i'm using 8.4 windowing-functions: test=*# select * from price order by price_id, d;price_id | price | d ----------+-------+------------ 1 | 10 | 2010-03-12 1 | 11 | 2010-03-19 1 | 12 | 2010-03-26 1 | 13 | 2010-04-02 1 | 14 | 2010-04-09 1 | 15 | 2010-04-16 1 | 16 | 2010-04-23 1| 17 | 2010-04-30 2 | 20 | 2010-03-12 2 | 21 | 2010-03-19 2 | 22 | 2010-03-26 2 | 23 | 2010-04-02 (12 Zeilen) -- now i'm searching for 2010-03-20: Zeit: 0,319 ms test=*# select price_id, sum(case when d < '2010-03-20'::date then price else 0 end) as price_old, sum(case when d > '2010-03-20'::date then price else 0 end) as price_new, max(case when d < '2010-03-20'::date then d else null end) as date_old, max(case when d > '2010-03-20'::date then d else null end) as date_new from (select price_id, price, d, lag(d) over(partition by price_id order by d), lead(d) over(partition by price_id order by d) from price) foo where '2010-03-20'::date between lag and lead group by price_id;price_id | price_old | price_new | date_old | date_new ----------+-----------+-----------+------------+------------ 1 | 11 | 12 | 2010-03-19 | 2010-03-26 2 | 21 | 22 | 2010-03-19 | 2010-03-26 (2 Zeilen) Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
On Wed, Mar 24, 2010 at 05:29:46PM +0100, Andreas Kretschmer wrote: > A. Kretschmer <andreas.kretschmer@schollglas.com> wrote: > > Well, and now i'm using 8.4 windowing-functions: > > test=*# select * from price order by price_id, d; > price_id | price | d > ----------+-------+------------ > 1 | 10 | 2010-03-12 > 1 | 11 | 2010-03-19 > 1 | 12 | 2010-03-26 > 1 | 13 | 2010-04-02 > 1 | 14 | 2010-04-09 > 1 | 15 | 2010-04-16 > 1 | 16 | 2010-04-23 > 1 | 17 | 2010-04-30 > 2 | 20 | 2010-03-12 > 2 | 21 | 2010-03-19 > 2 | 22 | 2010-03-26 > 2 | 23 | 2010-04-02 > (12 Zeilen) > > -- now i'm searching for 2010-03-20: > > Zeit: 0,319 ms > test=*# select price_id, sum(case when d < '2010-03-20'::date then price > else 0 end) as price_old, sum(case when d > '2010-03-20'::date then > price else 0 end) as price_new, max(case when d < '2010-03-20'::date > then d else null end) as date_old, max(case when d > '2010-03-20'::date > then d else null end) as date_new from (select price_id, price, d, > lag(d) over(partition by price_id order by d), lead(d) over(partition by > price_id order by d) from price) foo where '2010-03-20'::date between > lag and lead group by price_id; > price_id | price_old | price_new | date_old | date_new > ----------+-----------+-----------+------------+------------ > 1 | 11 | 12 | 2010-03-19 | 2010-03-26 > 2 | 21 | 22 | 2010-03-19 | 2010-03-26 > (2 Zeilen) Nice use of lag() and lead() functions. In my db id_price is a serial so it's easy to use in an aggregate to determine the latest. I also looked at window functions and did the following: select p3.price as first_price, p4.price as second_price from (select first_value(max(p.id_price)) over was first_id_price, nth_value(max(p.id_price),2) over w as second_id_price, p.created_on > '2010-03-20'as is_new_price from price p group by p.created_on > '2010-03-20' window w as (orderby p.created_on > '2010-03-20' desc rows between unbounded preceding and unbounded following) ) as t join price p3 on (t.first_id_price=p3.id_price) left join price p4 on (t.second_id_price=p4.id_price) where t.is_new_priceis true test=# \efirst_price | second_price -------------+-------------- 17 | 11 (1 row) Is there some potential optimizations or flaws? Here is the test database: -- -- PostgreSQL database dump -- SET statement_timeout = 0; SET client_encoding = 'SQL_ASCII'; SET standard_conforming_strings = off; SET check_function_bodies = false; SET client_min_messages = warning; SET escape_string_warning = off; SET search_path = public, pg_catalog; SET default_tablespace = ''; SET default_with_oids = false; -- -- Name: price; Type: TABLE; Schema: public; Owner: ldm; Tablespace: -- CREATE TABLE price ( id_price integer NOT NULL, price integer, created_on timestamp without time zone ); ALTER TABLE public.price OWNER TO ldm; -- -- Name: price_id_price_seq; Type: SEQUENCE; Schema: public; Owner: ldm -- CREATE SEQUENCE price_id_price_seq START WITH 1 INCREMENT BY 1 NO MAXVALUE NO MINVALUE CACHE 1; ALTER TABLE public.price_id_price_seq OWNER TO ldm; -- -- Name: price_id_price_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: ldm -- ALTER SEQUENCE price_id_price_seq OWNED BY price.id_price; -- -- Name: price_id_price_seq; Type: SEQUENCE SET; Schema: public; Owner: ldm -- SELECT pg_catalog.setval('price_id_price_seq', 8, true); -- -- Name: id_price; Type: DEFAULT; Schema: public; Owner: ldm -- ALTER TABLE price ALTER COLUMN id_price SET DEFAULT nextval('price_id_price_seq'::regclass); -- -- Data for Name: price; Type: TABLE DATA; Schema: public; Owner: ldm -- COPY price (id_price, price, created_on) FROM stdin; 1 10 2010-03-12 00:00:00 2 11 2010-03-19 00:00:00 3 12 2010-03-26 00:00:00 4 13 2010-04-02 00:00:00 5 14 2010-04-09 00:00:00 6 15 2010-04-16 00:00:00 7 16 2010-04-23 00:00:00 8 17 2010-04-30 00:00:00 \. -- -- Name: public; Type: ACL; Schema: -; Owner: postgres -- REVOKE ALL ON SCHEMA public FROM PUBLIC; REVOKE ALL ON SCHEMA public FROM postgres; GRANT ALL ON SCHEMA public TO postgres; GRANT ALL ON SCHEMA public TO PUBLIC; -- -- PostgreSQL database dump complete --
In going through the arcana of string functions, I have come across the following series of selects that contain, for me,a mysterious "$re$".<br /><br />-- return all matches from regexp<br />SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);<br/><br />-- test case insensitive<br />SELECT regexp_matches('foObARbEqUEbAz', $re$(bar)(beque)$re$,'i');<br /><br />-- global option - more than one match<br />SELECT regexp_matches('foobarbequebazilbarfbonk',$re$(b[^b]+)(b[^b]+)$re$, 'g');<br /><br />When I run this I get:<br /><br />regexp_matches<br/>--------------------------<br /><Unknown(2,003)><br /><br />I have not been able to find out whatit all means. Forgive me for my blindness.<br /><br />John<br /><br />P.S. The author of the statements is "alexk" atCommand Prompt. They are test statements against Postgres' string functions.<br />
On Wed, Mar 24, 2010 at 2:38 PM, John Gage <jsmgage@gmail.com> wrote: > In going through the arcana of string functions, I have come across the > following series of selects that contain, for me, a mysterious "$re$". > > -- return all matches from regexp > SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$); The $re$ is just an arbitrary identifier for a dollar-quoted string constant. See: http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html Maybe you're on an old version which doesn't support dollar-quoted strings? On 8.3 for the above query, I get: SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);regexp_matches ----------------{bar,beque} (1 row) Josh
Thanks very much for this. I am using 8.4.2. This query works as you describe in pgAdmin. I had tried it in the SquirrelSQL client, which is where it produced the <Unknown(2,003)> result. Obviously, this is a SquirrelSQL problem at least in part. What does <Unknown(2,003)> mean? Thanks again for directing me to the doc's on dollar quoting. John On Mar 24, 2010, at 8:13 PM, Josh Kupershmidt wrote: > On Wed, Mar 24, 2010 at 2:38 PM, John Gage <jsmgage@gmail.com> wrote: >> In going through the arcana of string functions, I have come across >> the >> following series of selects that contain, for me, a mysterious "$re >> $". >> >> -- return all matches from regexp >> SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$); > > The $re$ is just an arbitrary identifier for a dollar-quoted string > constant. See: > > http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html > > Maybe you're on an old version which doesn't support dollar-quoted > strings? > > On 8.3 for the above query, I get: > > SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$); > regexp_matches > ---------------- > {bar,beque} > (1 row) > > Josh
Hi John, The type code 2003 indicates an SQL99 column type of ARRAY, which is not currently supported for rendering/editing. Here is the feature request for SQuirreL SQL Client to track this feature: https://sourceforge.net/tracker/?func=detail&aid=2972937&group_id=28383&atid=393417 Rob On Wed, Mar 24, 2010 at 4:38 PM, John Gage <jsmgage@numericable.fr> wrote: > Thanks very much for this. I am using 8.4.2. > > This query works as you describe in pgAdmin. > > I had tried it in the SquirrelSQL client, which is where it produced the > > <Unknown(2,003)> > > result. > > Obviously, this is a SquirrelSQL problem at least in part. > > What does <Unknown(2,003)> mean? > > Thanks again for directing me to the doc's on dollar quoting. > > John > > > On Mar 24, 2010, at 8:13 PM, Josh Kupershmidt wrote: > >> On Wed, Mar 24, 2010 at 2:38 PM, John Gage <jsmgage@gmail.com> wrote: >>> >>> In going through the arcana of string functions, I have come across the >>> following series of selects that contain, for me, a mysterious "$re$". >>> >>> -- return all matches from regexp >>> SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$); >> >> The $re$ is just an arbitrary identifier for a dollar-quoted string >> constant. See: >> >> http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html >> >> Maybe you're on an old version which doesn't support dollar-quoted >> strings? >> >> On 8.3 for the above query, I get: >> >> SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$); >> regexp_matches >> ---------------- >> {bar,beque} >> (1 row) >> >> Josh > >