Thread: using window-functions to get freshest value - how?
I have a table
CREATE TABLE rfmitzeit
(
id_rf inet NOT NULL,
id_bf integer,
wert text,
letztespeicherung timestamp without time zone
CONSTRAINT repofeld_id_rf PRIMARY KEY (id_rf),
);
where for one id_bf there are stored mutliple values ("wert") at multiple dates:
id_bf, wert, letztespeicherung:
98, 'blue', 2009-11-09
98, 'red', 2009-11-10
now I have a select to get the "youngest value" for every id_bf:
select
rfmitzeit.id_bf, rfmitzeit.wert
from
rfmitzeit
join
(select
id_bf, max(rfmitzeit.letztespeicherung) as maxsi
from
rfmitzeit
group by id_bf
) idbfzeit on (rfmitzeit.id_bf=idbfzeit.id_bf and rfmitzeit.letztespeicherung=idbfzeit.maxsi)
which works quite fine. But I want to extend my knowledge....
so I have the feeling that this should be possible using a WINDOW-function,
but I can not find out how. (besides curiousity, I also want to test if doing this kind of query with window-functions will be faster)
Is it possible? How would the SQL utilizing WINDOW-functions look like?
Best wishes,
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
CREATE TABLE rfmitzeit
(
id_rf inet NOT NULL,
id_bf integer,
wert text,
letztespeicherung timestamp without time zone
CONSTRAINT repofeld_id_rf PRIMARY KEY (id_rf),
);
where for one id_bf there are stored mutliple values ("wert") at multiple dates:
id_bf, wert, letztespeicherung:
98, 'blue', 2009-11-09
98, 'red', 2009-11-10
now I have a select to get the "youngest value" for every id_bf:
select
rfmitzeit.id_bf, rfmitzeit.wert
from
rfmitzeit
join
(select
id_bf, max(rfmitzeit.letztespeicherung) as maxsi
from
rfmitzeit
group by id_bf
) idbfzeit on (rfmitzeit.id_bf=idbfzeit.id_bf and rfmitzeit.letztespeicherung=idbfzeit.maxsi)
which works quite fine. But I want to extend my knowledge....
so I have the feeling that this should be possible using a WINDOW-function,
but I can not find out how. (besides curiousity, I also want to test if doing this kind of query with window-functions will be faster)
Is it possible? How would the SQL utilizing WINDOW-functions look like?
Best wishes,
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
Massa, Harald Armin, 20.11.2009 11:07: > I have a table > > CREATE TABLE rfmitzeit > ( > id_rf inet NOT NULL, > id_bf integer, > wert text, > letztespeicherung timestamp without time zone > CONSTRAINT repofeld_id_rf PRIMARY KEY (id_rf), > ); > > > where for one id_bf there are stored mutliple values ("wert") at > multiple dates: > > id_bf, wert, letztespeicherung: > 98, 'blue', 2009-11-09 > 98, 'red', 2009-11-10 > > > now I have a select to get the "youngest value" for every id_bf: Not tested: SELECT id_bf, wert, max(letztespeicherung) over (partition by id_bf) FROM rfmitzeit Regards Thomas
id_bf, wert, letztespeicherung:
no, that does not work:
"id_bf";"wert";"max"
98;"blue";"2009-11-10 00:00:00"
98;"red";"2009-11-10 00:00:00"
result is: I get the date of the youngest value.
My expected result is:
98;"red"
(that is, the entry of "wert" that is youngest)
thanks for trying,
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
Not tested:98, 'blue', 2009-11-09
98, 'red', 2009-11-10
now I have a select to get the "youngest value" for every id_bf:
SELECT id_bf, wert,
max(letztespeicherung) over (partition by id_bf)
FROM rfmitzeit
no, that does not work:
"id_bf";"wert";"max"
98;"blue";"2009-11-10 00:00:00"
98;"red";"2009-11-10 00:00:00"
result is: I get the date of the youngest value.
My expected result is:
98;"red"
(that is, the entry of "wert" that is youngest)
thanks for trying,
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality
SELECT * FROM (SELECT *,row_number() OVER (ORDER BY letztespeicherung DESC) FROM rfmitzeit) t WHERE row_number=1 On Fri, Nov 20, 2009 at 5:31 AM, Massa, Harald Armin <chef@ghum.de> wrote: > id_bf, wert, letztespeicherung: >>> >>> 98, 'blue', 2009-11-09 >>> 98, 'red', 2009-11-10 >>> now I have a select to get the "youngest value" for every id_bf: >> >> Not tested: >> >> SELECT id_bf, wert, >> max(letztespeicherung) over (partition by id_bf) >> FROM rfmitzeit > > no, that does not work: > "id_bf";"wert";"max" > 98;"blue";"2009-11-10 00:00:00" > 98;"red";"2009-11-10 00:00:00" > > result is: I get the date of the youngest value. > > My expected result is: > > 98;"red" > > (that is, the entry of "wert" that is youngest) > > thanks for trying, > > Harald > > -- > GHUM Harald Massa > persuadere et programmare > Harald Armin Massa > Spielberger Straße 49 > 70435 Stuttgart > 0173/9409607 > no fx, no carrier pigeon > - > %s is too gigantic of an industry to bend to the whims of reality >
oops, I forgot the partition by. Here's the correct query: SELECT * FROM (SELECT *,row_number() OVER (PARTITION BY id_rf ORDER BY letztespeicherung DESC) FROM rfmitzeit) t WHERE row_number=1 You can also do it using SELECT DISTINCT ON: SELECT DISTINCT ON (id_rf) * FROM rfmitzeit ORDER BY id_rf, letztespeicherung DESC; My guess is that the latter will perform better but you should do your own testing. On Fri, Nov 20, 2009 at 5:36 AM, silly8888 <silly8888@gmail.com> wrote: > SELECT * FROM (SELECT *,row_number() OVER (ORDER BY letztespeicherung > DESC) FROM rfmitzeit) t WHERE row_number=1 > > > > On Fri, Nov 20, 2009 at 5:31 AM, Massa, Harald Armin <chef@ghum.de> wrote: >> id_bf, wert, letztespeicherung: >>>> >>>> 98, 'blue', 2009-11-09 >>>> 98, 'red', 2009-11-10 >>>> now I have a select to get the "youngest value" for every id_bf: >>> >>> Not tested: >>> >>> SELECT id_bf, wert, >>> max(letztespeicherung) over (partition by id_bf) >>> FROM rfmitzeit >> >> no, that does not work: >> "id_bf";"wert";"max" >> 98;"blue";"2009-11-10 00:00:00" >> 98;"red";"2009-11-10 00:00:00" >> >> result is: I get the date of the youngest value. >> >> My expected result is: >> >> 98;"red" >> >> (that is, the entry of "wert" that is youngest) >> >> thanks for trying, >> >> Harald >> >> -- >> GHUM Harald Massa >> persuadere et programmare >> Harald Armin Massa >> Spielberger Straße 49 >> 70435 Stuttgart >> 0173/9409607 >> no fx, no carrier pigeon >> - >> %s is too gigantic of an industry to bend to the whims of reality >> >
On Fri, Nov 20, 2009 at 11:07:37AM +0100, Massa, Harald Armin wrote: > Is it possible? How would the SQL utilizing WINDOW-functions look like? there is no point in using window functions in here - simply use "DISTINCT ON". Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
Massa, Harald Armin, 20.11.2009 11:31: > no, that does not work: > "id_bf";"wert";"max" > 98;"blue";"2009-11-10 00:00:00" > 98;"red";"2009-11-10 00:00:00" > > result is: I get the date of the youngest value. > > My expected result is: > > 98;"red" > > (that is, the entry of "wert" that is youngest) > Sorry then I misunderstood your requirement I don't think windowing functions are the best choice here. I'd probably use something like: SELECT r1.id_bf, r1.wert FROM rfmitzeit r1 WHERE letztespeicherung = (SELECT max(letztespeicherung) FROM rfmitzeit r2 WHERE r2.id_bf = r1.id_bf) Not really different to your solution, but I think it's easier to read (personal taste!) but it also could be slightly moreefficient. But most probably the optimizer is smart enough... Regards Thomas P.S.: could you please turn off the HTML format for your mails. Especially the SQL statements are very hard to read...
In response to hubert depesz lubaczewski : > On Fri, Nov 20, 2009 at 11:07:37AM +0100, Massa, Harald Armin wrote: > > Is it possible? How would the SQL utilizing WINDOW-functions look like? > > there is no point in using window functions in here - simply use > "DISTINCT ON". Right, but he want to extend his knowledge.... Harald, keep in mind, DISTINCT ON isn't portable. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
> > Is it possible? How would the SQL utilizing WINDOW-functions look like? > > there is no point in using window functions in here - simply use > "DISTINCT ON". and how would I use DISTINCT ON for this query? Please bear in mind, that there is more then one id_bf (just stopped the sample data with one of them) Best wishes, Harald -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 no fx, no carrier pigeon - %s is too gigantic of an industry to bend to the whims of reality
2009/11/20 Massa, Harald Armin <chef@ghum.de>
> > Is it possible? How would the SQL utilizing WINDOW-functions look like?and how would I use DISTINCT ON for this query? Please bear in mind,
>
> there is no point in using window functions in here - simply use
> "DISTINCT ON".
that there is more then one id_bf (just stopped the sample data with
one of them)
Best wishes,
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Would this work?
SELECT DISTINCT ON (id_bf) id_bf, wert, letztespeicherung
FROM rfmitzeit
ORDER BY id_bf, letztespeicherung ASC;
Regards
Thom
In response to Massa, Harald Armin : > > > Is it possible? How would the SQL utilizing WINDOW-functions look like? > > > > there is no point in using window functions in here - simply use > > "DISTINCT ON". > > and how would I use DISTINCT ON for this query? Please bear in mind, > that there is more then one id_bf (just stopped the sample data with > one of them) For my example: test=# select * from harald order by 1,2; g | datum | value ---+------------+------- 1 | 2009-11-01 | 10 1 | 2009-11-02 | 11 1 | 2009-11-03 | 12 1 | 2009-11-05 | 1 2 | 2009-11-04 | 20 2 | 2009-11-10 | 100 (6 rows) the solution with DISTINCT ON: test=*# select distinct on (g) * from harald order by g, datum desc; g | datum | value ---+------------+------- 1 | 2009-11-05 | 1 2 | 2009-11-10 | 100 (2 rows) Regards, Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
> and how would I use DISTINCT ON for this query? Please bear in mind, > that there is more then one id_bf (just stopped the sample data with > one of them) I posted the answer more than hour ago: SELECT DISTINCT ON (id_rf) * FROM rfmitzeit ORDER BY id_rf, letztespeicherung DESC; The equivalent with window functions would be: SELECT * FROM (SELECT *,row_number() OVER (PARTITION BY id_rf ORDER BY letztespeicherung DESC) FROM rfmitzeit) t WHERE row_number=1 If you check the query plan you will understand why DISTINCT ON is the best option. Essensially, DISTINCT ON has no additional cost other the cost of ORDER BY id_rf, letztespeicherung DESC which is unavoidable
2009/11/20 silly8888 <silly8888@gmail.com>
Ah yes, I mixed up the id_rf and id_bf. Yeah, this is what I would use (although I personally avoid using *).
Thom
> and how would I use DISTINCT ON for this query? Please bear in mind,I posted the answer more than hour ago:
> that there is more then one id_bf (just stopped the sample data with
> one of them)
SELECT DISTINCT ON (id_rf) * FROM rfmitzeit ORDER BY id_rf,
letztespeicherung DESC;
Ah yes, I mixed up the id_rf and id_bf. Yeah, this is what I would use (although I personally avoid using *).
Thom
Thom, depesz, silly, > SELECT DISTINCT ON (id_bf) id_bf, wert, letztespeicherung > FROM rfmitzeit > ORDER BY id_bf, letztespeicherung ASC; yes, that does work. I put it in the real world query (my example was reduced to the relevant parts), and it provides an impressive speedup (down from 2234 to 1600ms on the worst possible dataset with worst possible disk/cache situation) And I learned that DISTINCT ON (xxx) .... order by zzzz will first do the order by, and then definitely take every "first row" of the ordered result. My expection of DISTINCT on (id_bf) was "give me ONLY ONE of the rows with every id_bf", I was not aware of the fact that it will give me THE FIRST of the 'ordered by' result set. Thanks to all who helped me learn today! Harald (and sorry for leaving HTML on on some of the mails) -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 no fx, no carrier pigeon - %s is too gigantic of an industry to bend to the whims of reality