Thread: using window-functions to get freshest value - how?

using window-functions to get freshest value - how?

From
"Massa, Harald Armin"
Date:
I have a table

CREATE TABLE rfmitzeit
(
 id_rf inet NOT NULL,
 id_bf integer,
 wert text,
 letztespeicherung timestamp without time zone
 CONSTRAINT repofeld_id_rf PRIMARY KEY (id_rf),
 );
 
 
where for one id_bf there are stored mutliple values ("wert") at multiple dates:
   
   id_bf,  wert, letztespeicherung:
   98,  'blue', 2009-11-09
   98,  'red', 2009-11-10
   
   
now I have a select to get the "youngest value" for every id_bf:
   
select
rfmitzeit.id_bf, rfmitzeit.wert
   from
rfmitzeit
join
(select
   id_bf, max(rfmitzeit.letztespeicherung) as maxsi
from
   rfmitzeit
group by id_bf
) idbfzeit on (rfmitzeit.id_bf=idbfzeit.id_bf and rfmitzeit.letztespeicherung=idbfzeit.maxsi)  

which works quite fine. But I want to extend my knowledge....

so I have the feeling that this should be possible using a WINDOW-function,
but I can not find out how. (besides curiousity, I also want to test if doing this kind of query with window-functions will be faster)

Is it possible? How would the SQL utilizing WINDOW-functions look like?

Best wishes,

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality

Re: using window-functions to get freshest value - how?

From
Thomas Kellerer
Date:
Massa, Harald Armin, 20.11.2009 11:07:
> I have a table
>
> CREATE TABLE rfmitzeit
> (
>  id_rf inet NOT NULL,
>  id_bf integer,
>  wert text,
>  letztespeicherung timestamp without time zone
>  CONSTRAINT repofeld_id_rf PRIMARY KEY (id_rf),
>  );
>
>
> where for one id_bf there are stored mutliple values ("wert") at
> multiple dates:
>
>    id_bf,  wert, letztespeicherung:
>    98,  'blue', 2009-11-09
>    98,  'red', 2009-11-10
>
>
> now I have a select to get the "youngest value" for every id_bf:

Not tested:

SELECT id_bf,
       wert,
       max(letztespeicherung) over (partition by id_bf)
FROM rfmitzeit

Regards
Thomas

Re: using window-functions to get freshest value - how?

From
"Massa, Harald Armin"
Date:
    id_bf,  wert, letztespeicherung:
  98,  'blue', 2009-11-09
  98,  'red', 2009-11-10
    now I have a select to get the "youngest value" for every id_bf:

Not tested:

SELECT id_bf,      wert,
     max(letztespeicherung) over (partition by id_bf)
FROM rfmitzeit

no, that does not work:
"id_bf";"wert";"max"
98;"blue";"2009-11-10 00:00:00"
98;"red";"2009-11-10 00:00:00"

result is: I get the date of the youngest value.

My expected result is:

98;"red"

(that is, the entry of "wert" that is youngest)

thanks for trying,

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality

Re: using window-functions to get freshest value - how?

From
silly8888
Date:
SELECT * FROM (SELECT *,row_number() OVER (ORDER BY letztespeicherung
DESC) FROM rfmitzeit) t WHERE row_number=1



On Fri, Nov 20, 2009 at 5:31 AM, Massa, Harald Armin <chef@ghum.de> wrote:
>     id_bf,  wert, letztespeicherung:
>>>
>>>   98,  'blue', 2009-11-09
>>>   98,  'red', 2009-11-10
>>>     now I have a select to get the "youngest value" for every id_bf:
>>
>> Not tested:
>>
>> SELECT id_bf,      wert,
>>      max(letztespeicherung) over (partition by id_bf)
>> FROM rfmitzeit
>
> no, that does not work:
> "id_bf";"wert";"max"
> 98;"blue";"2009-11-10 00:00:00"
> 98;"red";"2009-11-10 00:00:00"
>
> result is: I get the date of the youngest value.
>
> My expected result is:
>
> 98;"red"
>
> (that is, the entry of "wert" that is youngest)
>
> thanks for trying,
>
> Harald
>
> --
> GHUM Harald Massa
> persuadere et programmare
> Harald Armin Massa
> Spielberger Straße 49
> 70435 Stuttgart
> 0173/9409607
> no fx, no carrier pigeon
> -
> %s is too gigantic of an industry to bend to the whims of reality
>

Re: using window-functions to get freshest value - how?

From
silly8888
Date:
oops, I forgot the partition by. Here's the correct query:

SELECT * FROM (SELECT *,row_number() OVER (PARTITION BY id_rf ORDER BY
letztespeicherung DESC) FROM rfmitzeit) t WHERE row_number=1

You can also do it using SELECT DISTINCT ON:

SELECT DISTINCT ON (id_rf) * FROM rfmitzeit ORDER BY id_rf,
letztespeicherung DESC;

My guess is that the latter will perform better but you should do your
own testing.


On Fri, Nov 20, 2009 at 5:36 AM, silly8888 <silly8888@gmail.com> wrote:
> SELECT * FROM (SELECT *,row_number() OVER (ORDER BY letztespeicherung
> DESC) FROM rfmitzeit) t WHERE row_number=1
>
>
>
> On Fri, Nov 20, 2009 at 5:31 AM, Massa, Harald Armin <chef@ghum.de> wrote:
>>     id_bf,  wert, letztespeicherung:
>>>>
>>>>   98,  'blue', 2009-11-09
>>>>   98,  'red', 2009-11-10
>>>>     now I have a select to get the "youngest value" for every id_bf:
>>>
>>> Not tested:
>>>
>>> SELECT id_bf,      wert,
>>>      max(letztespeicherung) over (partition by id_bf)
>>> FROM rfmitzeit
>>
>> no, that does not work:
>> "id_bf";"wert";"max"
>> 98;"blue";"2009-11-10 00:00:00"
>> 98;"red";"2009-11-10 00:00:00"
>>
>> result is: I get the date of the youngest value.
>>
>> My expected result is:
>>
>> 98;"red"
>>
>> (that is, the entry of "wert" that is youngest)
>>
>> thanks for trying,
>>
>> Harald
>>
>> --
>> GHUM Harald Massa
>> persuadere et programmare
>> Harald Armin Massa
>> Spielberger Straße 49
>> 70435 Stuttgart
>> 0173/9409607
>> no fx, no carrier pigeon
>> -
>> %s is too gigantic of an industry to bend to the whims of reality
>>
>

Re: using window-functions to get freshest value - how?

From
hubert depesz lubaczewski
Date:
On Fri, Nov 20, 2009 at 11:07:37AM +0100, Massa, Harald Armin wrote:
> Is it possible? How would the SQL utilizing WINDOW-functions look like?

there is no point in using window functions in here - simply use
"DISTINCT ON".

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: using window-functions to get freshest value - how?

From
Thomas Kellerer
Date:
Massa, Harald Armin, 20.11.2009 11:31:
> no, that does not work:
> "id_bf";"wert";"max"
> 98;"blue";"2009-11-10 00:00:00"
> 98;"red";"2009-11-10 00:00:00"
>
> result is: I get the date of the youngest value.
>
> My expected result is:
>
> 98;"red"
>
> (that is, the entry of "wert" that is youngest)
>
Sorry then I misunderstood your requirement

I don't think windowing functions are the best choice here.

I'd probably use something like:

SELECT r1.id_bf, r1.wert
FROM rfmitzeit r1
WHERE letztespeicherung =
      (SELECT max(letztespeicherung)
       FROM rfmitzeit r2
       WHERE r2.id_bf = r1.id_bf)

Not really different to your solution, but I think it's easier to read (personal taste!) but it also could be slightly
moreefficient. But most probably the optimizer is smart enough... 

Regards
Thomas

P.S.: could you please turn off the HTML format for your mails.
Especially the SQL statements are very hard to read...

Re: using window-functions to get freshest value - how?

From
"A. Kretschmer"
Date:
In response to hubert depesz lubaczewski :
> On Fri, Nov 20, 2009 at 11:07:37AM +0100, Massa, Harald Armin wrote:
> > Is it possible? How would the SQL utilizing WINDOW-functions look like?
>
> there is no point in using window functions in here - simply use
> "DISTINCT ON".

Right, but he want to extend his knowledge....

Harald, keep in mind, DISTINCT ON isn't portable.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

Re: using window-functions to get freshest value - how?

From
"Massa, Harald Armin"
Date:
> > Is it possible? How would the SQL utilizing WINDOW-functions look like?
>
> there is no point in using window functions in here - simply use
> "DISTINCT ON".

and how would I use DISTINCT ON for this query? Please bear in mind,
that there is more then one id_bf (just stopped the sample data with
one of them)

Best wishes,

Harald


--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality

Re: using window-functions to get freshest value - how?

From
Thom Brown
Date:
2009/11/20 Massa, Harald Armin <chef@ghum.de>
> > Is it possible? How would the SQL utilizing WINDOW-functions look like?
>
> there is no point in using window functions in here - simply use
> "DISTINCT ON".

and how would I use DISTINCT ON for this query? Please bear in mind,
that there is more then one id_bf (just stopped the sample data with
one of them)

Best wishes,

Harald


--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Would this work?

SELECT DISTINCT ON (id_bf) id_bf, wert, letztespeicherung
FROM rfmitzeit
ORDER BY id_bf, letztespeicherung ASC;

Regards

Thom

Re: using window-functions to get freshest value - how?

From
"A. Kretschmer"
Date:
In response to Massa, Harald Armin :
> > > Is it possible? How would the SQL utilizing WINDOW-functions look like?
> >
> > there is no point in using window functions in here - simply use
> > "DISTINCT ON".
>
> and how would I use DISTINCT ON for this query? Please bear in mind,
> that there is more then one id_bf (just stopped the sample data with
> one of them)

For my example:

test=# select * from harald order by 1,2;
 g |   datum    | value
---+------------+-------
 1 | 2009-11-01 |    10
 1 | 2009-11-02 |    11
 1 | 2009-11-03 |    12
 1 | 2009-11-05 |     1
 2 | 2009-11-04 |    20
 2 | 2009-11-10 |   100
(6 rows)


the solution with DISTINCT ON:

test=*# select distinct on (g) * from harald order by g, datum desc;
 g |   datum    | value
---+------------+-------
 1 | 2009-11-05 |     1
 2 | 2009-11-10 |   100
(2 rows)


Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

Re: using window-functions to get freshest value - how?

From
silly8888
Date:
> and how would I use DISTINCT ON for this query? Please bear in mind,
> that there is more then one id_bf (just stopped the sample data with
> one of them)

I posted the answer more than hour ago:

SELECT DISTINCT ON (id_rf) * FROM rfmitzeit ORDER BY id_rf,
letztespeicherung DESC;

The equivalent with window functions would be:

SELECT * FROM (SELECT *,row_number() OVER (PARTITION BY id_rf ORDER BY
letztespeicherung DESC) FROM rfmitzeit) t WHERE row_number=1

If you check the query plan you will understand why DISTINCT ON is the
best option. Essensially, DISTINCT ON has no additional cost other the
 cost of

         ORDER BY id_rf, letztespeicherung DESC

which is unavoidable

Re: using window-functions to get freshest value - how?

From
Thom Brown
Date:
2009/11/20 silly8888 <silly8888@gmail.com>
> and how would I use DISTINCT ON for this query? Please bear in mind,
> that there is more then one id_bf (just stopped the sample data with
> one of them)

I posted the answer more than hour ago:

SELECT DISTINCT ON (id_rf) * FROM rfmitzeit ORDER BY id_rf,
letztespeicherung DESC;


Ah yes, I mixed up the id_rf and id_bf.  Yeah, this is what I would use (although I personally avoid using *).

Thom

Re: using window-functions to get freshest value - how?

From
"Massa, Harald Armin"
Date:
Thom, depesz, silly,

> SELECT DISTINCT ON (id_bf) id_bf, wert, letztespeicherung
> FROM rfmitzeit
> ORDER BY id_bf, letztespeicherung ASC;

yes, that does work. I put it in the real world query (my example was
reduced to the relevant parts), and it provides an impressive speedup
(down from 2234 to 1600ms on the worst possible dataset with worst
possible disk/cache situation)

And I learned that

DISTINCT ON (xxx) .... order by zzzz

will first do the order by, and then definitely take every "first row"
of the ordered result.

My expection of DISTINCT on (id_bf) was "give me ONLY ONE of the rows
with every id_bf", I was not aware of the fact that it will give me
THE FIRST of the 'ordered by' result set.

Thanks to all who helped me learn today!

Harald

(and sorry for leaving HTML on on some of the mails)

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality