Thread: Select most recent record?

Select most recent record?

From

"Marc Sherman"

Date:

16 May 2001, 12:28:13

Hi, I was hoping I could get some help with a select statement.

I have a log table with three columns: id int4, timestamp datetime,
value int4.

For any given ID, there will be a large number of rows, with
different timestamps and values.

I'd like to select the newest (max(timestamp)) row for each id,
before a given cutoff date; is this possible?

The best I've been able to come up with is the rather ugly (and
very slow):

select * from log as l1 where timestamp in(select max(timestamp) from log where id=l1.id andtimestamp<'2001-01-01'
groupby id);
 

There must be a better way to do this; any tips?

Thanks,
- Marc

Re: Select most recent record?

From

Tom Lane

Date:

16 May 2001, 13:36:12

"Marc Sherman" <msherman@projectile.ca> writes:
> I'd like to select the newest (max(timestamp)) row for each id,
> before a given cutoff date; is this possible?

select * from log order by timestamp desc limit 1;
        regards, tom lane

Re: Select most recent record?

From

Andrew Perrin

Date:

16 May 2001, 13:44:23

Except that he wants max(timestamp) by id; perhaps a GROUP BY would
help, something like (untested):

select max(timestamp) from log group by id;

Tom Lane wrote:
> 
> "Marc Sherman" <msherman@projectile.ca> writes:
> > I'd like to select the newest (max(timestamp)) row for each id,
> > before a given cutoff date; is this possible?
> 
> select * from log order by timestamp desc limit 1;
> 
>                         regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
--------------------------------------------------------------
Andrew J. Perrin - Programmer/Analyst, Desktop Support
Children's Primary Care Research Group, UNC - Chapel Hill
(919)966-9394 * andrew_perrin@unc.edu

RE: Select most recent record?

From

"Michael Ansley (UK)"

Date:

16 May 2001, 13:47:01

<p><font size="2">-----BEGIN PGP SIGNED MESSAGE-----</font><br /><font size="2">Hash: SHA1</font><p><font size="2">Um,
Ithink that the intention was to group by id, and have the max</font><br /><font size="2">timestamp per id.  If that is
thecase, then the original query is</font><br /><font size="2">probably about the best.</font><p><font
size="2">Cheers...</font><br/><p><font size="2">MikeA</font><br /><br /><p><font size="2">>> -----Original
Message-----</font><br/><font size="2">>> From: Tom Lane [<a
href="mailto:tgl@sss.pgh.pa.us">mailto:tgl@sss.pgh.pa.us</a>]</font><br/><font size="2">>> Sent: 16 May 2001
15:36</font><br/><font size="2">>> To: Marc Sherman</font><br /><font size="2">>> Cc: pgsql-sql
List</font><br/><font size="2">>> Subject: Re: [SQL] Select most recent record? </font><br /><font
size="2">>></font><br /><font size="2">>> </font><br /><font size="2">>> "Marc Sherman"
<msherman@projectile.ca>writes:</font><br /><font size="2">>> > I'd like to select the newest
(max(timestamp))row for each id,</font><br /><font size="2">>> > before a given cutoff date; is this
possible?</font><br/><font size="2">>> </font><br /><font size="2">>> select * from log order by timestamp
desclimit 1;</font><br /><font size="2">>> </font><br /><font size="2">>>                      regards, tom
lane</font><br/><font size="2">>> </font><br /><font size="2">>> ---------------------------(end of
</font><br/><font size="2">>> broadcast)---------------------------</font><br /><font size="2">>> TIP 2:
youcan get off all lists at once with the unregister</font><br /><font size="2">>> command </font><br /><font
size="2">>>    (send "unregister YourEmailAddressHere" to </font><br /><font size="2">>>
majordomo@postgresql.org)</font><br/><font size="2">>> </font><p><font size="2">-----BEGIN PGP
SIGNATURE-----</font><br/><font size="2">Version: PGPfreeware 6.5.3 for non-commercial use <<a
href="http://www.pgp.com"target="_blank">http://www.pgp.com</a>></font><p><font
size="2">iQA/AwUBOwKTsHympNV/C086EQLIAgCgo6Ypci5XK4iDsNSEzxf5o2XCnLcAoLUd</font><br/><font
size="2">4qdproc00iw/9aAiVdwQBqm7</font><br/><font size="2">=cgTO</font><br /><font size="2">-----END PGP
SIGNATURE-----</font><code><fontsize="3"><br /><br />
_________________________________________________________________________<br/> This e-mail and any attachments are
confidentialand may also be privileged and/or copyright <br /> material of Intec Telecom Systems PLC (or its affiliated
companies).If you are not an <br /> intended or authorised recipient of this e-mail or have received it in error,
pleasedelete <br /> it immediately and notify the sender by e-mail. In such a case, reading, reproducing, <br />
printingor further dissemination of this e-mail is strictly prohibited and may be unlawful. <br /> Intec Telecom
SystemsPLC. does not represent or warrant that an attachment hereto is free <br /> from computer viruses or other
defects.The opinions expressed in this e-mail and any <br /> attachments may be those of the author and are not
necessarilythose of Intec Telecom <br /> Systems PLC. <br /><br /> This footnote also confirms that this email message
hasbeen swept by<br /> MIMEsweeper for the presence of computer viruses. <br />
__________________________________________________________________________<br/></font></code>

RE: Select most recent record?

From

"Marc Sherman"

Date:

16 May 2001, 14:12:56

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> "Marc Sherman" <msherman@projectile.ca> writes:
> > I'd like to select the newest (max(timestamp)) row for each id,
> > before a given cutoff date; is this possible?
> 
> select * from log order by timestamp desc limit 1;

Heh.  I obviously simplified my situation too much.

This is closer to what I've really got:

create table user (userid int4 primary key, groupid int4);
create table log (userid int4, timestamp datetime, value int4);

I need to select sum(value) for each group, where the values chosen
are the newest log entry for each group member that is before a cutoff 
date (April 1, 2001).

Here's what I'm currently using:

select user.groupid, sum(l1.value)from log as l1, userwhere user.userid=log.useridand log.timestamp in (    select
max(timestamp)from log    where log.timestamp<'2001-04-01'    and log.userid=l1.userid)group by user.groupid;

When I first posted, this was _very_ slow.  I've since improved
it by adding an index on log(userid,timestamp) - now it's just
slow.  If anyone knows how to make it faster, I'd appreciate it.

- Marc

Re: Select most recent record?

From

Tom Lane

Date:

16 May 2001, 14:19:16

Andrew Perrin <andrew_perrin@unc.edu> writes:
> Except that he wants max(timestamp) by id; perhaps a GROUP BY would
> help, something like (untested):

Oh, I'm sorry, I missed the "for each id" bit.

> select max(timestamp) from log group by id;

That gives you the max timestamp for each id, all right, but not the
rest of the row in which the max timestamp occurs.  I know of no good
way to get that in standard SQL.  If you don't mind being nonstandard,
this is exactly the problem SELECT DISTINCT ON is meant for:

select distinct on (id) * from log order by id, timestamp desc;

See the SELECT reference page for more about this.
        regards, tom lane

Re: Select most recent record?

From

George Moga

Date:

16 May 2001, 19:39:35

Marc Sherman wrote:

> ......

>
> Heh.  I obviously simplified my situation too much.
>
> This is closer to what I've really got:
>
> create table user (userid int4 primary key, groupid int4);
> create table log (userid int4, timestamp datetime, value int4);
>
> I need to select sum(value) for each group, where the values chosen
> are the newest log entry for each group member that is before a cutoff
> date (April 1, 2001).
>
> Here's what I'm currently using:
>
> select user.groupid, sum(l1.value)
>         from log as l1, user
>         where user.userid=log.userid
>         and log.timestamp in (
>                 select max(timestamp) from log
>                 where log.timestamp<'2001-04-01'
>                 and log.userid=l1.userid)
>         group by user.groupid;
>
> When I first posted, this was _very_ slow.  I've since improved
> it by adding an index on log(userid,timestamp) - now it's just
> slow.  If anyone knows how to make it faster, I'd appreciate it.
>

> - Marc

Try something like this:

SELECT   user.groupid, sum(l1.value)
FROM   log as l1, user
WHERE   user.userid = l1.userid and   l1.timestamp = (       SELECT           max(timestamp) from log       WHERE
   log.timestamp < '2001-04-01' and           log.userid = l1.userid   )
 
GROUP by user.groupid;

1.  you use in the same query both "log" and "l1" for the same table: "log as
l1";
2.  you use log.timestamp in () ... but in this case you have ony one value ...
use "=" instead "in".

==================================================
George Moga,   Data Systems Srl   Slobozia, ROMANIA


P.S.  Sorry for my english ...

RE: Select most recent record?

From

"Mark Hamby"

Date:

21 June 2001, 17:51:51

Marc,
Did you ever get your problem solved to your satisfaction?
We have a very simular problem with a historical database
containing typically 5K id and updates to data every few
seconds.  We tried unsuccessfully to optimize queries
such as those already suggested to you.  We found the best
means to quickly query the data valid at any given time was
to:
1. Have a TIME and ENDTIME column in your table.   The ENDTIME column held when the data became invalid.   The initial
valuefor the ENDTIME column was 'infinity'.
 
2. Have an INSERT rule that set the ENDTIME of all previous   records with same ID to TIME.  Here is the rule:
    CREATE RULE d_people_ON_INSERT AS        ON INSERT        TO d_people        DO UPDATE d_people            SET
endtime= new.time            WHERE    ( id = new.id )                AND ( endtime = 'infinity' )        ;
 
3. Selects for any given time are then like the ones below   and very fast.
    /* For time '2000-11-20 15:56' */    SELECT * FROM d_people        WHERE ( time <= '2000-11-20 15:56' )
AND( endtime > '2000-11-20 15:56' );
 
    /* For latest time */    SELECT * FROM d_people        WHERE ( time <= now())            AND ( endtime > now());


Granted, INSERTs take a little longer since they trigger an UPDATE.
But optimized indices help greatly with this.

I highly recommend the following book on the problems and
solutions of temporal data in databases written by the man
who is defining the temporal functionalities of SQL3.
Richard Snodgrass, "Developing Time-Oriented Database Applications in SQL"
http://www.amazon.com/exec/obidos/ASIN/1558604367/qid=993149249/sr=1-4/ref=s
c_b_4/103-3746626-6461410

I hope this helps.  It may be overkill, depending on the type and
quantity of your data.

Thanks,
Mark Hamby


> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Marc Sherman
> Sent: Wednesday, May 16, 2001 6:28 AM
> To: pgsql-sql List
> Subject: [SQL] Select most recent record?
>
>
> Hi, I was hoping I could get some help with a select statement.
>
> I have a log table with three columns: id int4, timestamp datetime,
> value int4.
>
> For any given ID, there will be a large number of rows, with
> different timestamps and values.
>
> I'd like to select the newest (max(timestamp)) row for each id,
> before a given cutoff date; is this possible?
>
> The best I've been able to come up with is the rather ugly (and
> very slow):
>
> select * from log as l1 where timestamp in
>     (select max(timestamp) from log where id=l1.id and
>     timestamp<'2001-01-01' group by id);
>
> There must be a better way to do this; any tips?
>
> Thanks,
> - Marc
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

RE: Select most recent record?

From

"Marc Sherman"

Date:

21 June 2001, 18:31:44

From: Mark Hamby [mailto:mark@archer.leavenworth.army.mil]
> 
> Marc,
> Did you ever get your problem solved to your satisfaction?
> We have a very simular problem with a historical database
> containing typically 5K id and updates to data every few
> seconds.  We tried unsuccessfully to optimize queries
> such as those already suggested to you.  We found the best
> means to quickly query the data valid at any given time was
> to:
[snip]

By adding an index, I was able to get the performance of the
query within a reasonable threshold; it's still slow, but
acceptable.  I'll bear your suggestion in mind for next time
I run into this kind of problem, though, if my sol'n can't
handle the data.  Thanks,

- Marc