Thread: Select most recent record?
Hi, I was hoping I could get some help with a select statement. I have a log table with three columns: id int4, timestamp datetime, value int4. For any given ID, there will be a large number of rows, with different timestamps and values. I'd like to select the newest (max(timestamp)) row for each id, before a given cutoff date; is this possible? The best I've been able to come up with is the rather ugly (and very slow): select * from log as l1 where timestamp in(select max(timestamp) from log where id=l1.id andtimestamp<'2001-01-01' groupby id); There must be a better way to do this; any tips? Thanks, - Marc
"Marc Sherman" <msherman@projectile.ca> writes: > I'd like to select the newest (max(timestamp)) row for each id, > before a given cutoff date; is this possible? select * from log order by timestamp desc limit 1; regards, tom lane
Except that he wants max(timestamp) by id; perhaps a GROUP BY would help, something like (untested): select max(timestamp) from log group by id; Tom Lane wrote: > > "Marc Sherman" <msherman@projectile.ca> writes: > > I'd like to select the newest (max(timestamp)) row for each id, > > before a given cutoff date; is this possible? > > select * from log order by timestamp desc limit 1; > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) -- -------------------------------------------------------------- Andrew J. Perrin - Programmer/Analyst, Desktop Support Children's Primary Care Research Group, UNC - Chapel Hill (919)966-9394 * andrew_perrin@unc.edu
<p><font size="2">-----BEGIN PGP SIGNED MESSAGE-----</font><br /><font size="2">Hash: SHA1</font><p><font size="2">Um, Ithink that the intention was to group by id, and have the max</font><br /><font size="2">timestamp per id. If that is thecase, then the original query is</font><br /><font size="2">probably about the best.</font><p><font size="2">Cheers...</font><br/><p><font size="2">MikeA</font><br /><br /><p><font size="2">>> -----Original Message-----</font><br/><font size="2">>> From: Tom Lane [<a href="mailto:tgl@sss.pgh.pa.us">mailto:tgl@sss.pgh.pa.us</a>]</font><br/><font size="2">>> Sent: 16 May 2001 15:36</font><br/><font size="2">>> To: Marc Sherman</font><br /><font size="2">>> Cc: pgsql-sql List</font><br/><font size="2">>> Subject: Re: [SQL] Select most recent record? </font><br /><font size="2">>></font><br /><font size="2">>> </font><br /><font size="2">>> "Marc Sherman" <msherman@projectile.ca>writes:</font><br /><font size="2">>> > I'd like to select the newest (max(timestamp))row for each id,</font><br /><font size="2">>> > before a given cutoff date; is this possible?</font><br/><font size="2">>> </font><br /><font size="2">>> select * from log order by timestamp desclimit 1;</font><br /><font size="2">>> </font><br /><font size="2">>> regards, tom lane</font><br/><font size="2">>> </font><br /><font size="2">>> ---------------------------(end of </font><br/><font size="2">>> broadcast)---------------------------</font><br /><font size="2">>> TIP 2: youcan get off all lists at once with the unregister</font><br /><font size="2">>> command </font><br /><font size="2">>> (send "unregister YourEmailAddressHere" to </font><br /><font size="2">>> majordomo@postgresql.org)</font><br/><font size="2">>> </font><p><font size="2">-----BEGIN PGP SIGNATURE-----</font><br/><font size="2">Version: PGPfreeware 6.5.3 for non-commercial use <<a href="http://www.pgp.com"target="_blank">http://www.pgp.com</a>></font><p><font size="2">iQA/AwUBOwKTsHympNV/C086EQLIAgCgo6Ypci5XK4iDsNSEzxf5o2XCnLcAoLUd</font><br/><font size="2">4qdproc00iw/9aAiVdwQBqm7</font><br/><font size="2">=cgTO</font><br /><font size="2">-----END PGP SIGNATURE-----</font><code><fontsize="3"><br /><br /> _________________________________________________________________________<br/> This e-mail and any attachments are confidentialand may also be privileged and/or copyright <br /> material of Intec Telecom Systems PLC (or its affiliated companies).If you are not an <br /> intended or authorised recipient of this e-mail or have received it in error, pleasedelete <br /> it immediately and notify the sender by e-mail. In such a case, reading, reproducing, <br /> printingor further dissemination of this e-mail is strictly prohibited and may be unlawful. <br /> Intec Telecom SystemsPLC. does not represent or warrant that an attachment hereto is free <br /> from computer viruses or other defects.The opinions expressed in this e-mail and any <br /> attachments may be those of the author and are not necessarilythose of Intec Telecom <br /> Systems PLC. <br /><br /> This footnote also confirms that this email message hasbeen swept by<br /> MIMEsweeper for the presence of computer viruses. <br /> __________________________________________________________________________<br/></font></code>
From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > "Marc Sherman" <msherman@projectile.ca> writes: > > I'd like to select the newest (max(timestamp)) row for each id, > > before a given cutoff date; is this possible? > > select * from log order by timestamp desc limit 1; Heh. I obviously simplified my situation too much. This is closer to what I've really got: create table user (userid int4 primary key, groupid int4); create table log (userid int4, timestamp datetime, value int4); I need to select sum(value) for each group, where the values chosen are the newest log entry for each group member that is before a cutoff date (April 1, 2001). Here's what I'm currently using: select user.groupid, sum(l1.value)from log as l1, userwhere user.userid=log.useridand log.timestamp in ( select max(timestamp)from log where log.timestamp<'2001-04-01' and log.userid=l1.userid)group by user.groupid; When I first posted, this was _very_ slow. I've since improved it by adding an index on log(userid,timestamp) - now it's just slow. If anyone knows how to make it faster, I'd appreciate it. - Marc
Andrew Perrin <andrew_perrin@unc.edu> writes: > Except that he wants max(timestamp) by id; perhaps a GROUP BY would > help, something like (untested): Oh, I'm sorry, I missed the "for each id" bit. > select max(timestamp) from log group by id; That gives you the max timestamp for each id, all right, but not the rest of the row in which the max timestamp occurs. I know of no good way to get that in standard SQL. If you don't mind being nonstandard, this is exactly the problem SELECT DISTINCT ON is meant for: select distinct on (id) * from log order by id, timestamp desc; See the SELECT reference page for more about this. regards, tom lane
Marc Sherman wrote: > ...... > > Heh. I obviously simplified my situation too much. > > This is closer to what I've really got: > > create table user (userid int4 primary key, groupid int4); > create table log (userid int4, timestamp datetime, value int4); > > I need to select sum(value) for each group, where the values chosen > are the newest log entry for each group member that is before a cutoff > date (April 1, 2001). > > Here's what I'm currently using: > > select user.groupid, sum(l1.value) > from log as l1, user > where user.userid=log.userid > and log.timestamp in ( > select max(timestamp) from log > where log.timestamp<'2001-04-01' > and log.userid=l1.userid) > group by user.groupid; > > When I first posted, this was _very_ slow. I've since improved > it by adding an index on log(userid,timestamp) - now it's just > slow. If anyone knows how to make it faster, I'd appreciate it. > > - Marc Try something like this: SELECT user.groupid, sum(l1.value) FROM log as l1, user WHERE user.userid = l1.userid and l1.timestamp = ( SELECT max(timestamp) from log WHERE log.timestamp < '2001-04-01' and log.userid = l1.userid ) GROUP by user.groupid; 1. you use in the same query both "log" and "l1" for the same table: "log as l1"; 2. you use log.timestamp in () ... but in this case you have ony one value ... use "=" instead "in". ================================================== George Moga, Data Systems Srl Slobozia, ROMANIA P.S. Sorry for my english ...
Marc, Did you ever get your problem solved to your satisfaction? We have a very simular problem with a historical database containing typically 5K id and updates to data every few seconds. We tried unsuccessfully to optimize queries such as those already suggested to you. We found the best means to quickly query the data valid at any given time was to: 1. Have a TIME and ENDTIME column in your table. The ENDTIME column held when the data became invalid. The initial valuefor the ENDTIME column was 'infinity'. 2. Have an INSERT rule that set the ENDTIME of all previous records with same ID to TIME. Here is the rule: CREATE RULE d_people_ON_INSERT AS ON INSERT TO d_people DO UPDATE d_people SET endtime= new.time WHERE ( id = new.id ) AND ( endtime = 'infinity' ) ; 3. Selects for any given time are then like the ones below and very fast. /* For time '2000-11-20 15:56' */ SELECT * FROM d_people WHERE ( time <= '2000-11-20 15:56' ) AND( endtime > '2000-11-20 15:56' ); /* For latest time */ SELECT * FROM d_people WHERE ( time <= now()) AND ( endtime > now()); Granted, INSERTs take a little longer since they trigger an UPDATE. But optimized indices help greatly with this. I highly recommend the following book on the problems and solutions of temporal data in databases written by the man who is defining the temporal functionalities of SQL3. Richard Snodgrass, "Developing Time-Oriented Database Applications in SQL" http://www.amazon.com/exec/obidos/ASIN/1558604367/qid=993149249/sr=1-4/ref=s c_b_4/103-3746626-6461410 I hope this helps. It may be overkill, depending on the type and quantity of your data. Thanks, Mark Hamby > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Marc Sherman > Sent: Wednesday, May 16, 2001 6:28 AM > To: pgsql-sql List > Subject: [SQL] Select most recent record? > > > Hi, I was hoping I could get some help with a select statement. > > I have a log table with three columns: id int4, timestamp datetime, > value int4. > > For any given ID, there will be a large number of rows, with > different timestamps and values. > > I'd like to select the newest (max(timestamp)) row for each id, > before a given cutoff date; is this possible? > > The best I've been able to come up with is the rather ugly (and > very slow): > > select * from log as l1 where timestamp in > (select max(timestamp) from log where id=l1.id and > timestamp<'2001-01-01' group by id); > > There must be a better way to do this; any tips? > > Thanks, > - Marc > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
From: Mark Hamby [mailto:mark@archer.leavenworth.army.mil] > > Marc, > Did you ever get your problem solved to your satisfaction? > We have a very simular problem with a historical database > containing typically 5K id and updates to data every few > seconds. We tried unsuccessfully to optimize queries > such as those already suggested to you. We found the best > means to quickly query the data valid at any given time was > to: [snip] By adding an index, I was able to get the performance of the query within a reasonable threshold; it's still slow, but acceptable. I'll bear your suggestion in mind for next time I run into this kind of problem, though, if my sol'n can't handle the data. Thanks, - Marc