Home > mailing lists

Picking out the most recent row using a time stamp column - Mailing list pgsql-performance

From	Dave Crooke
Subject	Picking out the most recent row using a time stamp column
Date	February 24, 2011 15:55:42
Msg-id	AANLkTinTgU-jfPOhrfteBvhAhgG_8zknpk13N9L4QMap@mail.gmail.com Whole thread Raw
Responses	Re: Picking out the most recent row using a time stamp column Re: Picking out the most recent row using a time stamp column Re: Picking out the most recent row using a time stamp column
List	pgsql-performance

Tree view

Hi foks

This is an old chestnut which I've found a number of online threads for, and never seen a clever answer to. It seems a common enough idiom that there might be some slicker way to do it, so I thought I might inquire with this august group if such a clever answer exists ....

Consider the following table

create table data
   (id_key int,
    time_stamp timestamp without time zone,
    value double precision);

create unique index data_idx on data (id_key, time_stamp);

with around 1m rows, with 3500 or so distinct values of id_key.

I need to find the most recent value for each distinct value of id_key. There is no elegant (that I know of) syntax for this, and there are two ways I've typically seen it done:

1. Use a dependent subquery to find the most recent time stamp, i.e.

select
   a.id_key, a.time_stamp, a.value
from
   data a
where
a.time_stamp=
     (select max(time_stamp)
      from data b
      where a.id_key=b.id_key)

2. Define a temporary table / view with the most recent time stamp for each key, and join against it:

select
   a.id_key, a.time_stamp, a.value
from
   data a,
   (select id_key, max(time_stamp) as mts
    from data group by id_key) b
where
   a.id_key=b.id_key and a.time_stamp=b.mts

I've found that for my data set, PG 8.4.2 selects the "obvious" / "do it as written" plan in each case, and that method 2. is much quicker (2.6 sec vs. 2 min on my laptop) ....

Is there a more elegant way to write this, perhaps using PG-specific extensions?

Cheers
Dave

pgsql-performance by date:

From: Shaun Thomas
Date: 24 February 2011, 15:36:42
Subject: Re: Unused indices

From: Dave Johansen
Date: 24 February 2011, 15:56:54
Subject: Re: Pushing IN (subquery) down through UNION ALL?

Picking out the most recent row using a time stamp column - Mailing list pgsql-performance

Previous

Next