Re: Picking out the most recent row using a time stamp column - Mailing list pgsql-performance

From Dave Crooke
Subject Re: Picking out the most recent row using a time stamp column
Date
Msg-id AANLkTi=N0LJp43Zvsyfj=Kd8f0wbF+4-ex8jy1zddWzP@mail.gmail.com
Whole thread Raw
In response to Re: Picking out the most recent row using a time stamp column  (Dave Johansen <davejohansen@gmail.com>)
Responses Re: Picking out the most recent row using a time stamp column
Re: Picking out the most recent row using a time stamp column
List pgsql-performance
Hi Dave

Yes, 100% the best solution .... I did the same thing a while back, I just have a separate copy of the data in a "latest" table and the Java code just runs a second SQL statement to update it when writing a new record (I've never been a trigger fan).

I found myself looking at the "find the latest" query again though in the process of building a "demo mode" into our application, which will replay a finite set of data on a rolling loop by moving it forward in time, and also has to simulate the continuous updating of the "latest" table so the the business logic will be appropriately fooled.

My next tweak will be to cache the "latest" table in the Java layer ;-)

Cheers
Dave

On Fri, Feb 25, 2011 at 10:50 AM, Dave Johansen <davejohansen@gmail.com> wrote:
On Thu, Feb 24, 2011 at 4:38 PM, Dave Crooke <dcrooke@gmail.com> wrote:
Thanks to all .... I had a tickling feeling at the back of my mind that there was a neater answer here. For the record, times (all from in-memory cached data, averaged over a bunch of runs):

Dependent subquery = 117.9 seconds
Join to temp table = 2.7 sec
DISTINCT ON = 2.7 sec

So the DISTINCT ON may not be quicker, but it sure is tidier.

Cheers
Dave

I'm using 8.3.3 and I have a similar sort of setup and just thought I'd add another point of reference, here's the timing from doing the same sort of queries on my dataset of ~700,000 records with ~10,000 unique "id_key"s.

I also added a 4th version that uses a permanent table that's auto-populated by a trigger with the rid of the most recent entry from the main table, so it's a simple join to get the latest entries.

Dependent subquery = (killed it after it ran for over 10 minutes)
Join on temp table = 1.5 seconds
DISTINCT ON = 2.9 seconds
Join on auto-populated table = 0.8 seconds

Dave

pgsql-performance by date:

Previous
From: Dave Johansen
Date:
Subject: Re: Picking out the most recent row using a time stamp column
Next
From: Bhakti Ghatkar
Date:
Subject: Vacuum problem due to temp tables