getting most recent row efficiently - Mailing list pgsql-general

From Fran Fabrizio
Subject getting most recent row efficiently
Date
Msg-id 3C1E63EA.D46B8B58@mmrd.com
Whole thread Raw
Responses Practical PostgreSQL moves East!
Re: getting most recent row efficiently
Re: getting most recent row efficiently
List pgsql-general
Hello,

I've got a table that keeps a log of a person's favorites over time
(what follows is a simplified example)

person_id    favorite_color
1                red
1                blue
2                green
3                yellow
3                purple

I want the set of most recent entries for each person.  So assuming the
order they appear above is the order they were inserted, I want:

person_id    favorite_color
1                blue
2                green
3                purple

Is there any way to retrieve "the most recent row for each distinct
person_id?" without making a timestamp column for when the row was
inserted and then grouping them by person_id?

What I include below is the real world example.  There's a table,
doctor_favorites, that functions like my sample table above.  There's a
view, current_doctor_favorites, which contains the most recent row for
each doctor.  This view has a high cost as you can see.  There's a
second view, doctor_metrics, which combines the data from the
current_doctor_favorites view with that of another table, and that is my
ultimate goal.  It's taking a loooong time to run, I think because of
the sequential scan when creating the current_doctor_favorites view from
the doctor_favorites table.  I'm wondering if there's a way to perhaps
use an OID or something to determine the most recent rows from
doctor_favorites rather than the timestamp and the subselect.  I tried
to create an index on the local_ts timestamp field of the underlying
doctor_favorites table, but it's hard to index something that's
basically unique each time you insert a row anyhow.

monitor-prod=# \d doctor_favorites
                      Table "doctor_favorites"
    Attribute    |           Type           |        Modifier
-----------------+--------------------------+------------------------
 favorites       | integer                  |
 remote_ts       | timestamp with time zone | not null
 local_ts        | timestamp with time zone | not null default now()
 med_practice_id | integer                  |
 doctor_id       | integer                  |
Indices: docid_index,
         docid_medpracid_index,
         localts_index,
         medpracid_index

monitor-test=# \d current_doctor_favorites
   View "current_doctor_favorites"
    Attribute    |  Type   | Modifier
-----------------+---------+----------
 doctor_id       | integer |
 med_practice_id | integer |
 favorites       | integer |
View definition: SELECT df.doctor_id, df.med_practice_id, df.favorites
FROM doctor_favorites df WHERE (df.local_ts = (SELECT
max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE
((doctor_favorites.doctor_id = df.doctor_id) AND
(doctor_favorites.med_practice_id = df.med_practice_id))));


monitor-test=# explain  SELECT df.doctor_id, df.med_practice_id,
df.favorites FROM doctor_favorites df WHERE (df.local_ts = (SELECT
max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE
((doctor_favorites.doctor_id = df.doctor_id) AND
(doctor_favorites.med_practice_id = df.med_practice_id))));
NOTICE:  QUERY PLAN:

Seq Scan on doctor_favorites df  (cost=0.00..9564.75 rows=47 width=12)
  SubPlan
    ->  Aggregate  (cost=2.02..2.02 rows=1 width=8)
          ->  Index Scan using docid_medpracid_index on
doctor_favorites  (cost=0.00..2.02 rows=1 width=8)

EXPLAIN
monitor-test=#

monitor-prod=# \d doctor_metrics
        View "doctor_metrics"
    Attribute    |  Type   | Modifier
-----------------+---------+----------
 doctor_id       | integer |
 med_practice_id | integer |
 lab_requests    | integer |
 lab_results     | integer |
 rx_auth         | integer |
 transcriptions  | integer |
 omnidocs        | integer |
 phone           | integer |
 favorites       | integer |
View definition: SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL) THEN
doctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS
doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN
doctor_tasks.med_practice_id ELSE
current_doctor_favorites.med_practice_id END AS med_practice_id,
doctor_tasks.lab_requests, doctor_tasks.lab_results,
doctor_tasks.rx_auth, doctor_tasks.transcriptions,
doctor_tasks.omnidocs, doctor_tasks.phone,
current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN
current_doctor_favorites USING (doctor_id, med_practice_id));

monitor-prod=# explain SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL)
THEN do
ctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS
doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN
doctor_tasks.med_practice_id ELSE
current_doctor_favorites.med_practice_id END AS med_practice_id,
doctor_tasks.lab_requests, doctor_tasks.lab_results,
doctor_tasks.rx_auth, doctor_tasks.transcriptions,
doctor_tasks.omnidocs, doctor_tasks.phone,
current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN
current_doctor_favorites USING (doctor_id, med_practice_id));
NOTICE:  QUERY PLAN:

Merge Join  (cost=9566.05..9640.75 rows=1000 width=44)
  ->  Index Scan using doctor_tasks_pkey on doctor_tasks
(cost=0.00..59.00 rows=1000 width=32)
  ->  Sort  (cost=9566.05..9566.05 rows=47 width=12)
        ->  Subquery Scan current_doctor_favorites  (cost=0.00..9564.75
rows=47 width=12)
              ->  Seq Scan on doctor_favorites df  (cost=0.00..9564.75
rows=47 width=12)
                    SubPlan
                      ->  Aggregate  (cost=2.02..2.02 rows=1 width=8)
                            ->  Index Scan using docid_medpracid_index
on doctor_favorites  (cost=0.00..2.02 rows=1 width=8)

EXPLAIN
monitor-prod=#

Thanks for your feedback,
Fran


pgsql-general by date:

Previous
From: Darren Ferguson
Date:
Subject: Re: Query Help
Next
From: "Command Prompt, Inc."
Date:
Subject: Practical PostgreSQL moves East!