getting most recent row efficiently - Mailing list pgsql-general
From | Fran Fabrizio |
---|---|
Subject | getting most recent row efficiently |
Date | |
Msg-id | 3C1E63EA.D46B8B58@mmrd.com Whole thread Raw |
Responses |
Practical PostgreSQL moves East!
Re: getting most recent row efficiently Re: getting most recent row efficiently |
List | pgsql-general |
Hello, I've got a table that keeps a log of a person's favorites over time (what follows is a simplified example) person_id favorite_color 1 red 1 blue 2 green 3 yellow 3 purple I want the set of most recent entries for each person. So assuming the order they appear above is the order they were inserted, I want: person_id favorite_color 1 blue 2 green 3 purple Is there any way to retrieve "the most recent row for each distinct person_id?" without making a timestamp column for when the row was inserted and then grouping them by person_id? What I include below is the real world example. There's a table, doctor_favorites, that functions like my sample table above. There's a view, current_doctor_favorites, which contains the most recent row for each doctor. This view has a high cost as you can see. There's a second view, doctor_metrics, which combines the data from the current_doctor_favorites view with that of another table, and that is my ultimate goal. It's taking a loooong time to run, I think because of the sequential scan when creating the current_doctor_favorites view from the doctor_favorites table. I'm wondering if there's a way to perhaps use an OID or something to determine the most recent rows from doctor_favorites rather than the timestamp and the subselect. I tried to create an index on the local_ts timestamp field of the underlying doctor_favorites table, but it's hard to index something that's basically unique each time you insert a row anyhow. monitor-prod=# \d doctor_favorites Table "doctor_favorites" Attribute | Type | Modifier -----------------+--------------------------+------------------------ favorites | integer | remote_ts | timestamp with time zone | not null local_ts | timestamp with time zone | not null default now() med_practice_id | integer | doctor_id | integer | Indices: docid_index, docid_medpracid_index, localts_index, medpracid_index monitor-test=# \d current_doctor_favorites View "current_doctor_favorites" Attribute | Type | Modifier -----------------+---------+---------- doctor_id | integer | med_practice_id | integer | favorites | integer | View definition: SELECT df.doctor_id, df.med_practice_id, df.favorites FROM doctor_favorites df WHERE (df.local_ts = (SELECT max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE ((doctor_favorites.doctor_id = df.doctor_id) AND (doctor_favorites.med_practice_id = df.med_practice_id)))); monitor-test=# explain SELECT df.doctor_id, df.med_practice_id, df.favorites FROM doctor_favorites df WHERE (df.local_ts = (SELECT max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE ((doctor_favorites.doctor_id = df.doctor_id) AND (doctor_favorites.med_practice_id = df.med_practice_id)))); NOTICE: QUERY PLAN: Seq Scan on doctor_favorites df (cost=0.00..9564.75 rows=47 width=12) SubPlan -> Aggregate (cost=2.02..2.02 rows=1 width=8) -> Index Scan using docid_medpracid_index on doctor_favorites (cost=0.00..2.02 rows=1 width=8) EXPLAIN monitor-test=# monitor-prod=# \d doctor_metrics View "doctor_metrics" Attribute | Type | Modifier -----------------+---------+---------- doctor_id | integer | med_practice_id | integer | lab_requests | integer | lab_results | integer | rx_auth | integer | transcriptions | integer | omnidocs | integer | phone | integer | favorites | integer | View definition: SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL) THEN doctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN doctor_tasks.med_practice_id ELSE current_doctor_favorites.med_practice_id END AS med_practice_id, doctor_tasks.lab_requests, doctor_tasks.lab_results, doctor_tasks.rx_auth, doctor_tasks.transcriptions, doctor_tasks.omnidocs, doctor_tasks.phone, current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN current_doctor_favorites USING (doctor_id, med_practice_id)); monitor-prod=# explain SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL) THEN do ctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN doctor_tasks.med_practice_id ELSE current_doctor_favorites.med_practice_id END AS med_practice_id, doctor_tasks.lab_requests, doctor_tasks.lab_results, doctor_tasks.rx_auth, doctor_tasks.transcriptions, doctor_tasks.omnidocs, doctor_tasks.phone, current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN current_doctor_favorites USING (doctor_id, med_practice_id)); NOTICE: QUERY PLAN: Merge Join (cost=9566.05..9640.75 rows=1000 width=44) -> Index Scan using doctor_tasks_pkey on doctor_tasks (cost=0.00..59.00 rows=1000 width=32) -> Sort (cost=9566.05..9566.05 rows=47 width=12) -> Subquery Scan current_doctor_favorites (cost=0.00..9564.75 rows=47 width=12) -> Seq Scan on doctor_favorites df (cost=0.00..9564.75 rows=47 width=12) SubPlan -> Aggregate (cost=2.02..2.02 rows=1 width=8) -> Index Scan using docid_medpracid_index on doctor_favorites (cost=0.00..2.02 rows=1 width=8) EXPLAIN monitor-prod=# Thanks for your feedback, Fran
pgsql-general by date: