Re: getting most recent row efficiently - Mailing list pgsql-general
From | SHELTON,MICHAEL (Non-HP-Boise,ex1) |
---|---|
Subject | Re: getting most recent row efficiently |
Date | |
Msg-id | FB60DFB2C0E24449AC0C21F743B935410150F840@xboi02.boi.hp.com Whole thread Raw |
In response to | getting most recent row efficiently (Fran Fabrizio <ffabrizio@mmrd.com>) |
List | pgsql-general |
Here are a few suggestions off the top of my head: 1) It sounds like performance is the issue here. My first question would be why you store a "list" of favorites for each doc. If you "only" need the latest, just do an update instead of an insert (unless no doc entry...), then your table is short and to the point. If you need the other data, then obviously this isn't the solution. 2) If you need all favorites in order, what about a 2nd table (space vs performance tradeoff) that stores only the most recent favorite. Longer store time when inserting a favorite, quick lookup. 3) In the case of trying to store the order without a timestamp you can't do it simply by inserting rows (there is no guarantee for row order, it will insert where it wants -- logical, but not always in order). However, you could have a table ID (not the doc id) used only to indicate row #. It would be unique, auto-incr, indexed, etc. This way you could sort by doc, and take the top one by table ID and get your favorite. I'm sure there are other ways to do this, but thought these might help. Mike Shelton -----Original Message----- From: Fran Fabrizio [mailto:ffabrizio@mmrd.com] Sent: Monday, December 17, 2001 2:30 PM To: pgsql-general@postgresql.org Subject: [GENERAL] getting most recent row efficiently Hello, I've got a table that keeps a log of a person's favorites over time (what follows is a simplified example) person_id favorite_color 1 red 1 blue 2 green 3 yellow 3 purple I want the set of most recent entries for each person. So assuming the order they appear above is the order they were inserted, I want: person_id favorite_color 1 blue 2 green 3 purple Is there any way to retrieve "the most recent row for each distinct person_id?" without making a timestamp column for when the row was inserted and then grouping them by person_id? What I include below is the real world example. There's a table, doctor_favorites, that functions like my sample table above. There's a view, current_doctor_favorites, which contains the most recent row for each doctor. This view has a high cost as you can see. There's a second view, doctor_metrics, which combines the data from the current_doctor_favorites view with that of another table, and that is my ultimate goal. It's taking a loooong time to run, I think because of the sequential scan when creating the current_doctor_favorites view from the doctor_favorites table. I'm wondering if there's a way to perhaps use an OID or something to determine the most recent rows from doctor_favorites rather than the timestamp and the subselect. I tried to create an index on the local_ts timestamp field of the underlying doctor_favorites table, but it's hard to index something that's basically unique each time you insert a row anyhow. monitor-prod=# \d doctor_favorites Table "doctor_favorites" Attribute | Type | Modifier -----------------+--------------------------+------------------------ favorites | integer | remote_ts | timestamp with time zone | not null local_ts | timestamp with time zone | not null default now() med_practice_id | integer | doctor_id | integer | Indices: docid_index, docid_medpracid_index, localts_index, medpracid_index monitor-test=# \d current_doctor_favorites View "current_doctor_favorites" Attribute | Type | Modifier -----------------+---------+---------- doctor_id | integer | med_practice_id | integer | favorites | integer | View definition: SELECT df.doctor_id, df.med_practice_id, df.favorites FROM doctor_favorites df WHERE (df.local_ts = (SELECT max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE ((doctor_favorites.doctor_id = df.doctor_id) AND (doctor_favorites.med_practice_id = df.med_practice_id)))); monitor-test=# explain SELECT df.doctor_id, df.med_practice_id, df.favorites FROM doctor_favorites df WHERE (df.local_ts = (SELECT max(doctor_favorites.local_ts) AS max FROM doctor_favorites WHERE ((doctor_favorites.doctor_id = df.doctor_id) AND (doctor_favorites.med_practice_id = df.med_practice_id)))); NOTICE: QUERY PLAN: Seq Scan on doctor_favorites df (cost=0.00..9564.75 rows=47 width=12) SubPlan -> Aggregate (cost=2.02..2.02 rows=1 width=8) -> Index Scan using docid_medpracid_index on doctor_favorites (cost=0.00..2.02 rows=1 width=8) EXPLAIN monitor-test=# monitor-prod=# \d doctor_metrics View "doctor_metrics" Attribute | Type | Modifier -----------------+---------+---------- doctor_id | integer | med_practice_id | integer | lab_requests | integer | lab_results | integer | rx_auth | integer | transcriptions | integer | omnidocs | integer | phone | integer | favorites | integer | View definition: SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL) THEN doctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN doctor_tasks.med_practice_id ELSE current_doctor_favorites.med_practice_id END AS med_practice_id, doctor_tasks.lab_requests, doctor_tasks.lab_results, doctor_tasks.rx_auth, doctor_tasks.transcriptions, doctor_tasks.omnidocs, doctor_tasks.phone, current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN current_doctor_favorites USING (doctor_id, med_practice_id)); monitor-prod=# explain SELECT CASE WHEN (doctor_tasks.doctor_id NOTNULL) THEN do ctor_tasks.doctor_id ELSE current_doctor_favorites.doctor_id END AS doctor_id, CASE WHEN (doctor_tasks.med_practice_id NOTNULL) THEN doctor_tasks.med_practice_id ELSE current_doctor_favorites.med_practice_id END AS med_practice_id, doctor_tasks.lab_requests, doctor_tasks.lab_results, doctor_tasks.rx_auth, doctor_tasks.transcriptions, doctor_tasks.omnidocs, doctor_tasks.phone, current_doctor_favorites.favorites FROM (doctor_tasks FULL JOIN current_doctor_favorites USING (doctor_id, med_practice_id)); NOTICE: QUERY PLAN: Merge Join (cost=9566.05..9640.75 rows=1000 width=44) -> Index Scan using doctor_tasks_pkey on doctor_tasks (cost=0.00..59.00 rows=1000 width=32) -> Sort (cost=9566.05..9566.05 rows=47 width=12) -> Subquery Scan current_doctor_favorites (cost=0.00..9564.75 rows=47 width=12) -> Seq Scan on doctor_favorites df (cost=0.00..9564.75 rows=47 width=12) SubPlan -> Aggregate (cost=2.02..2.02 rows=1 width=8) -> Index Scan using docid_medpracid_index on doctor_favorites (cost=0.00..2.02 rows=1 width=8) EXPLAIN monitor-prod=# Thanks for your feedback, Fran ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
pgsql-general by date: