Re: Help with rewriting query - Mailing list pgsql-performance
From | Junaili Lie |
---|---|
Subject | Re: Help with rewriting query |
Date | |
Msg-id | 8d04ce9905060918306fc4afb8@mail.gmail.com Whole thread Raw |
In response to | Re: Help with rewriting query ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
List | pgsql-performance |
Hi Kevin, Thanks for the reply. I tried that query. It definately faster, but not fast enough (took around 50 second to complete). I have around 2.5 million on food and 1000 on person. Here is the query plan: QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Merge Join (cost=0.00..11662257.52 rows=1441579 width=16) Merge Cond: ("outer".id = "inner".p_id) -> Index Scan using person_pkey on person p (cost=0.00..25.17 rows=569 width=8) -> Index Scan using p_id_food_index on food f (cost=0.00..11644211.28 rows=1441579 width=16) Filter: (NOT (subplan)) SubPlan -> Index Scan using p_id_food_index on food f2 (cost=0.00..11288.47 rows=2835 width=177) Index Cond: (p_id = $0) Filter: (id > $1) (9 rows) I appreciate if you have further ideas to troubleshoot this issue. Thank you! On 6/8/05, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > This is a pattern which I've seen many of times. I call it a "best > choice" query -- you can easily match a row from one table against any > of a number of rows in another, the trick is to pick the one that > matters most. I've generally found that I want the query results to > show more than the columns used for making the choice (and there can be > many), which rules out the min/max technique. What works in a pretty > straitforward way, and generally optimizes at least as well as the > alternatives, is to join to the set of candidate rows and add a "not > exists" test to eliminate all but the best choice. > > For your example, I've taken some liberties and added hypothetical > columns from both tables to the result set, to demonstrate how that > works. Feel free to drop them or substitute actual columns as you see > fit. This will work best if there is an index for the food table on > p_id and id. Please let me know whether this works for you. > > select p.id as p_id, p.fullname, f.id, f.foodtype, f.ts > from food f join person p > on f.p_id = p.id > and not exists (select * from food f2 where f2.p_id = f.p_id and f2.id > > f.id) > order by p_id > > Note that this construct works for inner or outer joins and works > regardless of how complex the logic for picking the best choice is. I > think one reason this tends to optimize well is that an EXISTS test can > finish as soon as it finds one matching row. > > -Kevin > > > >>> Junaili Lie <junaili@gmail.com> 06/08/05 2:34 PM >>> > Hi, > I have the following table: > person - primary key id, and some attributes > food - primary key id, foreign key p_id reference to table person. > > table food store all the food that a person is eating. The more recent > food is indicated by the higher food.id. > > I need to find what is the most recent food a person ate for every > person. > The query: > select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group > by f.p_id will work. > But I understand this is not the most efficient way. Is there another > way to rewrite this query? (maybe one that involves order by desc > limit 1) > > Thank you in advance. > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if > your > joining column's datatypes do not match > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
pgsql-performance by date: