Home > mailing lists

Re: Help with rewriting query - Mailing list pgsql-performance

From	Junaili Lie
Subject	Re: Help with rewriting query
Date	June 9, 2005 22:30:40
Msg-id	8d04ce9905060918306fc4afb8@mail.gmail.com Whole thread Raw
In response to	Re: Help with rewriting query ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List	pgsql-performance

Tree view

Hi Kevin,
Thanks for the reply.
I tried that query. It definately faster, but not fast enough (took
around 50 second to complete).
I have around 2.5 million on food and 1000 on person.
Here is the query plan:
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..11662257.52 rows=1441579 width=16)
   Merge Cond: ("outer".id = "inner".p_id)
   ->  Index Scan using person_pkey on person p  (cost=0.00..25.17
rows=569 width=8)
   ->  Index Scan using p_id_food_index on food f
(cost=0.00..11644211.28 rows=1441579 width=16)
         Filter: (NOT (subplan))
         SubPlan
           ->  Index Scan using p_id_food_index on food f2
(cost=0.00..11288.47 rows=2835 width=177)
                 Index Cond: (p_id = $0)
                 Filter: (id > $1)
(9 rows)

I appreciate if you have further ideas to troubleshoot this issue.
Thank you!

On 6/8/05, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> This is a pattern which I've seen many of times.  I call it a "best
> choice" query -- you can easily match a row from one table against any
> of a number of rows in another, the trick is to pick the one that
> matters most.  I've generally found that I want the query results to
> show more than the columns used for making the choice (and there can be
> many), which rules out the min/max technique.  What works in a pretty
> straitforward way, and generally optimizes at least as well as the
> alternatives, is to join to the set of candidate rows and add a "not
> exists" test to eliminate all but the best choice.
>
> For your example, I've taken some liberties and added hypothetical
> columns from both tables to the result set, to demonstrate how that
> works.  Feel free to drop them or substitute actual columns as you see
> fit.  This will work best if there is an index for the food table on
> p_id and id.  Please let me know whether this works for you.
>
> select p.id as p_id, p.fullname, f.id, f.foodtype, f.ts
> from food f join person p
> on f.p_id = p.id
> and not exists (select * from food f2 where f2.p_id = f.p_id and f2.id >
> f.id)
> order by p_id
>
> Note that this construct works for inner or outer joins and works
> regardless of how complex the logic for picking the best choice is.  I
> think one reason this tends to optimize well is that an EXISTS test can
> finish as soon as it finds one matching row.
>
> -Kevin
>
>
> >>> Junaili Lie <junaili@gmail.com> 06/08/05 2:34 PM >>>
> Hi,
> I have the following table:
> person - primary key id, and some attributes
> food - primary key id, foreign key p_id reference to table person.
>
> table food store all the food that a person is eating. The more recent
> food is indicated by the higher food.id.
>
> I need to find what is the most recent food a person ate for every
> person.
> The query:
> select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> by f.p_id will work.
> But I understand this is not the most efficient way. Is there another
> way to rewrite this query? (maybe one that involves order by desc
> limit 1)
>
> Thank you in advance.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>      joining column's datatypes do not match
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>

pgsql-performance by date:

From: Junaili Lie
Date: 09 June 2005, 22:26:13
Subject: Re: Help with rewriting query

From: Yann Michel
Date: 10 June 2005, 01:57:09
Subject: Re: postgresql.conf runtime statistics default

Re: Help with rewriting query - Mailing list pgsql-performance

Previous

Next