Re: Help with rewriting query - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: Help with rewriting query
Date
Msg-id s2a9a898.001@gwmta.wicourts.gov
Whole thread Raw
In response to Help with rewriting query  (Junaili Lie <junaili@gmail.com>)
Responses Re: Help with rewriting query  (Tobias Brox <tobias@nordicbet.com>)
List pgsql-performance
With your current (apparently well-normalized) schema, I don't see how
you can get a better query plan than that.  There may be something you
can do in terms of memory configuration to get it to execute somewhat
faster, but the only way to make it really fast is to de-normalize.
This is something which is often necessary for performance.

If you add a column to the person table for "last_food_id" and triggers
to maintain it when the food table is modified, voila!  You have a
simple and fast way to get the results you want.

-Kevin


>>> Junaili Lie <junaili@gmail.com> 06/09/05 8:30 PM >>>
Hi Kevin,
Thanks for the reply.
I tried that query. It definately faster, but not fast enough (took
around 50 second to complete).
I have around 2.5 million on food and 1000 on person.
Here is the query plan:
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..11662257.52 rows=1441579 width=16)
   Merge Cond: ("outer".id = "inner".p_id)
   ->  Index Scan using person_pkey on person p  (cost=0.00..25.17
rows=569 width=8)
   ->  Index Scan using p_id_food_index on food f
(cost=0.00..11644211.28 rows=1441579 width=16)
         Filter: (NOT (subplan))
         SubPlan
           ->  Index Scan using p_id_food_index on food f2
(cost=0.00..11288.47 rows=2835 width=177)
                 Index Cond: (p_id = $0)
                 Filter: (id > $1)
(9 rows)

I appreciate if you have further ideas to troubleshoot this issue.
Thank you!

On 6/8/05, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> This is a pattern which I've seen many of times.  I call it a "best
> choice" query -- you can easily match a row from one table against any
> of a number of rows in another, the trick is to pick the one that
> matters most.  I've generally found that I want the query results to
> show more than the columns used for making the choice (and there can
be
> many), which rules out the min/max technique.  What works in a pretty
> straitforward way, and generally optimizes at least as well as the
> alternatives, is to join to the set of candidate rows and add a "not
> exists" test to eliminate all but the best choice.
>
> For your example, I've taken some liberties and added hypothetical
> columns from both tables to the result set, to demonstrate how that
> works.  Feel free to drop them or substitute actual columns as you see
> fit.  This will work best if there is an index for the food table on
> p_id and id.  Please let me know whether this works for you.
>
> select p.id as p_id, p.fullname, f.id, f.foodtype, f.ts
> from food f join person p
> on f.p_id = p.id
> and not exists (select * from food f2 where f2.p_id = f.p_id and f2.id
>
> f.id)
> order by p_id
>
> Note that this construct works for inner or outer joins and works
> regardless of how complex the logic for picking the best choice is.  I
> think one reason this tends to optimize well is that an EXISTS test
can
> finish as soon as it finds one matching row.
>
> -Kevin
>
>
> >>> Junaili Lie <junaili@gmail.com> 06/08/05 2:34 PM >>>
> Hi,
> I have the following table:
> person - primary key id, and some attributes
> food - primary key id, foreign key p_id reference to table person.
>
> table food store all the food that a person is eating. The more recent
> food is indicated by the higher food.id.
>
> I need to find what is the most recent food a person ate for every
> person.
> The query:
> select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> by f.p_id will work.
> But I understand this is not the most efficient way. Is there another
> way to rewrite this query? (maybe one that involves order by desc
> limit 1)
>
> Thank you in advance.
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>      joining column's datatypes do not match
>
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings


pgsql-performance by date:

Previous
From: Tobias Brox
Date:
Subject: Re: faster search
Next
From: Steve Atkins
Date:
Subject: Re: faster search