Re: Help with rewriting query - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: Help with rewriting query
Date
Msg-id s2a778e0.061@gwmta.wicourts.gov
Whole thread Raw
In response to Help with rewriting query  (Junaili Lie <junaili@gmail.com>)
Responses Re: Help with rewriting query
List pgsql-performance
This is a pattern which I've seen many of times.  I call it a "best
choice" query -- you can easily match a row from one table against any
of a number of rows in another, the trick is to pick the one that
matters most.  I've generally found that I want the query results to
show more than the columns used for making the choice (and there can be
many), which rules out the min/max technique.  What works in a pretty
straitforward way, and generally optimizes at least as well as the
alternatives, is to join to the set of candidate rows and add a "not
exists" test to eliminate all but the best choice.

For your example, I've taken some liberties and added hypothetical
columns from both tables to the result set, to demonstrate how that
works.  Feel free to drop them or substitute actual columns as you see
fit.  This will work best if there is an index for the food table on
p_id and id.  Please let me know whether this works for you.

select p.id as p_id, p.fullname, f.id, f.foodtype, f.ts
from food f join person p
on f.p_id = p.id
and not exists (select * from food f2 where f2.p_id = f.p_id and f2.id >
f.id)
order by p_id

Note that this construct works for inner or outer joins and works
regardless of how complex the logic for picking the best choice is.  I
think one reason this tends to optimize well is that an EXISTS test can
finish as soon as it finds one matching row.

-Kevin


>>> Junaili Lie <junaili@gmail.com> 06/08/05 2:34 PM >>>
Hi,
I have the following table:
person - primary key id, and some attributes
food - primary key id, foreign key p_id reference to table person.

table food store all the food that a person is eating. The more recent
food is indicated by the higher food.id.

I need to find what is the most recent food a person ate for every
person.
The query:
select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
by f.p_id will work.
But I understand this is not the most efficient way. Is there another
way to rewrite this query? (maybe one that involves order by desc
limit 1)

Thank you in advance.

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
your
      joining column's datatypes do not match


pgsql-performance by date:

Previous
From: "Steve Pollard"
Date:
Subject: Re: Importing from pg_dump slow, low Disk IO
Next
From: Shanmugasundaram Doraisamy
Date:
Subject: How to find the size of a database - reg.