Re: Join query on 1M row table slow - Mailing list pgsql-general
From | CSN
---|---
Subject | Re: Join query on 1M row table slow
Date |
Msg-id | 20040210214820.16165.qmail@web40605.mail.yahoo.com
In response to | Re: Join query on 1M row table slow ("scott.marlowe" <scott.marlowe@ihs.com>)
Responses | Re: Join query on 1M row table slow
List | pgsql-general
--- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > On Tue, 10 Feb 2004, CSN wrote: > > > I have a pretty simple select query that joins a > table > > (p) with 125K rows with another table (pc) with > almost > > one million rows: > > > > select p.* > > from product_categories pc > > inner join products p > > on pc.product_id = p.id > > where pc.category_id = $category_id > > order by p.title > > limit 25 > > offset $offset > > > > The query usually takes about five seconds to > execute > > (all other PG queries perform fast enough). I have > > indexes on everything needed, and EXPLAIN shows > > they're being used. Is there anything else I can > do to > > improve performance - such as tweaking some > settings > > in the config? > > The problem is that in order to do an offset / limit > on such a set, > postgresql is gonna have to generate $offset + limit > of the joined set. > > So, it's quite likely that it's generating the whole > set first. > > It also looks odd having a select p.* from > product_cat pc, but not > selecting anything from the pc table. > > Could this be rewritten to something like > > select p.* from products p where p.id in (select > product_id from product_categories pc where > pc.category_id = $category_id) > order by p.title limit 25 offset $offset > > ? Or is that equivalent? > I think that probably improves things (lower cost? - see my other post): explain select p.* from products p where p.id in ( select product_id from product_categories pc where pc.category_id = 1016) order by p.title limit 25 offset 0; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Limit (cost=4282.18..4282.24 rows=25 width=290) -> Sort (cost=4282.18..4282.46 rows=111 width=290) Sort Key: p.title -> Nested Loop (cost=3609.75..4278.41 rows=111 width=290) -> HashAggregate (cost=3609.75..3609.75 rows=111 width=4) -> Index Scan using idx_pc_category_id on product_categories pc (cost=0.00..3607.28 rows=986 width=4) Index Cond: (category_id = 1016) -> Index Scan using pkey_products_id on products p (cost=0.00..6.01 rows=1 width=290) Index Cond: (p.id = "outer".product_id) (9 rows) I figured the limit/offset was probably causing the problem. What's weird is that when the same query is executed again, it seems much faster - some sort of caching maybe? (table pc is just product_id <=> category_id - I don't really need the category_id) CSN __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html