Re: Join query on 1M row table slow - Mailing list pgsql-general

From: CSN
Subject: Re: Join query on 1M row table slow
Date:
Msg-id: 20040210214820.16165.qmail@web40605.mail.yahoo.com
In response to: Re: Join query on 1M row table slow ("scott.marlowe" <scott.marlowe@ihs.com>)
Responses: Re: Join query on 1M row table slow
List: pgsql-general

--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> On Tue, 10 Feb 2004, CSN wrote:
>
> > I have a pretty simple select query that joins a
> table
> > (p) with 125K rows with another table (pc) with
> almost
> > one million rows:
> >
> > select p.*
> > from product_categories pc
> > inner join products p
> > on pc.product_id = p.id
> > where pc.category_id = $category_id
> > order by p.title
> > limit 25
> > offset $offset
> >
> > The query usually takes about five seconds to
> execute
> > (all other PG queries perform fast enough). I have
> > indexes on everything needed, and EXPLAIN shows
> > they're being used. Is there anything else I can
> do to
> > improve performance - such as tweaking some
> settings
> > in the config?
>
> The problem is that in order to do an offset / limit
> on such a set,
> postgresql is gonna have to generate $offset + limit
> of the joined set.
>
> So, it's quite likely that it's generating the whole
> set first.
>
> It also looks odd having a select p.* from
> product_cat pc, but not
> selecting anything from the pc table.
>
> Could this be rewritten to something like
>
> select p.* from products p where p.id in (select
> product_id from product_categories pc where
> pc.category_id = $category_id)
> order by p.title limit 25 offset $offset
>
> ?  Or is that equivalent?
>

I think that probably improves things (lower cost? - see my other post):

explain select p.* from products p
where p.id in (select product_id from product_categories pc
               where pc.category_id = 1016)
order by p.title limit 25 offset 0;

                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4282.18..4282.24 rows=25 width=290)
   ->  Sort  (cost=4282.18..4282.46 rows=111 width=290)
         Sort Key: p.title
         ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290)
               ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4)
                     ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4)
                           Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290)
                     Index Cond: (p.id = "outer".product_id)
(9 rows)
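
(The costs above are just the planner's estimates - to compare actual
runtimes, not just estimates, the same statement can be run under
EXPLAIN ANALYZE, which executes the query and reports real per-node
times:)

explain analyze
select p.* from products p
where p.id in (select product_id from product_categories pc
               where pc.category_id = 1016)
order by p.title limit 25 offset 0;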


I figured the limit/offset was probably causing the problem. What's
weird is that when the same query is executed again, it seems much
faster - some sort of caching maybe?
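
(The speedup on the second run is presumably the data pages sitting in
PG's shared buffers and the OS cache by then, so warm runs are the
fairer comparison. Also, since OFFSET still makes the server build and
throw away all the skipped rows, one possible workaround - just a
sketch, assuming the app can pass along the last title shown on the
previous page via a hypothetical $last_title_seen parameter instead of
an offset - is to page on the sort key:)

select p.*
from products p
where p.id in (select product_id from product_categories pc
               where pc.category_id = $category_id)
  and p.title > $last_title_seen  -- hypothetical: last title from the previous page
order by p.title
limit 25;

(Duplicate titles would need a tie-breaker such as p.id added to both
the order by and the comparison.)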

(table pc is just product_id <=> category_id - I don't really need the
category_id)
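
(Since pc is only used as a filter, an EXISTS form is another way to
write the same thing - a sketch, and whether the planner treats it
better or worse than IN is something only EXPLAIN against the real
data can tell:)

select p.*
from products p
where exists (select 1
              from product_categories pc
              where pc.product_id = p.id
                and pc.category_id = $category_id)
order by p.title
limit 25 offset $offset;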

CSN


