Re: Join query on 1M row table slow - Mailing list pgsql-general

From CSN
Subject Re: Join query on 1M row table slow
Date
Msg-id 20040210223727.29023.qmail@web40605.mail.yahoo.com
Whole thread Raw
In response to Re: Join query on 1M row table slow  ("scott.marlowe" <scott.marlowe@ihs.com>)
Responses Re: Join query on 1M row table slow  ("scott.marlowe" <scott.marlowe@ihs.com>)
List pgsql-general
I disabled enable_hashagg and enable_nestloop. Appears
to have made both queries worse :(

QUERY 1:

=> explain analyze select p.* from product_categories
pc inner join products p on pc.product_id = p.id where
pc.category_id = 1016 order by p.title limit 25 offset
10000;

                         QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=65999.78..65999.78 rows=1 width=290)
(actual time=7736.029..7736.029 rows=0 loops=1)
   ->  Sort  (cost=65997.31..65999.78 rows=986
width=290) (actual time=7723.794..7730.352 rows=2358
loops=1)
         Sort Key: p.title
         ->  Merge Join  (cost=65306.35..65948.28
rows=986 width=290) (actual time=7028.790..7614.223
rows=2358 loops=1)
               Merge Cond: ("outer".product_id =
"inner".id)
               ->  Sort  (cost=3656.31..3658.78
rows=986 width=4) (actual time=102.115..105.357
rows=2358 loops=1)
                     Sort Key: pc.product_id
                     ->  Index Scan using
idx_pc_category_id on product_categories pc
(cost=0.00..3607.28 rows=986 width=4) (actual
time=0.349..94.173 rows=2358 loops=1)
                           Index Cond: (category_id =
1016)
               ->  Sort  (cost=61650.04..61963.62
rows=125430 width=290) (actual time=6926.394..7272.130
rows=124521 loops=1)
                     Sort Key: p.id
                     ->  Seq Scan on products p
(cost=0.00..6638.30 rows=125430 width=290) (actual
time=0.102..2855.358 rows=124753 loops=1)
 Total runtime: 8003.067 ms
(13 rows)


QUERY 2:

=> explain analyze select p.* from products p where
p.id in ( select product_id from product_categories pc
where pc.category_id = 1016) order by p.title limit 25
offset 0;

                               QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10931.85..10931.91 rows=25 width=290)
(actual time=3667.396..3667.526 rows=25 loops=1)
   ->  Sort  (cost=10931.85..10932.13 rows=111
width=290) (actual time=3667.384..3667.453 rows=25
loops=1)
         Sort Key: p.title
         ->  Hash Join  (cost=3661.52..10928.08
rows=111 width=290) (actual time=111.198..1615.324
rows=2358 loops=1)
               Hash Cond: ("outer".id =
"inner".product_id)
               ->  Seq Scan on products p
(cost=0.00..6638.30 rows=125430 width=290) (actual
time=0.113..1039.900 rows=124753 loops=1)
               ->  Hash  (cost=3661.24..3661.24
rows=111 width=4) (actual time=110.932..110.932 rows=0
loops=1)
                     ->  Unique
(cost=3656.31..3661.24 rows=111 width=4) (actual
time=97.255..106.798 rows=2358 loops=1)
                           ->  Sort
(cost=3656.31..3658.78 rows=986 width=4) (actual
time=97.247..99.998 rows=2358 loops=1)
                                 Sort Key:
pc.product_id
                                 ->  Index Scan using
idx_pc_category_id on product_categories pc
(cost=0.00..3607.28 rows=986 width=4) (actual
time=0.327..88.436 rows=2358 loops=1)
                                       Index Cond:
(category_id = 1016)
 Total runtime: 3669.479 ms
(13 rows)


> upping the analyze target on those two tables may
> help a bit too.

How exactly do I do that?

SELECT * from thanks limit 1000
;)

CSN



--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> Well, it looks like the number of rows estimate for
> the nested loop in the
> first query and the hash agg in the second are off
> by a factor 3 for the
> first query, and a factor of 20 for the second.  Try
> running number 1 with
> set enable_nestloop = off
> and see if the first one gets faster.
>
> You might also wanna try turning off hash aggregate
> on the second one and
> see how that works.
>
> upping the analyze target on those two tables may
> help a bit too.
>
> On Tue, 10 Feb 2004, CSN wrote:
>
> >
> > Doh! Yeah, now I remember ;)
> >
> > QUERY 1:
> >
> > => explain analyze select p.* from
> product_categories
> > pc inner join products p on pc.product_id = p.id
> where
> > pc.category_id = 1016 order by p.title limit 25
> offset
> > 0;
> >
>
> >                        QUERY PLAN
>
> >
> >
>

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=9595.99..9596.05 rows=25 width=290)
> > (actual time=514.808..514.942 rows=25 loops=1)
> >    ->  Sort  (cost=9595.99..9598.45 rows=986
> > width=290) (actual time=514.794..514.868 rows=25
> > loops=1)
> >          Sort Key: p.title
> >          ->  Nested Loop  (cost=0.00..9546.96
> rows=986
> > width=290) (actual time=0.672..421.732 rows=2358
> > loops=1)
> >                ->  Index Scan using
> idx_pc_category_id
> > on product_categories pc  (cost=0.00..3607.28
> rows=986
> > width=4) (actual time=0.343..125.762 rows=2358
> > loops=1)
> >                      Index Cond: (category_id =
> 1016)
> >                ->  Index Scan using
> pkey_products_id
> > on products p  (cost=0.00..6.01 rows=1 width=290)
> > (actual time=0.075..0.083 rows=1 loops=2358)
> >                      Index Cond:
> ("outer".product_id =
> > p.id)
> >  Total runtime: 516.174 ms
> > (9 rows)
> >
> >
> > QUERY 2:
> >
> > => explain analyze select p.* from products p
> where
> > p.id in ( select product_id from
> product_categories pc
> > where pc.category_id = 1016) order by p.title
> limit 25
> > offset 0;
> >
>
> >                          QUERY PLAN
>
> >
> >
>

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=4282.18..4282.24 rows=25 width=290)
> > (actual time=447.852..447.979 rows=25 loops=1)
> >    ->  Sort  (cost=4282.18..4282.46 rows=111
> > width=290) (actual time=447.836..447.904 rows=25
> > loops=1)
> >          Sort Key: p.title
> >          ->  Nested Loop  (cost=3609.75..4278.41
> > rows=111 width=290) (actual time=104.256..358.182
> > rows=2358 loops=1)
> >                ->  HashAggregate
> > (cost=3609.75..3609.75 rows=111 width=4) (actual
> > time=103.922..114.836 rows=2358 loops=1)
> >                      ->  Index Scan using
> > idx_pc_category_id on product_categories pc
> > (cost=0.00..3607.28 rows=986 width=4) (actual
> > time=0.401..92.253 rows=2358 loops=1)
> >                            Index Cond:
> (category_id =
> > 1016)
> >                ->  Index Scan using
> pkey_products_id
> > on products p  (cost=0.00..6.01 rows=1 width=290)
> > (actual time=0.069..0.076 rows=1 loops=2358)
> >                      Index Cond: (p.id =
> > "outer".product_id)
> >  Total runtime: 449.370 ms
> > (10 rows)
> >
> >
> > -CSN
> >
> >
> > --- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> > > On Tue, 10 Feb 2004, CSN wrote:
> > >
> > > > > 2. Vacuum analyze the tables concerned and
> post
> > > the
> > > > > output of EXPLAIN ANALYSE
> > > > > with your query.
> > > >
> > > > => explain analyze;
> > > >
> > > > results in:
> > > >
> > > > ERROR:  syntax error at or near ";" at
> character
> > > 16
> > >
> > > No silly.  you do:
> > >
> > > explain analyze select ... (rest of the
> query...)
> > >
> > > and it runs the query and tells you how long
> each
> > > bit took and what it
> > > THOUGHT it would get back in terms of number of
> rows
> > > and what it actually
> > > got back.
> > >
> > > Let us know...
> > >
> >
> >
> > __________________________________
> > Do you Yahoo!?
> > Yahoo! Finance: Get your refund fast by filing
> online.
> > http://taxes.yahoo.com/filing.html
> >
>


__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

pgsql-general by date:

Previous
From: "scott.marlowe"
Date:
Subject: Re: Join query on 1M row table slow
Next
From: "scott.marlowe"
Date:
Subject: Re: Join query on 1M row table slow