Assistance with optimizing query - same SQL, different category_id = Seq Scan - Mailing list pgsql-performance

From Brendan Duddridge
Subject Assistance with optimizing query - same SQL, different category_id = Seq Scan
Date
Msg-id A8C726F2-AE42-4812-9B71-B6CFE68F18B3@clickspace.com
Whole thread Raw
Responses Re: Assistance with optimizing query - same SQL, different category_id = Seq Scan
List pgsql-performance
Hi,

I have a query that generates two different plans when there's only a
change in the category_id used in the query.

The first query has category_id = 1001573 and return 3117 rows from
the category_product table.
The second query has category_id = 1001397 and returns 27889 rows
from the category_product table.

The first query does all access via indexes.
The second query does all access via indexes except for a sequential
scan on the Price table.

Here is the explain analyze for the first query:

explain analyze
select distinct pr.amount
from merchant_product mp,
category_product cp,
price pr
where cp.category_id = 1001573 and
    cp.product_id = mp.product_id and
    cp.product_status_code = 'complete' and
    cp.product_is_active = 'true' and
    mp.is_active = 'true' and
    mp.merchant_product_id = pr.merchant_product_id
order by amount asc;



                    QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------------------------
Unique  (cost=24311.37..24325.11 rows=2748 width=11) (actual
time=277.953..280.844 rows=622 loops=1)
    ->  Sort  (cost=24311.37..24318.24 rows=2748 width=11) (actual
time=277.952..278.490 rows=4007 loops=1)
          Sort Key: pr.amount
          ->  Nested Loop  (cost=0.00..24154.40 rows=2748 width=11)
(actual time=0.295..262.225 rows=4007 loops=1)
                ->  Nested Loop  (cost=0.00..14658.32 rows=2750
width=4) (actual time=0.229..84.908 rows=4007 loops=1)
                      ->  Index Scan using
x_category_product__category_id_fk_idx on category_product cp
(cost=0.00..3054.20 rows=2369 width=4) (actual time=0.136..20.746
rows=2832 loops=1)
                            Index Cond: (category_id = 1001573)
                            Filter: (((product_status_code)::text =
'complete'::text) AND ((product_is_active)::text = 'true'::text))
                      ->  Index Scan using
merchant_product__product_id_fk_idx on merchant_product mp
(cost=0.00..4.89 rows=1 width=8) (actual time=0.019..0.021 rows=1
loops=2832)
                            Index Cond: ("outer".product_id =
mp.product_id)
                            Filter: ((is_active)::text = 'true'::text)
                ->  Index Scan using
price__merchant_product_id_fk_idx on price pr  (cost=0.00..3.44
rows=1 width=15) (actual time=0.042..0.043 rows=1 loops=4007)
                      Index Cond: ("outer".merchant_product_id =
pr.merchant_product_id)
Total runtime: 281.709 ms


Here is the explain analyze for the second (slow) query:

explain analyze
select distinct pr.amount
from merchant_product mp,
category_product cp,
price pr
where cp.category_id = 1001397 and
    cp.product_id = mp.product_id and
    cp.product_status_code = 'complete' and
    cp.product_is_active = 'true' and
    mp.is_active = 'true' and
    mp.merchant_product_id = pr.merchant_product_id
order by amount asc;


                            QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
--------------------------------------------------------------
Unique  (cost=106334.48..106452.38 rows=6050 width=11) (actual
time=7140.302..7162.345 rows=2567 loops=1)
    ->  Sort  (cost=106334.48..106393.43 rows=23580 width=11) (actual
time=7140.300..7143.873 rows=26949 loops=1)
          Sort Key: pr.amount
          ->  Hash Join  (cost=77475.88..104621.95 rows=23580
width=11) (actual time=4213.546..7015.639 rows=26949 loops=1)
                Hash Cond: ("outer".merchant_product_id =
"inner".merchant_product_id)
                ->  Seq Scan on price pr  (cost=0.00..20782.51
rows=1225551 width=15) (actual time=0.059..1482.238 rows=1225551
loops=1)
                ->  Hash  (cost=77416.91..77416.91 rows=23590
width=4) (actual time=4212.042..4212.042 rows=26949 loops=1)
                      ->  Merge Join  (cost=22632.74..77416.91
rows=23590 width=4) (actual time=1851.012..4186.067 rows=26949 loops=1)
                            Merge Cond: ("outer".product_id =
"inner".product_id)
                            ->  Index Scan using
merchant_product__product_id_fk_idx on merchant_product mp
(cost=0.00..51365.12 rows=1226085 width=8) (actual
time=0.073..3141.654 rows=1208509 loops=1)
                                  Filter: ((is_active)::text =
'true'::text)
                            ->  Sort  (cost=22632.74..22683.55
rows=20325 width=4) (actual time=507.110..511.076 rows=26949 loops=1)
                                  Sort Key: cp.product_id
                                  ->  Index Scan using
x_category_product__category_id_fk_idx on category_product cp
(cost=0.00..21178.38 rows=20325 width=4) (actual time=0.145..440.113
rows=26949 loops=1)
                                        Index Cond: (category_id =
1001397)
                                        Filter:
(((product_status_code)::text = 'complete'::text) AND
((product_is_active)::text = 'true'::text))
Total runtime: 7172.359 ms


Notice the sequential scan of the Price table? It scanned 1,225,551
rows in the second query.


Do you have any suggestions on how I can optimize the query so both
versions of the query come back fast without doing a sequential scan
on the price table?


Thanks,


____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com



pgsql-performance by date:

Previous
From: Klint Gore
Date:
Subject: Re: performance question (something to do w/ parameterized
Next
From: kah_hang_ang@toray.com.my
Date:
Subject: Re: extremely slow when execute select/delete for certain tables