Re: Query using SeqScan instead of IndexScan - Mailing list pgsql-performance
From | Brendan Duddridge |
---|---|
Subject | Re: Query using SeqScan instead of IndexScan |
Date | |
Msg-id | EBEB7ACF-84A8-4FCF-A481-8C087FF372D1@clickspace.com Whole thread Raw |
In response to | Re: Query using SeqScan instead of IndexScan ("Jim C. Nasby" <jnasby@pervasive.com>) |
Responses |
Re: Query using SeqScan instead of IndexScan
|
List | pgsql-performance |
Hi Jim, I'm not quite sure what you mean by the correlation of category_id? The category_id is part of a compound primary key in the category_product table. The primary key on category_product is (category_id, product_id). Here's the definitions of the two tables involved in the join: Table "public.category_product" Column | Type | Modifiers ---------------------+----------------------+----------- category_id | integer | not null product_id | integer | not null en_name_sort_order | integer | fr_name_sort_order | integer | merchant_sort_order | integer | price_sort_order | integer | merchant_count | integer | is_active | character varying(5) | Indexes: "x_category_product_pk" PRIMARY KEY, btree (category_id, product_id) "category_product__is_active_idx" btree (is_active) "category_product__merchant_sort_order_idx" btree (merchant_sort_order) "x_category_product__category_id_fk_idx" btree (category_id) CLUSTER "x_category_product__product_id_fk_idx" btree (product_id) Foreign-key constraints: "x_category_product_category_fk" FOREIGN KEY (category_id) REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED "x_category_product_product_fk" FOREIGN KEY (product_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED Table "public.product_attribute_value" Column | Type | Modifiers ----------------------------+-----------------------+----------- attribute_id | integer | not null attribute_unit_id | integer | attribute_value_id | integer | boolean_value | character varying(5) | decimal_value | numeric(30,10) | product_attribute_value_id | integer | not null product_id | integer | not null product_reference_id | integer | status_code | character varying(32) | Indexes: "product_attribute_value_pk" PRIMARY KEY, btree (product_attribute_value_id) "product_attribute_value__attribute_id_fk_idx" btree (attribute_id) "product_attribute_value__attribute_unit_id_fk_idx" btree (attribute_unit_id) "product_attribute_value__attribute_value_id_fk_idx" btree (attribute_value_id) "product_attribute_value__product_id_fk_idx" btree (product_id) "product_attribute_value__product_reference_id_fk_idx" btree (product_reference_id) Foreign-key constraints: "product_attribute_value_attribute_fk" FOREIGN KEY (attribute_id) REFERENCES attribute(attribute_id) DEFERRABLE INITIALLY DEFERRED "product_attribute_value_attributeunit_fk" FOREIGN KEY (attribute_unit_id) REFERENCES attribute_unit(attribute_unit_id) DEFERRABLE INITIALLY DEFERRED "product_attribute_value_attributevalue_fk" FOREIGN KEY (attribute_value_id) REFERENCES attribute_value(attribute_value_id) DEFERRABLE INITIALLY DEFERRED "product_attribute_value_product_fk" FOREIGN KEY (product_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED "product_attribute_value_productreference_fk" FOREIGN KEY (product_reference_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED Not sure if that helps answer your question, but the query is pretty slow. Sometimes it takes 5 - 15 seconds depending on the category_id specified. Thanks, ____________________________________________________________________ Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote: > What's the correlation of category_id? The current index scan cost > estimator places a heavy penalty on anything with a correlation much > below about 90%. > > On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote: >> Hi, >> >> I have a query that is using a sequential scan instead of an index >> scan. I've turned off sequential scans and it is in fact faster with >> the index scan. >> >> Here's my before and after. >> >> Before: >> >> ssdev=# SET enable_seqscan TO DEFAULT; >> ssdev=# explain analyze select cp.product_id >> from category_product cp, product_attribute_value pav >> where cp.category_id = 1001082 and cp.product_id = >> pav.product_id; >> >> >> QUERY PLAN >> --------------------------------------------------------------------- >> --- >> --------------------------------------------------------------------- >> --- >> ------------------------------ >> Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual >> time=4.521..2580.520 rows=19695 loops=1) >> Hash Cond: ("outer".product_id = "inner".product_id) >> -> Seq Scan on product_attribute_value pav (cost=0.00..40127.12 >> rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 >> loops=1) >> -> Hash (cost=23.10..23.10 rows=970 width=4) (actual >> time=2.267..2.267 rows=1140 loops=1) >> -> Index Scan using x_category_product__category_id_fk_idx >> on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual >> time=0.122..1.395 rows=1140 loops=1) >> Index Cond: (category_id = 1001082) >> Total runtime: 2584.221 ms >> (7 rows) >> >> >> After: >> >> ssdev=# SET enable_seqscan TO false; >> ssdev=# explain analyze select cp.product_id >> from category_product cp, product_attribute_value pav >> where cp.category_id = 1001082 and cp.product_id = >> pav.product_id; >> >> >> QUERY PLAN >> --------------------------------------------------------------------- >> --- >> --------------------------------------------------------------------- >> --- >> ------------------------------------- >> Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual >> time=0.373..71.177 rows=19695 loops=1) >> -> Index Scan using x_category_product__category_id_fk_idx on >> category_product cp (cost=0.00..23.10 rows=970 width=4) (actual >> time=0.129..1.438 rows=1140 loops=1) >> Index Cond: (category_id = 1001082) >> -> Index Scan using product_attribute_value__product_id_fk_idx >> on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4) >> (actual time=0.016..0.053 rows=17 loops=1140) >> Index Cond: ("outer".product_id = pav.product_id) >> Total runtime: 74.747 ms >> (6 rows) >> >> There's quite a big difference in speed there. 2584.221 ms vs. 74.747 >> ms. >> >> Any ideas what I can do to improve this without turning sequential >> scanning off? >> >> Thanks, >> >> ____________________________________________________________________ >> Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com >> >> ClickSpace Interactive Inc. >> Suite L100, 239 - 10th Ave. SE >> Calgary, AB T2G 0V9 >> >> http://www.clickspace.com >> > > > > -- > Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com > Pervasive Software http://pervasive.com work: 512-231-6117 > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that > your > message can get through to the mailing list cleanly >
Attachment
pgsql-performance by date: