Performance in subquery - Mailing list pgsql-general
From | Brian Haney |
---|---|
Subject | Performance in subquery |
Date | |
Msg-id | 002601bfa8a6$70aa26e0$8101a8c0@specter.fresno.cybernaut.com Whole thread Raw |
List | pgsql-general |
I have multivendor product catalog and want to be able to search for arbitrary text in a table of product attributes. Two of the several tables are: Table = products +----------------------------------+---------------------------------- +-------+ | Field | Type | Length| +----------------------------------+---------------------------------- +-------+ | prod_id | int4 not null default nextval | 4 | | vendor_id | int4 | 4 | | category_id | int4 | 4 | | prod_name | text not null | var | | prod_description | text not null | var | | prod_price_low | numeric | 30.6 | | prod_price_high | numeric | 30.6 | | prod_promo | text | var | | prod_promo_url | text | var | | prod_datasht_url | text | var | | prod_buynow_url | text | var | | prod_asp_url | text | var | | prod_active | bool default 'false' | 1 | | last_modified | timestamp | 4 | | prod_demo_url | text | var | | prod_ephone | text | var | +----------------------------------+---------------------------------- +-------+ Index: products_pkey Table = prod_attr +----------------------------------+---------------------------------- +-------+ | Field | Type | Length| +----------------------------------+---------------------------------- +-------+ | attr_id | int4 not null | 4 | | prod_id | int4 not null | 4 | | prod_attr_text | text not null | var | +----------------------------------+---------------------------------- +-------+ Indices: prod_attr_pkey prod_attr_text_idx The products table has 538 records and the prod_attr table has 7870. When I enter the query: SELECT prod_id FROM prod_attr WHERE prod_attr_text LIKE '%Linux%'; it performs quite well to give me a list of the product IDs almost instantaneously. But when I query: select prod_name from products where prod_id in ( SELECT prod_id FROM prod_attr WHERE prod_attr_text LIKE '%Linux%'); It takes over 30 seconds to get the results. Here are the EXPLAINs: First query: Seq Scan on prod_attr (cost=317.84 rows=1 width=4) Second query: Seq Scan on products (cost=31.75 rows=538 width=12) SubPlan -> Seq Scan on prod_attr (cost=317.84 rows=1 width=4) As you can see, I have created and index for prod_attr (prod_attr_text), but it has had no discernable effect. The explains imply to me that the real cost of the second query is scanning the 7870 records, but then I would expect the second query to take only slightly longer than the first. I suspect something else is causing the second query to be so doggone slow. Any help would be greatly appreciated. -- Brian Haney VP Engineering/CTO brian@ibsystems.com Internet Business Systems, Inc.
pgsql-general by date: