RE: Performance in subquery - Mailing list pgsql-general
From | Culberson, Philip |
---|---|
Subject | RE: Performance in subquery |
Date | |
Msg-id | A95EFC3B707BD311986C00A0C9E95B6A9DE68C@datmail03.dat.com Whole thread Raw |
In response to | Performance in subquery ("Brian Haney" <brian@ibsystems.com>) |
List | pgsql-general |
Two things come to mind. 1) Make sure you have run VACUUM ANALYZE on your database. 2) Use a join rather than a sub-select, e.g. SELECT p.prod_name FROM products p, prod_attr pa WHERE p.prod_id = pa.prod_id AND pa.prod_attr_text LIKE '%Linux%'; A word of caution though... Because you are using a wildcard in the first position of your search string ('%Linux%'), you preclude the use of any index and force a sequential scan! Hope this helps, Phil Culberson DAT Services -----Original Message----- From: Brian Haney [mailto:brian@ibsystems.com] Sent: Monday, April 17, 2000 12:52 PM To: pgsql-general@postgresql.org Subject: [GENERAL] Performance in subquery I have multivendor product catalog and want to be able to search for arbitrary text in a table of product attributes. Two of the several tables are: Table = products +----------------------------------+----------------------------------+----- --+ | Field | Type | Length| +----------------------------------+----------------------------------+----- --+ | prod_id | int4 not null default nextval | 4 | | vendor_id | int4 | 4 | | category_id | int4 | 4 | | prod_name | text not null | var | | prod_description | text not null | var | | prod_price_low | numeric | 30.6 | | prod_price_high | numeric | 30.6 | | prod_promo | text | var | | prod_promo_url | text | var | | prod_datasht_url | text | var | | prod_buynow_url | text | var | | prod_asp_url | text | var | | prod_active | bool default 'false' | 1 | | last_modified | timestamp | 4 | | prod_demo_url | text | var | | prod_ephone | text | var | +----------------------------------+----------------------------------+----- --+ Index: products_pkey Table = prod_attr +----------------------------------+----------------------------------+----- --+ | Field | Type | Length| +----------------------------------+----------------------------------+----- --+ | attr_id | int4 not null | 4 | | prod_id | int4 not null | 4 | | prod_attr_text | text not null | var | +----------------------------------+----------------------------------+----- --+ Indices: prod_attr_pkey prod_attr_text_idx The products table has 538 records and the prod_attr table has 7870. When I enter the query: SELECT prod_id FROM prod_attr WHERE prod_attr_text LIKE '%Linux%'; it performs quite well to give me a list of the product IDs almost instantaneously. But when I query: select prod_name from products where prod_id in ( SELECT prod_id FROM prod_attr WHERE prod_attr_text LIKE '%Linux%'); It takes over 30 seconds to get the results. Here are the EXPLAINs: First query: Seq Scan on prod_attr (cost=317.84 rows=1 width=4) Second query: Seq Scan on products (cost=31.75 rows=538 width=12) SubPlan -> Seq Scan on prod_attr (cost=317.84 rows=1 width=4) As you can see, I have created and index for prod_attr (prod_attr_text), but it has had no discernable effect. The explains imply to me that the real cost of the second query is scanning the 7870 records, but then I would expect the second query to take only slightly longer than the first. I suspect something else is causing the second query to be so doggone slow. Any help would be greatly appreciated. -- Brian Haney VP Engineering/CTO brian@ibsystems.com Internet Business Systems, Inc.
pgsql-general by date: