RE: Performance in subquery - Mailing list pgsql-general

From Culberson, Philip
Subject RE: Performance in subquery
Date
Msg-id A95EFC3B707BD311986C00A0C9E95B6A9DE68C@datmail03.dat.com
Whole thread Raw
In response to Performance in subquery  ("Brian Haney" <brian@ibsystems.com>)
List pgsql-general
Two things come to mind.

1) Make sure you have run VACUUM ANALYZE on your database.

2) Use a join rather than a sub-select, e.g.

    SELECT
        p.prod_name
    FROM
        products p, prod_attr pa
    WHERE
        p.prod_id = pa.prod_id AND
        pa.prod_attr_text LIKE '%Linux%';

A word of caution though...  Because you are using a wildcard in the first
position of your search string ('%Linux%'), you preclude the use of any
index and force a sequential scan!

Hope this helps,

Phil Culberson
DAT Services

-----Original Message-----
From: Brian Haney [mailto:brian@ibsystems.com]
Sent: Monday, April 17, 2000 12:52 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Performance in subquery


I have multivendor product catalog and want to be able to search for
arbitrary text in a table of product attributes.  Two of the several tables
are:

Table    = products
+----------------------------------+----------------------------------+-----
--+
|              Field               |              Type                |
Length|
+----------------------------------+----------------------------------+-----
--+
| prod_id                          | int4 not null default nextval    |
4 |
| vendor_id                        | int4                             |
4 |
| category_id                      | int4                             |
4 |
| prod_name                        | text not null                    |
var |
| prod_description                 | text not null                    |
var |
| prod_price_low                   | numeric                          | 30.6
|
| prod_price_high                  | numeric                          | 30.6
|
| prod_promo                       | text                             |
var |
| prod_promo_url                   | text                             |
var |
| prod_datasht_url                 | text                             |
var |
| prod_buynow_url                  | text                             |
var |
| prod_asp_url                     | text                             |
var |
| prod_active                      | bool default 'false'             |
1 |
| last_modified                    | timestamp                        |
4 |
| prod_demo_url                    | text                             |
var |
| prod_ephone                      | text                             |
var |
+----------------------------------+----------------------------------+-----
--+
Index:    products_pkey

Table    = prod_attr
+----------------------------------+----------------------------------+-----
--+
|              Field               |              Type                |
Length|
+----------------------------------+----------------------------------+-----
--+
| attr_id                          | int4 not null                    |
4 |
| prod_id                          | int4 not null                    |
4 |
| prod_attr_text                   | text not null                    |
var |
+----------------------------------+----------------------------------+-----
--+
Indices:  prod_attr_pkey
          prod_attr_text_idx

The products table has 538 records and the prod_attr table has 7870.

When I enter the query:

SELECT prod_id FROM prod_attr WHERE prod_attr_text LIKE '%Linux%';

it performs quite well to give me a list of the product IDs almost
instantaneously.

But when I query:

select prod_name from products where prod_id in ( SELECT prod_id FROM
prod_attr
WHERE prod_attr_text LIKE '%Linux%');

It takes over 30 seconds to get the results.  Here are the EXPLAINs:

First query:

Seq Scan on prod_attr  (cost=317.84 rows=1 width=4)

Second query:

Seq Scan on products  (cost=31.75 rows=538 width=12)
  SubPlan
    ->  Seq Scan on prod_attr  (cost=317.84 rows=1 width=4)

As you can see, I have created and index for prod_attr
(prod_attr_text), but it has had no discernable effect.

The explains imply to me that the real cost of the second query is
scanning the 7870 records, but then I would expect the second query to
take only slightly longer than the first.  I suspect something else is
causing the second query to be so doggone slow.

Any help would be greatly appreciated.

--
Brian Haney  VP Engineering/CTO
brian@ibsystems.com  Internet Business Systems, Inc.

pgsql-general by date:

Previous
From: "Oelkers, Phil"
Date:
Subject: test nomail
Next
From: "Chris Carbaugh"
Date:
Subject: RE: excell to postgres