IN subselects and index / seq_scan question.... - Mailing list pgsql-admin

From Marc Mitchell
Subject IN subselects and index / seq_scan question....
Date
Msg-id 013901c28f62$a7379580$7c01050a@marcmdelltop
Whole thread Raw
In response to Re: gmake check runs just 13 tests instead of 77  (Andrew Sullivan <andrew@libertyrms.info>)
Responses Re: IN subselects and index / seq_scan question....  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin
I am having an optimization problem with queries that include IN clauses
and subselects.

xxx=> select version();
                           version
-------------------------------------------------------------
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96


The basic query I have is:

UPDATE my_table
SET some_col=some_value
WHERE my_table.foreign_key_with_an_index IN (
    SELECT primary_key
    FROM my_other_table
    WHERE my_other_table.some_indexed_column = some_other_value);

The problem is that even though <my_table> has an index on
<my_table.foreign_key_with_an_index>, an EXPLAIN reports and performance
supports the fact that the server performs a sequential scan on my_table
instead of an indexed scan on the column.  This is in contrast to the
query:

UPDATE my_table
SET some_col=some_value
WHERE my_table.foreign_key_with_an_index IN (val1,val2,val3...)

which will show an indexed scan and accordingly good performance.

Having done digging within the online user docs, including
http://www.ca.postgresql.org/docs/faq-english.html#4.22, I'm not sure
whether this is an issue of Postgres always wanting to seq_scan when faced
with IN clauses and subselects or if I have a stats problem where the
optimizer thinks my subselect will return hundreds of rows (even though I
know it's never more than 4) and thus opts for a single seq rather than
100s of random page accesses to go from index to base table.

If it is the former, then I've got an issue with the optimizers
capabilibilities and will be looking for info that something within a later
release will makes things better.

If it is the latter, I need to continue my learning journey into the
optimizer, Analyze and statistics.

Can someone shed light as to which path I should be following?

Marc Mitchell - Senior Application Architect
Enterprise Information Solutions, Inc.
Downers Grove, IL 60515
marcm@eisolution.com




pgsql-admin by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: gmake check runs just 13 tests instead of 77
Next
From: Graham Wilson
Date:
Subject: Confused about user permissions and pg_hba.conf