Re: Avoiding sequential scans with OR join condition - Mailing list pgsql-general

From Janning Vygen
Subject Re: Avoiding sequential scans with OR join condition
Date
Msg-id 200410161426.53591.vygen@gmx.de
Whole thread Raw
In response to Avoiding sequential scans with OR join condition  (Mike Mascari <mascarm@mascari.com>)
List pgsql-general
Am Samstag, 16. Oktober 2004 07:23 schrieb Mike Mascari:
> Hello. I have a query like:
>
> SELECT big_table.*
> FROM little_table, big_table
> WHERE little_table.x = 10 AND
> little_table.y IN (big_table.y1, big_table.y2);
>
> I have indexes on both big_table.y1 and big_table.y2 and on
> little_table.x and little_table.y. The result is a sequential scan of
> big_table. In order to prevent this,

Maybe the postgres planner decided to choose a seq scan because the planner
thinks it is faster, and often it is right. Did you vacuum analyze before?

try:
VACCUM ANALYZE;
SET enable_seq_scan to off;
EXPLAIN ANALYZE <your query>
SET enable_seq_scan to on;
EXPLAIN ANALYZE <your query>

you will see why postgres planner did choose a seq scan and if it was right to
do so (but never disable seq scan on production environment, not even for one
query. you do not want it.)

(i hope syntax is correct otherwise consult the manual)

> I've rewritten the query as:
> SELECT big_table.*
> FROM little_table, big_table
> WHERE little_table.x = 10 AND
> little_table.y = big_table.y1
>   UNION
> SELECT big_table.*
> FROM little_table, big_table
> WHERE little_table.x = 10 AND
> little_table.y = big_table.y2
>
> which does allow an index scan, but suffers from two separate queries
> along with a unique sort, which, from the data, represents 90% of the
> tuples returned by both queries.

this is the reason it seems why postgres choose a seq scan in the first query.
if it has to scan 90% of data anyway, it is faster than doing two index
lookups before.

> Is there any way to write the first query such that indexes will be used?

i do not know your db design but it looks queer to me to have a big_table with
two columns y1 and y2 which seems to have the same meaning (some value which
is compared to another value of little_table).

why dont you put just one column "y" in your big_table?

kind regards,
janning

> Mike Mascari

pgsql-general by date:

Previous
From: "Katsaros Kwn/nos"
Date:
Subject: Re: Networking feature for postgresql...
Next
From: John DeSoi
Date:
Subject: Re: OS X Install