Thread: Index Usage using IN

Index Usage using IN

From

Ralph Mason

Date:

01 February 2006, 19:18:14

Hi,

I have 2 tables both have an index on ID (both ID columns are an oid).

I want to find only only rows in one and not the other.

Select ID from TableA where ID not IN ( Select ID from Table B)

This always generates sequential scans.

Table A has about 250,000 rows.   Table B has about 250,000 Rows.

We should get a Scan on Table B and a Index Lookup on Table A.

Is there any way to force this?  enable_seqscan off doesn't help at all.

The Plan is

Seq Scan on tablea(cost=100000000.00..23883423070450.96 rows=119414 width=4)
  Filter: (NOT (subplan))"
  SubPlan    ->
     Seq Scan on tableb (cost=100000000.00..100004611.17 rows=242617
width=4)


Thanks
Ralph

Re: Index Usage using IN

From

"Jeffrey W. Baker"

Date:

01 February 2006, 19:22:55

On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
> Hi,
>
> I have 2 tables both have an index on ID (both ID columns are an oid).
>
> I want to find only only rows in one and not the other.
>
> Select ID from TableA where ID not IN ( Select ID from Table B)

Have you considered this:

SELECT ID from TableA EXCEPT Select ID from Table B

?

-jwb

Re: Index Usage using IN

From

"Jeffrey W. Baker"

Date:

01 February 2006, 19:28:22

On Wed, 2006-02-01 at 12:22 -0800, Jeffrey W. Baker wrote:
> On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
> > Hi,
> >
> > I have 2 tables both have an index on ID (both ID columns are an oid).
> >
> > I want to find only only rows in one and not the other.
> >
> > Select ID from TableA where ID not IN ( Select ID from Table B)
>
> Have you considered this:
>
> SELECT ID from TableA EXCEPT Select ID from Table B

Alternately:

   SELECT a.ID
     FROM TableA AS a
LEFT JOIN TableB AS b
       ON a.ID = b.ID
    WHERE b.ID IS NULL

-jwb

Re: Index Usage using IN

From

Tom Lane

Date:

01 February 2006, 19:41:11

"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
>> Select ID from TableA where ID not IN ( Select ID from Table B)

> Have you considered this:

> SELECT ID from TableA EXCEPT Select ID from Table B

Also, increasing work_mem might persuade the planner to try a hashed
subplan, which'd be a lot better than what you have.  Note that it's
quite unlikely that indexes are going to help for this.

            regards, tom lane

Re: Index Usage using IN

From

Bruno Wolff III

Date:

01 February 2006, 20:15:16

On Thu, Feb 02, 2006 at 09:12:59 +1300,
  Ralph Mason <ralph.mason@telogis.com> wrote:
> Hi,
>
> I have 2 tables both have an index on ID (both ID columns are an oid).
>
> I want to find only only rows in one and not the other.
>
> Select ID from TableA where ID not IN ( Select ID from Table B)
>
> This always generates sequential scans.
>
> Table A has about 250,000 rows.   Table B has about 250,000 Rows.
>
> We should get a Scan on Table B and a Index Lookup on Table A.

I don't think that is going to work if there are NULLs in table B.
I don't know whether or not Postgres has code to special case NULL testing
(either for constraints ruling them out, or doing probes for them in addition
to the key it is trying to match) for doing NOT IN. Just doing a simple
index probe into table A isn't going to tell you all you need to know if
you don't find a match.

Re: Index Usage using IN

From

Hari Warrier

Date:

01 February 2006, 21:13:01

Select ID from TableA where not exists ( Select ID from Table B where ID
= TableA.ID)
might give you index scan. Of course, that is only useful is TableA is
very small table.
Not appropriate for 250k rows

on 2/1/2006 12:12 PM Ralph Mason said the following:
> Hi,
>
> I have 2 tables both have an index on ID (both ID columns are an oid).
>
> I want to find only only rows in one and not the other.
>
> Select ID from TableA where ID not IN ( Select ID from Table B)
>
> This always generates sequential scans.
>
> Table A has about 250,000 rows.   Table B has about 250,000 Rows.
>
> We should get a Scan on Table B and a Index Lookup on Table A.
>
> Is there any way to force this?  enable_seqscan off doesn't help at all.
>
> The Plan is
>
> Seq Scan on tablea(cost=100000000.00..23883423070450.96 rows=119414
> width=4)
>  Filter: (NOT (subplan))"
>  SubPlan    ->     Seq Scan on tableb (cost=100000000.00..100004611.17
> rows=242617 width=4)
>
>
> Thanks
> Ralph
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>