Thread: EXCEPT performace

EXCEPT performace

From
"K. Ari Krupnikov"
Date:
I noticed an unexpected difference in performance between seemingly equivalent queries:

SELECT id FROM short_table EXCEPT SELECT id FROM long_table;

and

SELECT id FROM short_table EXCEPT
    (SELECT id FROM short_table JOIN long_table
     ON short_table.id = long_table.id);

There is an index on longtable.id; short_table is a temporary table. EXPLAIN
suggests that the index is ignored in the first case but consulted in the second.

Seq Scan on short_table  (cost=0.00..2857177.50 rows=1000 width=12)
  SubPlan
    ->  Seq Scan on long_table  (cost=0.00..5714.31 rows=200831 width=12)

vs.

Seq Scan on short_table  (cost=0.00..44.52 rows=3 width=12)
  SubPlan
    ->  Materialize  (cost=14.49..14.49 rows=3 width=24)
          ->  Nested Loop  (cost=0.00..14.49 rows=3 width=24)
                ->  Seq Scan on short_table  (cost=0.00..1.03 rows=3 width=12)
                ->  Index Scan using long_table_id_ix on long_table  (cost=0.00..4.48 rows=1 width=12)

Is there a reason an index cannot be used in an EXCEPT query?

Ari.

Re: EXCEPT performace

From
Stephan Szabo
Date:
On Sat, 3 Nov 2001, K. Ari Krupnikov wrote:

> I noticed an unexpected difference in performance between seemingly equivalent queries:
>
> SELECT id FROM short_table EXCEPT SELECT id FROM long_table;
>
> and
>
> SELECT id FROM short_table EXCEPT
>     (SELECT id FROM short_table JOIN long_table
>      ON short_table.id = long_table.id);
>
> There is an index on longtable.id; short_table is a temporary table. EXPLAIN
> suggests that the index is ignored in the first case but consulted in the second.

A complete guess, but EXCEPT may be being rewritten into an IN<subquery>
style query, which would mean it'd have the same performance
issues as that form (see the faq).

If your queries are simple, it may be best to rewrite them as something
like:
  select id from short_table where not exists (select id from long_table
   where short_table.id=long_table.id);

There may be some null related issues because I don't remember how except
handles nulls, but that will probably run better.


Re: EXCEPT performace

From
Tom Lane
Date:
"K. Ari Krupnikov" <ari@cogsci.ed.ac.uk> writes:
> I noticed an unexpected difference in performance between seemingly equivalent queries:
> SELECT id FROM short_table EXCEPT SELECT id FROM long_table;

> and

> SELECT id FROM short_table EXCEPT
>     (SELECT id FROM short_table JOIN long_table
>      ON short_table.id = long_table.id);

But they're not equivalent.  Consider what happens when NULLs are
present in both tables.

            regards, tom lane