Optimisation of INTERSECT expressions - Mailing list pgsql-performance

From Phil Endecott
Subject Optimisation of INTERSECT expressions
Date
Msg-id 20040323141200.34D86D1EC02@svr1.postgresql.org
Whole thread Raw
Responses Re: Optimisation of INTERSECT expressions
List pgsql-performance
Dear PostgresQL Experts,

I am trying to get to the bottom of some efficiency problems and hope that
you can help.  The difficulty seems to be with INTERSECT expressions.

I have a query of the form
     select A from T where C1 intersect select A from T where C2;
It runs in about 100 ms.

But it is equivalent to this query
     select A from T where C1 and C2;
which runs in less than 10 ms.

Looking at the output of "explain analyse" on the first query, it seems
that PostgresQL always computes the two sub-expressions and then computes
an explicit intersection on the results.  I had hoped that it would notice
that both subexpressions are scanning the same input table T and convert
the expression to the second form.

Is there a reason why it can't do this transformation?

(Of course, I could just re-write my code to use the second form, but my
application is generating these bits of SQL programmatically, and it is not
trivial as in some cases the two tables are not the same so an intersection
really is needed; if PostgresQL can do it for me, that would be much
better.  I don't want to write an SQL parser!)


While studying the same code I found another case where my INTERSECT
expressions don't seem to be optimised as much as I'd like.  In this case,
one of the subexpressions being intersected is empty much of the time.  But
even when it is empty, PostgresQL computes the other (expensive)
subexpression and does an intersect.  Could PostgresQL do something like this:

- guess which subexpression is likely to produce fewest rows
- compute this subexpression
- if empty, return now with an empty result
- compute other subexpression
- compute intersection
- return intersection

Alternatively, it could be defined that the left subexpression is always
computed first and the second not computed if it is empty, like the
behaviour of logical AND and OR operators in C.

Thanks in advance for any suggestions.

--Phil.


pgsql-performance by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: Benchmarking postgres on Solaris/Linux
Next
From: Stephan Szabo
Date:
Subject: Re: Optimisation of INTERSECT expressions