Re: IN vs EXISTS equivalence - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: IN vs EXISTS equivalence
Date
Msg-id 1218736209.5343.537.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Re: IN vs EXISTS equivalence  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: IN vs EXISTS equivalence  (daveg <daveg@sonic.net>)
List pgsql-hackers
On Fri, 2008-08-08 at 16:23 -0400, Tom Lane wrote:

> NOT IN is a lot trickier,
> for the same reason that typically trips up novices who try to use it:
> if any row of the subselect produces a NULL comparison result, then it
> is impossible for the NOT IN to result in TRUE, which means that it
> does not function as a standard antijoin.  I thought about optimizing
> it only in the case where we can prove that the subselect outputs and
> the compared-to values are known NOT NULL (which in typical cases we
> could prove by looking for NOT NULL constraints on those table
> columns).  The trouble with this is that that's not a sufficient
> condition: you must also assume that the comparison operator involved
> never yields NULL for non-null inputs.  That might be okay for btree
> comparison functions but it's not a very comfy assumption in general;
> we certainly haven't got any explicit knowledge that any functions are
> guaranteed to act that way.  So this case might be worth doing later
> but I'm not feeling excited about it. We generally tell people to
> avoid NOT IN and I'm happy to keep on saying that.

Just found this comment, after reading what you said on other thread
about NOT IN.

NOT IN is a serious performance issue for most people. We simply can't
say to people "you were told not to".

If we can fix it easily for the majority of cases, we should. We can't
let the "it won't work in certain cases" reason prevent various
optimizations from going in. There are tons of places where we say "XXX
needs later improvement" in code comments. So lets do that here also. It
certainly wouldn't be the first optimization/feature that went into code
in a restricted way that didn't work for all cases: hash joins, ANALYZE,
partial indexes etc..

Anybody that is writing complex SQL with user defined operators knows
enough to re-write their queries correctly, so there will be almost no
negative effect from making the NOT IN optimisation a special case. And
if there is an effect, the people effected can fix the problem.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: WIP: patch to create explicit support for semi and anti joins
Next
From: "Kevin Grittner"
Date:
Subject: Re: WIP: patch to create explicit support for semi and anti joins