Thread: Allowing x IS NOT NULL as a btree search condition

Allowing x IS NOT NULL as a btree search condition

From
Tom Lane
Date:
Back when we put in the ability to use "x IS NULL" as a btree search
condition, we intentionally left out "x IS NOT NULL", on the grounds
that it is comparable to "x <> something" which is not btree-searchable
either.  However, it occurs to me that we missed a bet here.  The NOT
NULL condition could instead be treated like "x is less than NULL"
(in a NULLS LAST index) or "x is greater than NULL" (in a NULLS FIRST
index), which would make it work like a searchable inequality.  It's
still true that except in the case of a mostly-null column, it would
seldom be worth doing such an indexscan.  However, I can see an
application where an index search condition like this would be highly
worthwhile: namely, trying to extract a column min or max.  Right now,
if there are a fair number of nulls at the end of the index you're
interested in, you have to stupidly scan through them --- but if the
btree code knew about doing this, it could descend the tree
intelligently and land right on the first or last non-null.

We have already seen field complaints about the performance of
index-optimized MAX in cases with many nulls, eg
http://archives.postgresql.org/pgsql-performance/2006-12/msg00099.php
which fixing this would take care of.  This would also affect the
usefulness of the idea I proposed earlier today about automatically
updating the histogram bin boundaries when trying to estimate inequality
selectivity for comparison values near the ends of the range --- if we
can't rely on the index lookup for max or min to be cheap, doing that
stops looking quite so attractive.

While I haven't tried to code this yet, I'm guessing that it's just a
very small addition to the logic we already have.  Any objections to
fixing this up?
        regards, tom lane


Re: Allowing x IS NOT NULL as a btree search condition

From
Bruce Momjian
Date:
Tom Lane wrote:
> Back when we put in the ability to use "x IS NULL" as a btree search
> condition, we intentionally left out "x IS NOT NULL", on the grounds
> that it is comparable to "x <> something" which is not btree-searchable
> either.  However, it occurs to me that we missed a bet here.  The NOT
> NULL condition could instead be treated like "x is less than NULL"
> (in a NULLS LAST index) or "x is greater than NULL" (in a NULLS FIRST
> index), which would make it work like a searchable inequality.  It's
> still true that except in the case of a mostly-null column, it would
> seldom be worth doing such an indexscan.  However, I can see an
> application where an index search condition like this would be highly
> worthwhile: namely, trying to extract a column min or max.  Right now,
> if there are a fair number of nulls at the end of the index you're
> interested in, you have to stupidly scan through them --- but if the
> btree code knew about doing this, it could descend the tree
> intelligently and land right on the first or last non-null.
> 
> We have already seen field complaints about the performance of
> index-optimized MAX in cases with many nulls, eg
> http://archives.postgresql.org/pgsql-performance/2006-12/msg00099.php
> which fixing this would take care of.  This would also affect the
> usefulness of the idea I proposed earlier today about automatically
> updating the histogram bin boundaries when trying to estimate inequality
> selectivity for comparison values near the ends of the range --- if we
> can't rely on the index lookup for max or min to be cheap, doing that
> stops looking quite so attractive.
> 
> While I haven't tried to code this yet, I'm guessing that it's just a
> very small addition to the logic we already have.  Any objections to
> fixing this up?

Sounds good to me.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +