Thread: Why is NULL not indexable?

Why is NULL not indexable?

From
Martijn van Oosterhout
Date:
I'm kinda of curious because if you look at
src/backend/access/nbtree/nbtree.c there are comments stating that you can
add NULLs to the index but mentions that you can't because:

"that's an artifact of the strategy map architecture chosen in 1986."

I can't work out what the 'strategy' bit refers to. All I can find in the
source code is references to tables of magic numbers. I guess what I really
want to know is, how hard would it be to fix?

(BTW, it seems there's an awful lot of duplicated code between all the index
types. Is this a historical accident or is there a reason?)

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
- Artificial Intelligence is the science of making computers that behave
- like the ones in the movies.

Re: Why is NULL not indexable?

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> I can't work out what the 'strategy' bit refers to. All I can find in the
> source code is references to tables of magic numbers. I guess what I really
> want to know is, how hard would it be to fix?

I believe the main problem is that IS NULL and IS NOT NULL are not
operators (they don't have pg_operator entries), and all of the planning
and indexscan execution machinery is designed around operators.  Binary
operators, at that.

It's possible that this could be hacked around by creating dummy
pg_operator entries for them, but my bet is that cleaning up the loose
ends and no-longer-valid coding assumptions would be a nontrivial task.

            regards, tom lane

Re: Why is NULL not indexable?

From
Daniel Åkerud
Date:
I was thinking about what this actually meant and came to the conclusion
that having
SELECT * FROM foo WHERE bar IS NULL
would always result in a sequential scan.

Or does it mean anything else?

Daniel Åkerud

> > I can't work out what the 'strategy' bit refers to. All I can find in
the
> > source code is references to tables of magic numbers. I guess what I
really
> > want to know is, how hard would it be to fix?
>
> I believe the main problem is that IS NULL and IS NOT NULL are not
> operators (they don't have pg_operator entries), and all of the planning
> and indexscan execution machinery is designed around operators.  Binary
> operators, at that.
>
> It's possible that this could be hacked around by creating dummy
> pg_operator entries for them, but my bet is that cleaning up the loose
> ends and no-longer-valid coding assumptions would be a nontrivial task.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>


Re: Why is NULL not indexable?

From
Martijn van Oosterhout
Date:
On Tue, Jun 26, 2001 at 11:02:41AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I can't work out what the 'strategy' bit refers to. All I can find in the
> > source code is references to tables of magic numbers. I guess what I really
> > want to know is, how hard would it be to fix?
>
> I believe the main problem is that IS NULL and IS NOT NULL are not
> operators (they don't have pg_operator entries), and all of the planning
> and indexscan execution machinery is designed around operators.  Binary
> operators, at that.

Ok, I can see at least part of the problem now. You could make two operators
'is' and 'isnot' and have them equivalent to '=' and '!=' except in the case
of nulls. The index code is doing something like this anyway already.

> It's possible that this could be hacked around by creating dummy
> pg_operator entries for them, but my bet is that cleaning up the loose
> ends and no-longer-valid coding assumptions would be a nontrivial task.

Is there any documentation describing how this all works? I can't find
anything in the source code nor does anything appear in the mailing list
archives (maybe I'm searching for the wrong terms). All that stuff relating
to strategies (whatever they are) seems like it could do with some #defines
indicating what the numbers mean.

As a side note, the partial index stuff seems to be still perfectly fine in
the indexing code, you say it's the planner that doesn't handle it right?

Thanks for your time,
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
- Artificial Intelligence is the science of making computers that behave
- like the ones in the movies.

Re: Why is NULL not indexable?

From
Martijn van Oosterhout
Date:
On Tue, Jun 26, 2001 at 06:40:57PM +0200, Daniel ?kerud wrote:
> I was thinking about what this actually meant and came to the conclusion
> that having
> SELECT * FROM foo WHERE bar IS NULL
> would always result in a sequential scan.
>
> Or does it mean anything else?

No, that's exactly what it means. That's why I'm looking at it because it's
something I would like to change, but I havn't quite worked out how.

The thing is, if I converted all the nulls to empty strings they would be
indexable but the statistics would be terribly skewed giving a seqential
scan anyway.

I think I need to rethink this anyway...

> > > I can't work out what the 'strategy' bit refers to. All I can find in
> the
> > > source code is references to tables of magic numbers. I guess what I
> really
> > > want to know is, how hard would it be to fix?
> >
> > I believe the main problem is that IS NULL and IS NOT NULL are not
> > operators (they don't have pg_operator entries), and all of the planning
> > and indexscan execution machinery is designed around operators.  Binary
> > operators, at that.
> >
> > It's possible that this could be hacked around by creating dummy
> > pg_operator entries for them, but my bet is that cleaning up the loose
> > ends and no-longer-valid coding assumptions would be a nontrivial task.
> >
> > regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
- Artificial Intelligence is the science of making computers that behave
- like the ones in the movies.

Re: Why is NULL not indexable?

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Tue, Jun 26, 2001 at 11:02:41AM -0400, Tom Lane wrote:
>> I believe the main problem is that IS NULL and IS NOT NULL are not
>> operators (they don't have pg_operator entries), and all of the planning
>> and indexscan execution machinery is designed around operators.  Binary
>> operators, at that.

> Ok, I can see at least part of the problem now. You could make two operators
> 'is' and 'isnot' and have them equivalent to '=' and '!=' except in the case
> of nulls. The index code is doing something like this anyway already.

Btree does that, but it might not work at all for non-btree indexes.
Besides, you'd need a pair of such operators for every indexable
datatype.  I'd prefer to see the notion of IS (NOT) NULL directly
expressed in some fashion in the ScanKey representation.  Phony
operators corresponding to the (existing) functions nullvalue and
nonnullvalue might work.

> Is there any documentation describing how this all works?

http://www.ca.postgresql.org/users-lounge/docs/7.1/postgres/xindex.html

> All that stuff relating
> to strategies (whatever they are) seems like it could do with some #defines
> indicating what the numbers mean.

The generic strategy stuff doesn't *know* what the numbers mean, since
they are index access method specific.  See, eg, for btree
src/include/access/nbtree.h:

/*
 *    Operator strategy numbers -- ordering of these is <, <=, =, >=, >
 */

#define BTLessStrategyNumber        1
#define BTLessEqualStrategyNumber    2
#define BTEqualStrategyNumber        3
#define BTGreaterEqualStrategyNumber    4
#define BTGreaterStrategyNumber        5
#define BTMaxStrategyNumber        5

> As a side note, the partial index stuff seems to be still perfectly fine in
> the indexing code, you say it's the planner that doesn't handle it right?

The planner code is still there.  Someone once ripped out the WHERE
clause from CREATE INDEX in the grammar, for reasons undocumented and
now forgotten.  It's hard to guess what might need to be fixed after
adding that back --- surely all that code is now suffering bit-rot to
some extent.

            regards, tom lane

Re: Why is NULL not indexable?

From
Martijn van Oosterhout
Date:
On Wed, Jun 27, 2001 at 10:04:23AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Ok, I can see at least part of the problem now. You could make two operators
> > 'is' and 'isnot' and have them equivalent to '=' and '!=' except in the case
> > of nulls. The index code is doing something like this anyway already.
>
> Btree does that, but it might not work at all for non-btree indexes.
> Besides, you'd need a pair of such operators for every indexable
> datatype.  I'd prefer to see the notion of IS (NOT) NULL directly
> expressed in some fashion in the ScanKey representation.  Phony
> operators corresponding to the (existing) functions nullvalue and
> nonnullvalue might work.

So the options are to either find a way to make the strategy code handle
unary operators, or to make binary operators is and isnot that the parser
uses when it sees an IS NULL expression.

You'd want to add two more strategies to represent the relationsationships.
This is not going to be quick, that's for sure. BTW, is there a way of
dumping the contents of an index, for example the way an index scan would
see it? This would mean you could see whether the index was storing the data
correctly in the first place.

> > Is there any documentation describing how this all works?
> http://www.ca.postgresql.org/users-lounge/docs/7.1/postgres/xindex.html

Thanks, that made it much clearer. That page didn't show up on google, which
is why I didn't find it. So index types define their strategies and the
*_ops define the relationships between the strategies and the operators.

> > As a side note, the partial index stuff seems to be still perfectly fine in
> > the indexing code, you say it's the planner that doesn't handle it right?
>
> The planner code is still there.  Someone once ripped out the WHERE
> clause from CREATE INDEX in the grammar, for reasons undocumented and
> now forgotten.  It's hard to guess what might need to be fixed after
> adding that back --- surely all that code is now suffering bit-rot to
> some extent.

So I guess the first step is to add that back and see what breaks? Sounds
like fun :)

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
- Artificial Intelligence is the science of making computers that behave
- like the ones in the movies.

Re: Why is NULL not indexable?

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> You'd want to add two more strategies to represent the relationsationships.
> This is not going to be quick, that's for sure.

Yeah, it would be really tedious to do it that way, because pg_amop
entries would need to be added for every indexable datatype.  This
wouldn't bother me so much for built-in datatypes, but it would also
break user-defined types that have index support --- indexing would
fail until they added entries too.

Since there isn't any real need for datatype-specific handling of NULL
searches, I'd be inclined to special-case them somehow without adding
explicit strategy numbers for them.  Not sure what it would take to
do this, though.

            regards, tom lane