Thread: Why is NULL not indexable?
I'm kinda of curious because if you look at src/backend/access/nbtree/nbtree.c there are comments stating that you can add NULLs to the index but mentions that you can't because: "that's an artifact of the strategy map architecture chosen in 1986." I can't work out what the 'strategy' bit refers to. All I can find in the source code is references to tables of magic numbers. I guess what I really want to know is, how hard would it be to fix? (BTW, it seems there's an awful lot of duplicated code between all the index types. Is this a historical accident or is there a reason?) -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ - Artificial Intelligence is the science of making computers that behave - like the ones in the movies.
Martijn van Oosterhout <kleptog@svana.org> writes: > I can't work out what the 'strategy' bit refers to. All I can find in the > source code is references to tables of magic numbers. I guess what I really > want to know is, how hard would it be to fix? I believe the main problem is that IS NULL and IS NOT NULL are not operators (they don't have pg_operator entries), and all of the planning and indexscan execution machinery is designed around operators. Binary operators, at that. It's possible that this could be hacked around by creating dummy pg_operator entries for them, but my bet is that cleaning up the loose ends and no-longer-valid coding assumptions would be a nontrivial task. regards, tom lane
I was thinking about what this actually meant and came to the conclusion that having SELECT * FROM foo WHERE bar IS NULL would always result in a sequential scan. Or does it mean anything else? Daniel Åkerud > > I can't work out what the 'strategy' bit refers to. All I can find in the > > source code is references to tables of magic numbers. I guess what I really > > want to know is, how hard would it be to fix? > > I believe the main problem is that IS NULL and IS NOT NULL are not > operators (they don't have pg_operator entries), and all of the planning > and indexscan execution machinery is designed around operators. Binary > operators, at that. > > It's possible that this could be hacked around by creating dummy > pg_operator entries for them, but my bet is that cleaning up the loose > ends and no-longer-valid coding assumptions would be a nontrivial task. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
On Tue, Jun 26, 2001 at 11:02:41AM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > I can't work out what the 'strategy' bit refers to. All I can find in the > > source code is references to tables of magic numbers. I guess what I really > > want to know is, how hard would it be to fix? > > I believe the main problem is that IS NULL and IS NOT NULL are not > operators (they don't have pg_operator entries), and all of the planning > and indexscan execution machinery is designed around operators. Binary > operators, at that. Ok, I can see at least part of the problem now. You could make two operators 'is' and 'isnot' and have them equivalent to '=' and '!=' except in the case of nulls. The index code is doing something like this anyway already. > It's possible that this could be hacked around by creating dummy > pg_operator entries for them, but my bet is that cleaning up the loose > ends and no-longer-valid coding assumptions would be a nontrivial task. Is there any documentation describing how this all works? I can't find anything in the source code nor does anything appear in the mailing list archives (maybe I'm searching for the wrong terms). All that stuff relating to strategies (whatever they are) seems like it could do with some #defines indicating what the numbers mean. As a side note, the partial index stuff seems to be still perfectly fine in the indexing code, you say it's the planner that doesn't handle it right? Thanks for your time, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ - Artificial Intelligence is the science of making computers that behave - like the ones in the movies.
On Tue, Jun 26, 2001 at 06:40:57PM +0200, Daniel ?kerud wrote: > I was thinking about what this actually meant and came to the conclusion > that having > SELECT * FROM foo WHERE bar IS NULL > would always result in a sequential scan. > > Or does it mean anything else? No, that's exactly what it means. That's why I'm looking at it because it's something I would like to change, but I havn't quite worked out how. The thing is, if I converted all the nulls to empty strings they would be indexable but the statistics would be terribly skewed giving a seqential scan anyway. I think I need to rethink this anyway... > > > I can't work out what the 'strategy' bit refers to. All I can find in > the > > > source code is references to tables of magic numbers. I guess what I > really > > > want to know is, how hard would it be to fix? > > > > I believe the main problem is that IS NULL and IS NOT NULL are not > > operators (they don't have pg_operator entries), and all of the planning > > and indexscan execution machinery is designed around operators. Binary > > operators, at that. > > > > It's possible that this could be hacked around by creating dummy > > pg_operator entries for them, but my bet is that cleaning up the loose > > ends and no-longer-valid coding assumptions would be a nontrivial task. > > > > regards, tom lane > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ - Artificial Intelligence is the science of making computers that behave - like the ones in the movies.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Tue, Jun 26, 2001 at 11:02:41AM -0400, Tom Lane wrote: >> I believe the main problem is that IS NULL and IS NOT NULL are not >> operators (they don't have pg_operator entries), and all of the planning >> and indexscan execution machinery is designed around operators. Binary >> operators, at that. > Ok, I can see at least part of the problem now. You could make two operators > 'is' and 'isnot' and have them equivalent to '=' and '!=' except in the case > of nulls. The index code is doing something like this anyway already. Btree does that, but it might not work at all for non-btree indexes. Besides, you'd need a pair of such operators for every indexable datatype. I'd prefer to see the notion of IS (NOT) NULL directly expressed in some fashion in the ScanKey representation. Phony operators corresponding to the (existing) functions nullvalue and nonnullvalue might work. > Is there any documentation describing how this all works? http://www.ca.postgresql.org/users-lounge/docs/7.1/postgres/xindex.html > All that stuff relating > to strategies (whatever they are) seems like it could do with some #defines > indicating what the numbers mean. The generic strategy stuff doesn't *know* what the numbers mean, since they are index access method specific. See, eg, for btree src/include/access/nbtree.h: /* * Operator strategy numbers -- ordering of these is <, <=, =, >=, > */ #define BTLessStrategyNumber 1 #define BTLessEqualStrategyNumber 2 #define BTEqualStrategyNumber 3 #define BTGreaterEqualStrategyNumber 4 #define BTGreaterStrategyNumber 5 #define BTMaxStrategyNumber 5 > As a side note, the partial index stuff seems to be still perfectly fine in > the indexing code, you say it's the planner that doesn't handle it right? The planner code is still there. Someone once ripped out the WHERE clause from CREATE INDEX in the grammar, for reasons undocumented and now forgotten. It's hard to guess what might need to be fixed after adding that back --- surely all that code is now suffering bit-rot to some extent. regards, tom lane
On Wed, Jun 27, 2001 at 10:04:23AM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > Ok, I can see at least part of the problem now. You could make two operators > > 'is' and 'isnot' and have them equivalent to '=' and '!=' except in the case > > of nulls. The index code is doing something like this anyway already. > > Btree does that, but it might not work at all for non-btree indexes. > Besides, you'd need a pair of such operators for every indexable > datatype. I'd prefer to see the notion of IS (NOT) NULL directly > expressed in some fashion in the ScanKey representation. Phony > operators corresponding to the (existing) functions nullvalue and > nonnullvalue might work. So the options are to either find a way to make the strategy code handle unary operators, or to make binary operators is and isnot that the parser uses when it sees an IS NULL expression. You'd want to add two more strategies to represent the relationsationships. This is not going to be quick, that's for sure. BTW, is there a way of dumping the contents of an index, for example the way an index scan would see it? This would mean you could see whether the index was storing the data correctly in the first place. > > Is there any documentation describing how this all works? > http://www.ca.postgresql.org/users-lounge/docs/7.1/postgres/xindex.html Thanks, that made it much clearer. That page didn't show up on google, which is why I didn't find it. So index types define their strategies and the *_ops define the relationships between the strategies and the operators. > > As a side note, the partial index stuff seems to be still perfectly fine in > > the indexing code, you say it's the planner that doesn't handle it right? > > The planner code is still there. Someone once ripped out the WHERE > clause from CREATE INDEX in the grammar, for reasons undocumented and > now forgotten. It's hard to guess what might need to be fixed after > adding that back --- surely all that code is now suffering bit-rot to > some extent. So I guess the first step is to add that back and see what breaks? Sounds like fun :) -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ - Artificial Intelligence is the science of making computers that behave - like the ones in the movies.
Martijn van Oosterhout <kleptog@svana.org> writes: > You'd want to add two more strategies to represent the relationsationships. > This is not going to be quick, that's for sure. Yeah, it would be really tedious to do it that way, because pg_amop entries would need to be added for every indexable datatype. This wouldn't bother me so much for built-in datatypes, but it would also break user-defined types that have index support --- indexing would fail until they added entries too. Since there isn't any real need for datatype-specific handling of NULL searches, I'd be inclined to special-case them somehow without adding explicit strategy numbers for them. Not sure what it would take to do this, though. regards, tom lane