Thread: handling NULLS in GiST
Tom, we noticed you changed gist.c to handle NULLS. It seems there is problem with your changes. in gist.c /* GIST indexes don't index nulls, see notes in gistinsert */ if (! IndexTupleHasNulls(itup)) { /* ....... skipped .... /* * Currently, GIST indexes do not support indexing NULLs; considerable * infrastructure work wouldhave to be done to do anything reasonable * with a NULL. */ if (IndexTupleHasNulls(itup)) { While it's ok for single key but for multikey indexes removing tuple with NULL looks not right. Consider (a,b,c) where C is NULL. Your changes would remove tuple and it would be impossible to find (a,b) using this index. Did you think about this particular case ? I remind we have choosen to leave NULLs because vacuum complained about different number of tuples in heap and index and all our opclasses work correctly with NULLs. Did you change vacuum code so it will not complain ? In principle, if you insist on your approach, we propose to extend it to multikey case by removing tuple if and only if leading keys are NULLs What do you think ? Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > we noticed you changed gist.c to handle NULLS. It seems there is > problem with your changes. I would like to see GIST upgraded to handle nulls, but at the moment it's not null-safe. Try a few null entries, watch it core dump, if you don't have that patch in place. (At least it does with the contrib/cube opclass, didn't bother with any additional experiments.) At the very least you'd need to replace all the uses of DirectFunctionCallN to invoke the opclass support routines with code that is capable of detecting and signaling nulls. That would allow non-null-safe opclass routines to be protected by marking them "strict". But that's a micro-issue. The macro-issue is what you intend to do with NULLs in the first place. I understand what btree does with them, but what's the corresponding concept for GIST? > I remind we have choosen to leave NULLs because vacuum complained about > different number of tuples in heap and index and all our opclasses work > correctly with NULLs. Did you change vacuum code so it will not complain ? Yes. regards, tom lane
On Mon, 16 Jul 2001, Tom Lane wrote: > Oleg Bartunov <oleg@sai.msu.su> writes: > > we noticed you changed gist.c to handle NULLS. It seems there is > > problem with your changes. > > I would like to see GIST upgraded to handle nulls, but at the moment > it's not null-safe. Try a few null entries, watch it core dump, if you > don't have that patch in place. (At least it does with the contrib/cube > opclass, didn't bother with any additional experiments.) We also would like to handle NULLs. All our codes handle NULLs properly. contrib/cube is just a bad example :-) In any case if you give an interface to developer it's his responsibility to be aware of possible errors. Developer has always a possibility to divide by zero. We could change contrib/cube to be null-safe. Also multikey split algorithm uses NULL to mark secondary (...) keys in tuple for optimization of page splitting and we don't like idea to rewrite algorithm. GiST interface functions (split, union - user-level functions) have a pointer to operand vector as argument. Operand vector can't be a NULL, but some operands in the vector could be NULL. > > At the very least you'd need to replace all the uses of > DirectFunctionCallN to invoke the opclass support routines > with code that is capable of detecting and signaling nulls. > That would allow non-null-safe opclass routines to be protected > by marking them "strict". vaguely understand :-) DirectFunctionCallN are already interface to opclass support routines. Do we need to build on yet another interface just to mark bad users routines ? What should we do with that 'strict' mark ? > > But that's a micro-issue. agreed, but I'd like to require people write null-safe contribs and remove your stopper. The macro-issue is what you intend to > do with NULLs in the first place. I understand what btree does > with them, but what's the corresponding concept for GIST? if you mean first NULL keys in multikey GiST than just remove this tuple from index because it's informativeless. btw, what btree does ? > regards, tom lane > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > contrib/cube is just a bad example :-) In any case if you give an > interface to developer it's his responsibility to be aware of possible > errors. Developer has always a possibility to divide by zero. > We could change contrib/cube to be null-safe. My point is that as it stands, GIST is not honoring the defined interface for nulls. AFAICT you are relying on the called opclass routines to test for null pointers, which is not clean. (Among other things, it means that you cannot work with pass-by-value datatypes.) There has to be a separate isNull flag for each value. contrib/cube very possibly is broken, but that doesn't mean that the core GIST code isn't at fault too. > DirectFunctionCallN are already interface to > opclass support routines. But the FunctionCallN routines do not allow passing or returning NULL. That was a deliberate choice to preserve notational simplicity, because most of the places where they needed to be used didn't have to worry about NULLs. You do, so you can't use those routines. >> The macro-issue is what you intend to >> do with NULLs in the first place. I understand what btree does >> with them, but what's the corresponding concept for GIST? > if you mean first NULL keys in multikey GiST than just remove this tuple > from index because it's informativeless. btw, what btree does ? If you remove the tuple from the index then you're not storing NULLs. You need to pick a rule that defines where null rows will get placed in the index. For btree, the rule is "nulls sort after all non-nulls". regards, tom lane