Thread: handling NULLS in GiST

handling NULLS in GiST

From
Oleg Bartunov
Date:
Tom,

we noticed you changed gist.c to handle NULLS. It seems there is
problem with your changes.
in gist.c
       /* GIST indexes don't index nulls, see notes in gistinsert */       if (! IndexTupleHasNulls(itup))       {
        /*
 

....... skipped ....
       /*        * Currently, GIST indexes do not support indexing NULLs; considerable        * infrastructure work
wouldhave to be done to do anything reasonable        * with a NULL.        */       if (IndexTupleHasNulls(itup))
{
 


While it's ok for single key but for multikey indexes removing tuple with NULL
looks not right. Consider (a,b,c) where C is NULL. Your changes would
remove tuple and it would be impossible to find (a,b) using this index.
Did you think  about this particular case ?

I remind we have choosen to leave NULLs because vacuum complained about
different number of tuples in heap and index and all our opclasses work
correctly with NULLs. Did you change vacuum code so it will not complain ?

In principle, if you insist on your approach, we propose to extend it
to multikey case by removing tuple if and only if leading keys are NULLs

What do you think ?
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: handling NULLS in GiST

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> we noticed you changed gist.c to handle NULLS. It seems there is
> problem with your changes.

I would like to see GIST upgraded to handle nulls, but at the moment
it's not null-safe.  Try a few null entries, watch it core dump, if you
don't have that patch in place.  (At least it does with the contrib/cube
opclass, didn't bother with any additional experiments.)

At the very least you'd need to replace all the uses of
DirectFunctionCallN to invoke the opclass support routines
with code that is capable of detecting and signaling nulls.
That would allow non-null-safe opclass routines to be protected
by marking them "strict".

But that's a micro-issue.  The macro-issue is what you intend to
do with NULLs in the first place.  I understand what btree does
with them, but what's the corresponding concept for GIST?

> I remind we have choosen to leave NULLs because vacuum complained about
> different number of tuples in heap and index and all our opclasses work
> correctly with NULLs. Did you change vacuum code so it will not complain ?

Yes.
        regards, tom lane


Re: handling NULLS in GiST

From
Oleg Bartunov
Date:
On Mon, 16 Jul 2001, Tom Lane wrote:

> Oleg Bartunov <oleg@sai.msu.su> writes:
> > we noticed you changed gist.c to handle NULLS. It seems there is
> > problem with your changes.
>
> I would like to see GIST upgraded to handle nulls, but at the moment
> it's not null-safe.  Try a few null entries, watch it core dump, if you
> don't have that patch in place.  (At least it does with the contrib/cube
> opclass, didn't bother with any additional experiments.)

We also would like to handle NULLs. All our codes handle NULLs properly.
contrib/cube is just a bad example :-) In any case if you give an
interface to developer it's his responsibility to be aware of possible
errors. Developer has always a possibility to divide by zero.
We could change contrib/cube to be null-safe.
Also multikey split algorithm uses NULL to mark secondary (...) keys
in tuple for optimization of page splitting and we don't like idea to
rewrite algorithm. GiST interface functions (split, union -
user-level functions) have a pointer to operand vector as argument.
Operand vector can't be a NULL, but some operands in the vector could
be NULL.


>
> At the very least you'd need to replace all the uses of
> DirectFunctionCallN to invoke the opclass support routines
> with code that is capable of detecting and signaling nulls.
> That would allow non-null-safe opclass routines to be protected
> by marking them "strict".

vaguely understand :-) DirectFunctionCallN are already interface to
opclass support routines. Do we need to build on yet another interface
just to mark bad users routines ? What should we do with that 'strict'
mark ?


>
> But that's a micro-issue.

agreed, but I'd like to require people write null-safe contribs
and remove your stopper.

The macro-issue is what you intend to
> do with NULLs in the first place.  I understand what btree does
> with them, but what's the corresponding concept for GIST?

if you mean first NULL keys in multikey GiST than just remove this tuple
from index because it's informativeless. btw, what btree does ?



>             regards, tom lane
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




Re: handling NULLS in GiST

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> contrib/cube is just a bad example :-) In any case if you give an
> interface to developer it's his responsibility to be aware of possible
> errors. Developer has always a possibility to divide by zero.
> We could change contrib/cube to be null-safe.

My point is that as it stands, GIST is not honoring the defined
interface for nulls.  AFAICT you are relying on the called opclass
routines to test for null pointers, which is not clean.  (Among other
things, it means that you cannot work with pass-by-value datatypes.)
There has to be a separate isNull flag for each value.

contrib/cube very possibly is broken, but that doesn't mean that the
core GIST code isn't at fault too.

> DirectFunctionCallN are already interface to
> opclass support routines.

But the FunctionCallN routines do not allow passing or returning NULL.
That was a deliberate choice to preserve notational simplicity, because
most of the places where they needed to be used didn't have to worry
about NULLs.  You do, so you can't use those routines.

>> The macro-issue is what you intend to
>> do with NULLs in the first place.  I understand what btree does
>> with them, but what's the corresponding concept for GIST?

> if you mean first NULL keys in multikey GiST than just remove this tuple
> from index because it's informativeless. btw, what btree does ?

If you remove the tuple from the index then you're not storing NULLs.
You need to pick a rule that defines where null rows will get placed
in the index.  For btree, the rule is "nulls sort after all non-nulls".
        regards, tom lane