Thread: Null values in indexes

Null values in indexes

From
"Dann Corbit"
Date:
With 7.1.3, large indexes with null values allowed in one or more of the
columns would cause crashes.  (I have definitely seen this happen).

Here is a project that mentions repairs:
http://postgis.refractions.net/news/index.php?file=20020425.data

Have repairs been effected in 7.2?  Are they delayed until 7.3?


Re: Null values in indexes

From
Tom Lane
Date:
"Dann Corbit" <DCorbit@connx.com> writes:
> With 7.1.3, large indexes with null values allowed in one or more of the
> columns would cause crashes.  (I have definitely seen this happen).
> Have repairs been effected in 7.2?

Submit a test case and we'll tell you ...
        regards, tom lane


Re: Null values in indexes

From
Hannu Krosing
Date:
Not really a followup,but this has been on my mind for some time :


How hard would it be to _not_ include nulls in indexes 
as they are not used anyway. 

(IIRC postgres initially did not include nulls, but itwas added for multi-key btree indexes)

This would be a rough approximation of partial indexes 
if used together with functions, i.e. the optimiser 
would immediately realize that

WHERE my_ifunc(partfield) = 'header'

can use index on my_ifunc(partfield)

but my_ifunc() has an easy way of skipping indexing 
overhaed for non-interesting fields by returning NULL for them.

The following seems to prove thet there is currently 
no use of putting NULLS in a single-field index:

--------------------------

hannu=# create table itest (i int, n int);
CREATE
hannu=# create index itest_n_idx on itest(n);
CREATE

then I inserted 16k tuples

hannu=# insert into itest(i) select i+2 from itest;
INSERT 0 2
hannu=# insert into itest(i) select i+4 from itest;
INSERT 0 4
hannu=# insert into itest(i) select i+8 from itest;
INSERT 0 1024
...
hannu=# insert into itest(i) select i+2048 from itest;
INSERT 0 2048
hannu=# insert into itest(i) select i+4096 from itest;
INSERT 0 4096
hannu=# insert into itest(i) select i+8192 from itest;
UPDATE 16380

set most of n's to is but left 4 as NULLs

hannu=# update itest set n=1 where i>1;
UPDATE 16383

and vacuumed just in case 

hannu=# vacuum analyze itest;
VACUUM

now selects for real value do use index

hannu=# explain select * from itest where n = 7;
NOTICE:  QUERY PLAN:

Index Scan using itest_n_idx on itest  (cost=0.00..2.01 rows=1 width=8)

but IS NULL does not.

hannu=# explain select * from itest where n is null;
NOTICE:  QUERY PLAN:

Seq Scan on itest  (cost=0.00..341.84 rows=16 width=8)

EXPLAIN

------------------------
Hannu





Re: Null values in indexes

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> How hard would it be to _not_ include nulls in indexes 
> as they are not used anyway. 

Seems to me that would be a step backwards.

What should someday happen is to make IS NULL an indexable operator.
The fact that we haven't got around to doing so is not a reason to
rip out the underpinnings for it.
        regards, tom lane


Re: Null values in indexes

From
Jan Wieck
Date:
Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > How hard would it be to _not_ include nulls in indexes
> > as they are not used anyway.
>
> Seems to me that would be a step backwards.
   It  would cause multi-key indexes beeing unusable for partial   key lookup. Imagine you have a key over (a, b, c)
and query   with  WHERE  a = 1 AND b = 2. This query cannot use the index   if a NULL value in c  would  cause  the
index entry  to  be   suppressed.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: Null values in indexes

From
Tom Lane
Date:
Jan Wieck <janwieck@yahoo.com> writes:
> Tom Lane wrote:
>> Hannu Krosing <hannu@tm.ee> writes:
> How hard would it be to _not_ include nulls in indexes
> as they are not used anyway.
>> 
>> Seems to me that would be a step backwards.

>     It  would cause multi-key indexes beeing unusable for partial
>     key lookup. Imagine you have a key over (a, b, c)  and  query
>     with  WHERE  a = 1 AND b = 2. This query cannot use the index
>     if a NULL value in c  would  cause  the  index  entry  to  be
>     suppressed.

Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls.  AFAIR the planner
will try to use partial qualification on any multicolumn index...
it had better avoid doing so for non-null-capable AMs.

Alternatively, we could fix GiST to support nulls.  Oleg, Teodor:
how far away might that be?
        regards, tom lane


Re: Null values in indexes

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, May 29, 2002 9:07 AM
> To: Jan Wieck
> Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Null values in indexes
>
>
> Jan Wieck <janwieck@yahoo.com> writes:
> > Tom Lane wrote:
> >> Hannu Krosing <hannu@tm.ee> writes:
> > How hard would it be to _not_ include nulls in indexes
> > as they are not used anyway.
> >>
> >> Seems to me that would be a step backwards.
>
> >     It  would cause multi-key indexes beeing unusable for partial
> >     key lookup. Imagine you have a key over (a, b, c)  and  query
> >     with  WHERE  a = 1 AND b = 2. This query cannot use the index
> >     if a NULL value in c  would  cause  the  index  entry  to  be
> >     suppressed.
>
> Urgh ... that means GiST indexing is actually broken, because GiST
> currently handles multicolumns but not nulls.  AFAIR the planner
> will try to use partial qualification on any multicolumn index...
> it had better avoid doing so for non-null-capable AMs.
>
> Alternatively, we could fix GiST to support nulls.  Oleg, Teodor:
> how far away might that be?

The PostGIS people have already fixed it.  However, they may not be
willing to contribute the patch.  On the other hand, I think it would be
in their interest, since the source code trees will fork if they don't
and they will have trouble staying in synch with PostgreSQL
developments.  (See the 7.2 index project here:
http://postgis.refractions.net/
http://postgis.refractions.net/news/index.php?file=20020425.data
)

If they are not willing to commit a patch, I suspect that they will at
least tell you what they had to do to fix it and it could be performed
internally.


Re: Null values in indexes

From
Paul Ramsey
Date:
Just to clarify:

- With respect to null safety. My understanding is the Oleg and Teodor
put support for nulls into the GiST indexing prior to the 7.2 release,
so 7.2 GiST should already be null-safe. Our project was just to take
our GiST bindings in PostGIS and update them to the new 7.2 GiST API, we
did no work on null-safety, null-safety was just one of the side
benefits we received as a result of updating our code to the 7.2 GiST
indexes.

- With respect to code contribution. If we find ourselves making changes
to the mainline PgSQL distribution we will always submit back. End of
story. All our changes have been to PostGIS itself, with the aim of
supporting 7.2. 7.2 rocks, we love it. :)

- There is one outstanding bug which we identified and Oleg and Teodor
fixed, but it is to the code in contrib/rtree, not in the mainline, and
Oleg and Teodor have already submitted that patch to Bruce. I believe
there was some unresolved discussion regarding whether to cut a 7.2.2
release including that patch and a few other housekeeping items.

Paul


Dann Corbit wrote:
> 
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Sent: Wednesday, May 29, 2002 9:07 AM
> > To: Jan Wieck
> > Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Null values in indexes
> >
> >
> > Jan Wieck <janwieck@yahoo.com> writes:
> > > Tom Lane wrote:
> > >> Hannu Krosing <hannu@tm.ee> writes:
> > > How hard would it be to _not_ include nulls in indexes
> > > as they are not used anyway.
> > >>
> > >> Seems to me that would be a step backwards.
> >
> > >     It  would cause multi-key indexes beeing unusable for partial
> > >     key lookup. Imagine you have a key over (a, b, c)  and  query
> > >     with  WHERE  a = 1 AND b = 2. This query cannot use the index
> > >     if a NULL value in c  would  cause  the  index  entry  to  be
> > >     suppressed.
> >
> > Urgh ... that means GiST indexing is actually broken, because GiST
> > currently handles multicolumns but not nulls.  AFAIR the planner
> > will try to use partial qualification on any multicolumn index...
> > it had better avoid doing so for non-null-capable AMs.
> >
> > Alternatively, we could fix GiST to support nulls.  Oleg, Teodor:
> > how far away might that be?
> 
> The PostGIS people have already fixed it.  However, they may not be
> willing to contribute the patch.  On the other hand, I think it would be
> in their interest, since the source code trees will fork if they don't
> and they will have trouble staying in synch with PostgreSQL
> developments.  (See the 7.2 index project here:
> http://postgis.refractions.net/
> http://postgis.refractions.net/news/index.php?file=20020425.data
> )
> 
> If they are not willing to commit a patch, I suspect that they will at
> least tell you what they had to do to fix it and it could be performed
> internally.

--      __    /    | Paul Ramsey    | Refractions Research    | Email: pramsey@refractions.net    | Phone: (250)
885-0632   \_
 


Re: Null values in indexes

From
Tom Lane
Date:
>> Urgh ... that means GiST indexing is actually broken, because GiST
>> currently handles multicolumns but not nulls.

Actually, it appears that 7.2 GiST does handle NULLs in columns after
the first one, which I think is enough to avoid the problem Jan
mentioned.  The boolean column pg_am.amindexnulls is not really
sufficient to describe this behavior accurately.  Looking at current
uses it seems correct to leave it set FALSE for GiST.

In short: false alarm; the 7.2 code is okay as-is, at least on this
particular point.
        regards, tom lane


Re: Null values in indexes

From
Oleg Bartunov
Date:
Glad to hear GiST in 7.2 isn't broken :-)
We miss the topic, what was the problem ?
Do we need to fix GiST code for 7.3 ?

proposal for null-safe GiST interface is available
http://fts.postgresql.org/db/mw/msg.html?mid=1028327
and discussion
http://fts.postgresql.org/db/mw/msg.html?mid=1025848

Regards,    Oleg
On Wed, 29 May 2002, Tom Lane wrote:

> >> Urgh ... that means GiST indexing is actually broken, because GiST
> >> currently handles multicolumns but not nulls.
>
> Actually, it appears that 7.2 GiST does handle NULLs in columns after
> the first one, which I think is enough to avoid the problem Jan
> mentioned.  The boolean column pg_am.amindexnulls is not really
> sufficient to describe this behavior accurately.  Looking at current
> uses it seems correct to leave it set FALSE for GiST.
>
> In short: false alarm; the 7.2 code is okay as-is, at least on this
> particular point.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: Null values in indexes

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> Do we need to fix GiST code for 7.3 ?

No, I think it's fine.  I had forgotten that old discussion ...
        regards, tom lane


Re: Null values in indexes

From
"Chris Hodgson"
Date:
Hmm... I think there is some confusion here.

Oleg and Teodor updated the GiST indexing to be null safe for postgresql 7.2. 
The changes we made to PostGIS were just to allow our spacial indexing 
support functions to work with the changes made in the actual GiST indexing 
code (the GiST interface changed somewhat from postgresql 7.1 -> 7.2).

And for the record, I'm confident that we would submit a patch for postgresql 
if something like this did come up.

Chris Hodgson

Dann Corbit <DCorbit@connx.com> said:

> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Sent: Wednesday, May 29, 2002 9:07 AM
> > To: Jan Wieck
> > Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Null values in indexes 
> > 
> > 
> > Jan Wieck <janwieck@yahoo.com> writes:
> > > Tom Lane wrote:
> > >> Hannu Krosing <hannu@tm.ee> writes:
> > > How hard would it be to _not_ include nulls in indexes
> > > as they are not used anyway.
> > >> 
> > >> Seems to me that would be a step backwards.
> > 
> > >     It  would cause multi-key indexes beeing unusable for partial
> > >     key lookup. Imagine you have a key over (a, b, c)  and  query
> > >     with  WHERE  a = 1 AND b = 2. This query cannot use the index
> > >     if a NULL value in c  would  cause  the  index  entry  to  be
> > >     suppressed.
> > 
> > Urgh ... that means GiST indexing is actually broken, because GiST
> > currently handles multicolumns but not nulls.  AFAIR the planner
> > will try to use partial qualification on any multicolumn index...
> > it had better avoid doing so for non-null-capable AMs.
> > 
> > Alternatively, we could fix GiST to support nulls.  Oleg, Teodor:
> > how far away might that be?
> 
> The PostGIS people have already fixed it.  However, they may not be
> willing to contribute the patch.  On the other hand, I think it would be
> in their interest, since the source code trees will fork if they don't
> and they will have trouble staying in synch with PostgreSQL
> developments.  (See the 7.2 index project here:
> http://postgis.refractions.net/ 
> http://postgis.refractions.net/news/index.php?file=20020425.data
> )
> 
> If they are not willing to commit a patch, I suspect that they will at
> least tell you what they had to do to fix it and it could be performed
> internally.
> 
> 



--