Thread: Null values in indexes
With 7.1.3, large indexes with null values allowed in one or more of the columns would cause crashes. (I have definitely seen this happen). Here is a project that mentions repairs: http://postgis.refractions.net/news/index.php?file=20020425.data Have repairs been effected in 7.2? Are they delayed until 7.3?
"Dann Corbit" <DCorbit@connx.com> writes: > With 7.1.3, large indexes with null values allowed in one or more of the > columns would cause crashes. (I have definitely seen this happen). > Have repairs been effected in 7.2? Submit a test case and we'll tell you ... regards, tom lane
Not really a followup,but this has been on my mind for some time : How hard would it be to _not_ include nulls in indexes as they are not used anyway. (IIRC postgres initially did not include nulls, but itwas added for multi-key btree indexes) This would be a rough approximation of partial indexes if used together with functions, i.e. the optimiser would immediately realize that WHERE my_ifunc(partfield) = 'header' can use index on my_ifunc(partfield) but my_ifunc() has an easy way of skipping indexing overhaed for non-interesting fields by returning NULL for them. The following seems to prove thet there is currently no use of putting NULLS in a single-field index: -------------------------- hannu=# create table itest (i int, n int); CREATE hannu=# create index itest_n_idx on itest(n); CREATE then I inserted 16k tuples hannu=# insert into itest(i) select i+2 from itest; INSERT 0 2 hannu=# insert into itest(i) select i+4 from itest; INSERT 0 4 hannu=# insert into itest(i) select i+8 from itest; INSERT 0 1024 ... hannu=# insert into itest(i) select i+2048 from itest; INSERT 0 2048 hannu=# insert into itest(i) select i+4096 from itest; INSERT 0 4096 hannu=# insert into itest(i) select i+8192 from itest; UPDATE 16380 set most of n's to is but left 4 as NULLs hannu=# update itest set n=1 where i>1; UPDATE 16383 and vacuumed just in case hannu=# vacuum analyze itest; VACUUM now selects for real value do use index hannu=# explain select * from itest where n = 7; NOTICE: QUERY PLAN: Index Scan using itest_n_idx on itest (cost=0.00..2.01 rows=1 width=8) but IS NULL does not. hannu=# explain select * from itest where n is null; NOTICE: QUERY PLAN: Seq Scan on itest (cost=0.00..341.84 rows=16 width=8) EXPLAIN ------------------------ Hannu
Hannu Krosing <hannu@tm.ee> writes: > How hard would it be to _not_ include nulls in indexes > as they are not used anyway. Seems to me that would be a step backwards. What should someday happen is to make IS NULL an indexable operator. The fact that we haven't got around to doing so is not a reason to rip out the underpinnings for it. regards, tom lane
Tom Lane wrote: > Hannu Krosing <hannu@tm.ee> writes: > > How hard would it be to _not_ include nulls in indexes > > as they are not used anyway. > > Seems to me that would be a step backwards. It would cause multi-key indexes beeing unusable for partial key lookup. Imagine you have a key over (a, b, c) and query with WHERE a = 1 AND b = 2. This query cannot use the index if a NULL value in c would cause the index entry to be suppressed. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck <janwieck@yahoo.com> writes: > Tom Lane wrote: >> Hannu Krosing <hannu@tm.ee> writes: > How hard would it be to _not_ include nulls in indexes > as they are not used anyway. >> >> Seems to me that would be a step backwards. > It would cause multi-key indexes beeing unusable for partial > key lookup. Imagine you have a key over (a, b, c) and query > with WHERE a = 1 AND b = 2. This query cannot use the index > if a NULL value in c would cause the index entry to be > suppressed. Urgh ... that means GiST indexing is actually broken, because GiST currently handles multicolumns but not nulls. AFAIR the planner will try to use partial qualification on any multicolumn index... it had better avoid doing so for non-null-capable AMs. Alternatively, we could fix GiST to support nulls. Oleg, Teodor: how far away might that be? regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Wednesday, May 29, 2002 9:07 AM > To: Jan Wieck > Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Null values in indexes > > > Jan Wieck <janwieck@yahoo.com> writes: > > Tom Lane wrote: > >> Hannu Krosing <hannu@tm.ee> writes: > > How hard would it be to _not_ include nulls in indexes > > as they are not used anyway. > >> > >> Seems to me that would be a step backwards. > > > It would cause multi-key indexes beeing unusable for partial > > key lookup. Imagine you have a key over (a, b, c) and query > > with WHERE a = 1 AND b = 2. This query cannot use the index > > if a NULL value in c would cause the index entry to be > > suppressed. > > Urgh ... that means GiST indexing is actually broken, because GiST > currently handles multicolumns but not nulls. AFAIR the planner > will try to use partial qualification on any multicolumn index... > it had better avoid doing so for non-null-capable AMs. > > Alternatively, we could fix GiST to support nulls. Oleg, Teodor: > how far away might that be? The PostGIS people have already fixed it. However, they may not be willing to contribute the patch. On the other hand, I think it would be in their interest, since the source code trees will fork if they don't and they will have trouble staying in synch with PostgreSQL developments. (See the 7.2 index project here: http://postgis.refractions.net/ http://postgis.refractions.net/news/index.php?file=20020425.data ) If they are not willing to commit a patch, I suspect that they will at least tell you what they had to do to fix it and it could be performed internally.
Just to clarify: - With respect to null safety. My understanding is the Oleg and Teodor put support for nulls into the GiST indexing prior to the 7.2 release, so 7.2 GiST should already be null-safe. Our project was just to take our GiST bindings in PostGIS and update them to the new 7.2 GiST API, we did no work on null-safety, null-safety was just one of the side benefits we received as a result of updating our code to the 7.2 GiST indexes. - With respect to code contribution. If we find ourselves making changes to the mainline PgSQL distribution we will always submit back. End of story. All our changes have been to PostGIS itself, with the aim of supporting 7.2. 7.2 rocks, we love it. :) - There is one outstanding bug which we identified and Oleg and Teodor fixed, but it is to the code in contrib/rtree, not in the mainline, and Oleg and Teodor have already submitted that patch to Bruce. I believe there was some unresolved discussion regarding whether to cut a 7.2.2 release including that patch and a few other housekeeping items. Paul Dann Corbit wrote: > > > -----Original Message----- > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > Sent: Wednesday, May 29, 2002 9:07 AM > > To: Jan Wieck > > Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] Null values in indexes > > > > > > Jan Wieck <janwieck@yahoo.com> writes: > > > Tom Lane wrote: > > >> Hannu Krosing <hannu@tm.ee> writes: > > > How hard would it be to _not_ include nulls in indexes > > > as they are not used anyway. > > >> > > >> Seems to me that would be a step backwards. > > > > > It would cause multi-key indexes beeing unusable for partial > > > key lookup. Imagine you have a key over (a, b, c) and query > > > with WHERE a = 1 AND b = 2. This query cannot use the index > > > if a NULL value in c would cause the index entry to be > > > suppressed. > > > > Urgh ... that means GiST indexing is actually broken, because GiST > > currently handles multicolumns but not nulls. AFAIR the planner > > will try to use partial qualification on any multicolumn index... > > it had better avoid doing so for non-null-capable AMs. > > > > Alternatively, we could fix GiST to support nulls. Oleg, Teodor: > > how far away might that be? > > The PostGIS people have already fixed it. However, they may not be > willing to contribute the patch. On the other hand, I think it would be > in their interest, since the source code trees will fork if they don't > and they will have trouble staying in synch with PostgreSQL > developments. (See the 7.2 index project here: > http://postgis.refractions.net/ > http://postgis.refractions.net/news/index.php?file=20020425.data > ) > > If they are not willing to commit a patch, I suspect that they will at > least tell you what they had to do to fix it and it could be performed > internally. -- __ / | Paul Ramsey | Refractions Research | Email: pramsey@refractions.net | Phone: (250) 885-0632 \_
>> Urgh ... that means GiST indexing is actually broken, because GiST >> currently handles multicolumns but not nulls. Actually, it appears that 7.2 GiST does handle NULLs in columns after the first one, which I think is enough to avoid the problem Jan mentioned. The boolean column pg_am.amindexnulls is not really sufficient to describe this behavior accurately. Looking at current uses it seems correct to leave it set FALSE for GiST. In short: false alarm; the 7.2 code is okay as-is, at least on this particular point. regards, tom lane
Glad to hear GiST in 7.2 isn't broken :-) We miss the topic, what was the problem ? Do we need to fix GiST code for 7.3 ? proposal for null-safe GiST interface is available http://fts.postgresql.org/db/mw/msg.html?mid=1028327 and discussion http://fts.postgresql.org/db/mw/msg.html?mid=1025848 Regards, Oleg On Wed, 29 May 2002, Tom Lane wrote: > >> Urgh ... that means GiST indexing is actually broken, because GiST > >> currently handles multicolumns but not nulls. > > Actually, it appears that 7.2 GiST does handle NULLs in columns after > the first one, which I think is enough to avoid the problem Jan > mentioned. The boolean column pg_am.amindexnulls is not really > sufficient to describe this behavior accurately. Looking at current > uses it seems correct to leave it set FALSE for GiST. > > In short: false alarm; the 7.2 code is okay as-is, at least on this > particular point. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > Do we need to fix GiST code for 7.3 ? No, I think it's fine. I had forgotten that old discussion ... regards, tom lane
Hmm... I think there is some confusion here. Oleg and Teodor updated the GiST indexing to be null safe for postgresql 7.2. The changes we made to PostGIS were just to allow our spacial indexing support functions to work with the changes made in the actual GiST indexing code (the GiST interface changed somewhat from postgresql 7.1 -> 7.2). And for the record, I'm confident that we would submit a patch for postgresql if something like this did come up. Chris Hodgson Dann Corbit <DCorbit@connx.com> said: > > -----Original Message----- > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > Sent: Wednesday, May 29, 2002 9:07 AM > > To: Jan Wieck > > Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] Null values in indexes > > > > > > Jan Wieck <janwieck@yahoo.com> writes: > > > Tom Lane wrote: > > >> Hannu Krosing <hannu@tm.ee> writes: > > > How hard would it be to _not_ include nulls in indexes > > > as they are not used anyway. > > >> > > >> Seems to me that would be a step backwards. > > > > > It would cause multi-key indexes beeing unusable for partial > > > key lookup. Imagine you have a key over (a, b, c) and query > > > with WHERE a = 1 AND b = 2. This query cannot use the index > > > if a NULL value in c would cause the index entry to be > > > suppressed. > > > > Urgh ... that means GiST indexing is actually broken, because GiST > > currently handles multicolumns but not nulls. AFAIR the planner > > will try to use partial qualification on any multicolumn index... > > it had better avoid doing so for non-null-capable AMs. > > > > Alternatively, we could fix GiST to support nulls. Oleg, Teodor: > > how far away might that be? > > The PostGIS people have already fixed it. However, they may not be > willing to contribute the patch. On the other hand, I think it would be > in their interest, since the source code trees will fork if they don't > and they will have trouble staying in synch with PostgreSQL > developments. (See the 7.2 index project here: > http://postgis.refractions.net/ > http://postgis.refractions.net/news/index.php?file=20020425.data > ) > > If they are not willing to commit a patch, I suspect that they will at > least tell you what they had to do to fix it and it could be performed > internally. > > --