Thread: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
"Martin Atukunda"
Date:
The following bug has been logged online: Bug reference: 5784 Logged by: Martin Atukunda Email address: matlads@gmail.com PostgreSQL version: 8.4 Operating system: Linux Description: CREATE INDEX USING GIN complains about array containing null values yet none exist Details: CREATE INDEX USING GIN complains about array containing null values yet none exist. Take the following SQL as an example. -- >8 -- BEGIN; CREATE TABLE t (id serial primary key, apps bigint[]); INSERT INTO t (apps) VALUES ('{1,2,3,4}'); INSERT INTO t (apps) VALUES ('{1,2,3,4}'); -- insert an array with one element as a null value INSERT INTO t (apps) VALUES ('{1,2,3,NULL}'); -- remove the null value UPDATE t SET apps[4] = -1 WHERE apps[4] IS NULL; -- the create index fails CREATE INDEX t_apps_idx ON t USING GIN(apps); COMMIT; -- 8< -- note that index creatiion succeeds if I first create the index without inserting a row containing an array with a null element. - Martin -
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Andres Freund
Date:
Hi, On Sunday 05 December 2010 07:50:51 Martin Atukunda wrote: > > CREATE INDEX USING GIN complains about array containing null values yet > none exist. > > Take the following SQL as an example. > > -- >8 -- > > BEGIN; > CREATE TABLE t (id serial primary key, apps bigint[]); > INSERT INTO t (apps) VALUES ('{1,2,3,4}'); > INSERT INTO t (apps) VALUES ('{1,2,3,4}'); > > -- insert an array with one element as a null value > INSERT INTO t (apps) VALUES ('{1,2,3,NULL}'); > > -- remove the null value > UPDATE t SET apps[4] = -1 WHERE apps[4] IS NULL; > > -- the create index fails > CREATE INDEX t_apps_idx ON t USING GIN(apps); > > COMMIT; > > -- 8< -- > > > note that index creatiion succeeds if I first create the index without > inserting a row containing an array with a null element. Due to the wonders of MVCC the old row is still available in the heap. Best read the docs about what MVCC means. And as pg's indexes don't care about visibility it will still try to index the "old" row. Andres
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Martin Atukunda
Date:
> Due to the wonders of MVCC the old row is still available in the heap. Best > read the docs about what MVCC means. And as pg's indexes don't care about > visibility it will still try to index the "old" row. > > Thanks andreas, so, basically, the only way out of this would be to: 1. copy out all the rows 2. truncate the Tables 3. then create the index 4. copy in the rows - Martin -
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Andres Freund
Date:
On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote: > > Due to the wonders of MVCC the old row is still available in the heap. > > Best read the docs about what MVCC means. And as pg's indexes don't care > > about visibility it will still try to index the "old" row. > > Thanks andreas, > > so, basically, the only way out of this would be to: > > 1. copy out all the rows > 2. truncate the Tables > 3. then create the index > 4. copy in the rows Something like: ALTER TABLE t ALTER apps TYPE text[]; ALTER TABLE t ALTER apps TYPE bigint[] USING apps::bigint[]; could work. Andres
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Andres Freund
Date:
On Sunday 05 December 2010 13:29:35 Andres Freund wrote: > On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote: > > > Due to the wonders of MVCC the old row is still available in the heap. > > > Best read the docs about what MVCC means. And as pg's indexes don't > > > care about visibility it will still try to index the "old" row. > > > > Thanks andreas, > > > > so, basically, the only way out of this would be to: > > > > 1. copy out all the rows > > 2. truncate the Tables > > 3. then create the index > > 4. copy in the rows > > Something like: > > ALTER TABLE t ALTER apps TYPE text[]; > ALTER TABLE t ALTER apps TYPE bigint[] USING apps::bigint[]; On further thought the second one ought to be enough. Andres
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: >> On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote: >>>> so, basically, the only way out of this would be to: I think the reason the given example fails is just that it's all being done in one transaction. If the null-containing row were known dead it wouldn't get indexed. So: commit. regards, tom lane
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Andres Freund
Date:
On Sunday 05 December 2010 17:42:59 Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > >> On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote: > >>>> so, basically, the only way out of this would be to: > I think the reason the given example fails is just that it's all being > done in one transaction. If the null-containing row were known dead > it wouldn't get indexed. So: commit. Um I doubt it. test=# \i /tmp/test.sql DROP TABLE IF EXISTS t; DROP TABLE Time: 36.070 ms BEGIN; BEGIN Time: 0.122 ms CREATE TABLE t (id serial primary key, apps bigint[]); psql:/tmp/test.sql:3: NOTICE: 00000: CREATE TABLE will create implicit sequence "t_id_seq" for serial column "t.id" LOCATION: transformColumnDefinition, parse_utilcmd.c:341 psql:/tmp/test.sql:3: NOTICE: 00000: CREATE TABLE / PRIMARY KEY will create implicit index "t_pkey" for table "t" LOCATION: DefineIndex, indexcmds.c:438 CREATE TABLE Time: 102.843 ms INSERT INTO t (apps) VALUES ('{1,2,3,4}'); INSERT 0 1 Time: 0.408 ms INSERT INTO t (apps) VALUES ('{1,2,3,4}'); INSERT 0 1 Time: 0.111 ms INSERT INTO t (apps) VALUES ('{1,2,3,NULL}'); INSERT 0 1 Time: 0.127 ms UPDATE t SET apps[4] = -1 WHERE apps[4] IS NULL; UPDATE 1 Time: 0.271 ms COMMIT; COMMIT Time: 0.086 ms BEGIN; BEGIN Time: 0.192 ms CREATE INDEX t_apps_idx ON t USING GIN(apps); psql:/tmp/test.sql:18: ERROR: 22004: array must not contain null values LOCATION: ginarrayextract, ginarrayproc.c:53 COMMIT; ROLLBACK Time: 0.076 ms I remembered seeing that before.... Andres
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > On Sunday 05 December 2010 17:42:59 Tom Lane wrote: >> I think the reason the given example fails is just that it's all being >> done in one transaction. If the null-containing row were known dead >> it wouldn't get indexed. So: commit. > Um I doubt it. [ gets out gdb... ] Oh: the reason GIN is complaining is that it's just looking at ARR_HASNULL(), and the array's has-nulls flag is still set because we don't bother to try to clear it after replacing one element of the array. (Which in general would be an expensive thing to try to do...) If we were intending to leave GIN in its current nulls-hating state, the thing to do would be to replace the stupid ARR_HASNULL check with something more intelligent. But really it needs to be fixed to handle nulls properly, so I'm thinking that might be a dead-end patch. regards, tom lane
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Martin Atukunda
Date:
On Sun, Dec 5, 2010 at 7:58 PM, Andres Freund <andres@anarazel.de> wrote: > On Sunday 05 December 2010 13:29:35 Andres Freund wrote: > > On Sunday 05 December 2010 13:07:23 Martin Atukunda wrote: > > > > Due to the wonders of MVCC the old row is still available in the > heap. > > > > Best read the docs about what MVCC means. And as pg's indexes don't > > > > care about visibility it will still try to index the "old" row. > > > > > > Thanks andreas, > > > > > > so, basically, the only way out of this would be to: > > > > > > 1. copy out all the rows > > > 2. truncate the Tables > > > 3. then create the index > > > 4. copy in the rows > > > > Something like: > > > > ALTER TABLE t ALTER apps TYPE text[]; > > ALTER TABLE t ALTER apps TYPE bigint[] USING apps::bigint[]; > On further thought the second one ought to be enough. > Actually on my tests here both are required, though for the large tables - writing them twice makes the process very long. The copy out, truncate, create index, copy in approach seems to work best in this case. - Martin -
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Heikki Linnakangas
Date:
On 05.12.2010 18:26, Tom Lane wrote: > Andres Freund<andres@anarazel.de> writes: >> On Sunday 05 December 2010 17:42:59 Tom Lane wrote: >>> I think the reason the given example fails is just that it's all being >>> done in one transaction. If the null-containing row were known dead >>> it wouldn't get indexed. So: commit. > >> Um I doubt it. > > [ gets out gdb... ] Oh: the reason GIN is complaining is that it's just > looking at ARR_HASNULL(), and the array's has-nulls flag is still set > because we don't bother to try to clear it after replacing one element > of the array. (Which in general would be an expensive thing to try to > do...) > > If we were intending to leave GIN in its current nulls-hating state, > the thing to do would be to replace the stupid ARR_HASNULL check with > something more intelligent. But really it needs to be fixed to handle > nulls properly, so I'm thinking that might be a dead-end patch. Sounds like we'd still want to just replace ARR_HASNULL() with something more intelligent in back-branches though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: BUG #5784: CREATE INDEX USING GIN complains about array containing null values yet none exist
From
Bruce Momjian
Date:
Heikki Linnakangas wrote: > On 05.12.2010 18:26, Tom Lane wrote: > > Andres Freund<andres@anarazel.de> writes: > >> On Sunday 05 December 2010 17:42:59 Tom Lane wrote: > >>> I think the reason the given example fails is just that it's all being > >>> done in one transaction. If the null-containing row were known dead > >>> it wouldn't get indexed. So: commit. > > > >> Um I doubt it. > > > > [ gets out gdb... ] Oh: the reason GIN is complaining is that it's just > > looking at ARR_HASNULL(), and the array's has-nulls flag is still set > > because we don't bother to try to clear it after replacing one element > > of the array. (Which in general would be an expensive thing to try to > > do...) > > > > If we were intending to leave GIN in its current nulls-hating state, > > the thing to do would be to replace the stupid ARR_HASNULL check with > > something more intelligent. But really it needs to be fixed to handle > > nulls properly, so I'm thinking that might be a dead-end patch. > > Sounds like we'd still want to just replace ARR_HASNULL() with something > more intelligent in back-branches though. Added to TODO: Improve GIN's handling of NULL array values * http://archives.postgresql.org/pgsql-bugs/2010-12/msg00032.php -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +