BUG #3875: Building GIN index on full table could be slower than filling table with existing index. - Mailing list pgsql-bugs
From | Sokolov Yura |
---|---|
Subject | BUG #3875: Building GIN index on full table could be slower than filling table with existing index. |
Date | |
Msg-id | 200801151320.m0FDKW7g035228@wwwmaster.postgresql.org Whole thread Raw |
List | pgsql-bugs |
The following bug has been logged online: Bug reference: 3875 Logged by: Sokolov Yura Email address: funny.falcon@gmail.com PostgreSQL version: 8.3RC1 8.2.x Operating system: Debian 4.0 Description: Building GIN index on full table could be slower than filling table with existing index. Details: Here is pathological case: \timing drop table if exists test_gin; create table test_gin ( id serial primary key, ar int4[] ); create index test_gin_ix_ar on test_gin using gin ( ar ); -- And here is pathological pattern insert into test_gin (ar) select ARRAY[i, i-1, i+1, i-2, i+2] from generate_series(1, 20000) as i; drop index test_gin_ix_ar; create index test_gin_ix_ar on test_gin using gin ( ar ); Timing for this case: Timing is on. DROP TABLE Time: 19,421 ms CREATE TABLE Time: 13,531 ms CREATE INDEX Time: 3,352 ms INSERT 0 20000 ! Time: 1064,444 ms !! DROP INDEX Time: 30,925 ms CREATE INDEX ! Time: 28180,484 ms !!!! So that, creating index on full table is 26.5x slower than filling table with existing index. After applying a patch below timings are: ..... INSERT 0 20000 Time: 1036,746 ms ..... CREATE INDEX Time: 220,073 ms For other tested pattern patched timing usually slightly better (5-10%) and in rare case slightly worse(5-10%). Other tested patterns are: insert into test_gin (ar) select ARRAY[i, i-1, i+1, i-2, i+2] from generate_series(1, 100) as i, generate_series(1, 200) as j; insert into test_gin (ar) select ARRAY[i, i-1, i+1, i-2, i+2] from generate_series(1, 200) as j, generate_series(1, 100) as i; insert into test_gin (ar) select ARRAY[i, i-1, i+1, i-2, i+2] from generate_series(1, 200) as j, generate_series(1, 1000) as i; insert into test_gin (ar) select (select int_array_aggregate(k) from generate_series(i, i+200) as k) from generate_series(1, 200) as j, generate_series(1, 100) as i; Patch: diff -pr postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c postgresql-8.3RC1/src/backend/access/gin/ginbulk.c *** postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c 2008-01-01 22:45:46.000000000 +0300 --- postgresql-8.3RC1/src/backend/access/gin/ginbulk.c 2008-01-14 16:00:48.000000000 +0300 *************** ginInitBA(BuildAccumulator *accum) *** 28,33 **** --- 28,34 ---- accum->maxdepth = 1; accum->stackpos = 0; accum->entries = NULL; + accum->nentries = 0; accum->stack = NULL; accum->allocatedMemory = 0; accum->entryallocator = NULL; *************** EAAllocate(BuildAccumulator *accum) *** 44,49 **** --- 45,51 ---- } accum->length++; + accum->nentries++; return accum->entryallocator + accum->length - 1; } diff -pr postgresql-8.3RC1old/src/backend/access/gin/gininsert.c postgresql-8.3RC1/src/backend/access/gin/gininsert.c *** postgresql-8.3RC1old/src/backend/access/gin/gininsert.c 2008-01-01 22:45:46.000000000 +0300 --- postgresql-8.3RC1/src/backend/access/gin/gininsert.c 2008-01-14 16:07:21.000000000 +0300 *************** ginBuildCallback(Relation index, HeapTup *** 238,244 **** buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values, &htup->t_self); /* If we've maxed out our available memory, dump everything to the index */ ! if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L) { ItemPointerData *list; Datum entry; --- 238,246 ---- buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values, &htup->t_self); /* If we've maxed out our available memory, dump everything to the index */ ! if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L ! || (buildstate->accum.maxdepth > 16 && ! buildstate->accum.nentries < (1 << (buildstate->accum.maxdepth / 2)))) { ItemPointerData *list; Datum entry; diff -pr postgresql-8.3RC1old/src/include/access/gin.h postgresql-8.3RC1/src/include/access/gin.h *** postgresql-8.3RC1old/src/include/access/gin.h 2008-01-01 22:45:56.000000000 +0300 --- postgresql-8.3RC1/src/include/access/gin.h 2008-01-14 15:59:51.000000000 +0300 *************** typedef struct *** 450,455 **** --- 450,456 ---- { GinState *ginstate; EntryAccumulator *entries; + uint32 nentries; uint32 maxdepth; EntryAccumulator **stack; uint32 stackpos;
pgsql-bugs by date: