BUG #3875: Building GIN index on full table could be slower than filling table with existing index. - Mailing list pgsql-bugs

From Sokolov Yura
Subject BUG #3875: Building GIN index on full table could be slower than filling table with existing index.
Date
Msg-id 200801151320.m0FDKW7g035228@wwwmaster.postgresql.org
Whole thread Raw
List pgsql-bugs
The following bug has been logged online:

Bug reference:      3875
Logged by:          Sokolov Yura
Email address:      funny.falcon@gmail.com
PostgreSQL version: 8.3RC1 8.2.x
Operating system:   Debian 4.0
Description:        Building GIN index on full table could be slower than
filling table with existing index.
Details:

Here is pathological case:

    \timing
    drop table if exists test_gin;

    create table test_gin (
      id serial primary key,
      ar int4[]
    );

    create index test_gin_ix_ar on test_gin
    using gin
    ( ar );
    -- And here is pathological pattern
    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 20000) as i;

    drop index test_gin_ix_ar;

    create index test_gin_ix_ar on test_gin
    using gin
    ( ar );

Timing for this case:

    Timing is on.
    DROP TABLE
    Time: 19,421 ms
    CREATE TABLE
    Time: 13,531 ms
    CREATE INDEX
    Time: 3,352 ms
    INSERT 0 20000     !
    Time: 1064,444 ms  !!
    DROP INDEX
    Time: 30,925 ms
    CREATE INDEX       !
    Time: 28180,484 ms !!!!

So that, creating index on full table is 26.5x slower than filling table
with existing index.

After applying a patch below timings are:

    .....
    INSERT 0 20000
    Time: 1036,746 ms
    .....
    CREATE INDEX
    Time: 220,073 ms

For other tested pattern patched timing usually
slightly better (5-10%) and in rare case slightly
worse(5-10%). Other tested patterns are:

    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 100) as i,
         generate_series(1, 200) as j;

    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 200) as j,
         generate_series(1, 100) as i;

    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 200) as j,
         generate_series(1, 1000) as i;

    insert into test_gin (ar)
    select (select int_array_aggregate(k) from generate_series(i, i+200) as
k)
    from generate_series(1, 200) as j,
         generate_series(1, 100) as i;

Patch:

diff -pr postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c
postgresql-8.3RC1/src/backend/access/gin/ginbulk.c
*** postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c    2008-01-01
22:45:46.000000000 +0300
--- postgresql-8.3RC1/src/backend/access/gin/ginbulk.c    2008-01-14
16:00:48.000000000 +0300
*************** ginInitBA(BuildAccumulator *accum)
*** 28,33 ****
--- 28,34 ----
      accum->maxdepth = 1;
      accum->stackpos = 0;
      accum->entries = NULL;
+     accum->nentries = 0;
      accum->stack = NULL;
      accum->allocatedMemory = 0;
      accum->entryallocator = NULL;
*************** EAAllocate(BuildAccumulator *accum)
*** 44,49 ****
--- 45,51 ----
      }

      accum->length++;
+     accum->nentries++;
      return accum->entryallocator + accum->length - 1;
  }

diff -pr postgresql-8.3RC1old/src/backend/access/gin/gininsert.c
postgresql-8.3RC1/src/backend/access/gin/gininsert.c
*** postgresql-8.3RC1old/src/backend/access/gin/gininsert.c    2008-01-01
22:45:46.000000000 +0300
--- postgresql-8.3RC1/src/backend/access/gin/gininsert.c    2008-01-14
16:07:21.000000000 +0300
*************** ginBuildCallback(Relation index, HeapTup
*** 238,244 ****
      buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values,
&htup->t_self);

      /* If we've maxed out our available memory, dump everything to the index
*/
!     if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
      {
          ItemPointerData *list;
          Datum        entry;
--- 238,246 ----
      buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values,
&htup->t_self);

      /* If we've maxed out our available memory, dump everything to the index
*/
!     if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L
!         || (buildstate->accum.maxdepth > 16 &&
!             buildstate->accum.nentries < (1 << (buildstate->accum.maxdepth /
2))))
      {
          ItemPointerData *list;
          Datum        entry;
diff -pr postgresql-8.3RC1old/src/include/access/gin.h
postgresql-8.3RC1/src/include/access/gin.h
*** postgresql-8.3RC1old/src/include/access/gin.h    2008-01-01
22:45:56.000000000 +0300
--- postgresql-8.3RC1/src/include/access/gin.h    2008-01-14 15:59:51.000000000
+0300
*************** typedef struct
*** 450,455 ****
--- 450,456 ----
  {
      GinState   *ginstate;
      EntryAccumulator *entries;
+     uint32        nentries;
      uint32        maxdepth;
      EntryAccumulator **stack;
      uint32        stackpos;

pgsql-bugs by date:

Previous
From: Sokolov Yura
Date:
Subject: Re: BUG #3860: xpath crashes backend when is querying xmlagg result
Next
From: "Ruben Camargo Gomez"
Date:
Subject: BUG #3876: Problems migrating databases