Thread: Why creating GIN table index is so slow than inserting data into empty table with the same index?
Why creating GIN table index is so slow than inserting data into empty table with the same index?
From
Sergey Burladyan
Date:
example: select version(); version -------------------------------------------------------------------------------------------- PostgreSQL 8.3.6 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.3-3) 4.3.3 show maintenance_work_mem ; maintenance_work_mem ---------------------- 128MB create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int); insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n; INSERT 0 100000 Время: 570,110 мс create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) ); CREATE INDEX Время: 203068,314 мс truncate a; drop index arr_gin ; create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) ); CREATE INDEX Время: 3,246 мс insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n; INSERT 0 100000 Время: 2405,481 мс select pg_size_pretty(pg_total_relation_size('a')) as total, pg_size_pretty(pg_relation_size('a')) as table; total | table ---------+--------- 9792 kB | 5096 kB 203068.314 ms VS 2405.481 ms, is this behaviour normal ? Thanks ! -- Sergey Burladyan
Re: Why creating GIN table index is so slow than inserting data into empty table with the same index?
From
Tom Lane
Date:
Sergey Burladyan <eshkinkot@gmail.com> writes: > show maintenance_work_mem ; > maintenance_work_mem > ---------------------- > 128MB > create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int); > insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n; > create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) ); [ takes forever ] Seems the reason this is so awful is that the incoming data is exactly presorted, meaning that the tree structure that ginInsertEntry() is trying to build degenerates to a linear list (ie, every new element becomes the right child of the prior one). So the processing becomes O(N^2) up till you reach maintenance_work_mem and flush the tree. With a large setting for maintenance_work_mem it gets spectacularly slow. I think a reasonable solution for this might be to keep an eye on maxdepth and force a flush if that gets too large (more than a few hundred, perhaps?). Something like this: /* If we've maxed out our available memory, dump everything to the index */ + /* Also dump if the tree seems to be getting too unbalanced */ - if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L) + if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L || + buildstate->accum.maxdepth > DEPTH_LIMIT) { The new fast-insert code likely will need a similar defense. Comments? regards, tom lane
Re: Why creating GIN table index is so slow than inserting data into empty table with the same index?
From
Heikki Linnakangas
Date:
Tom Lane wrote: > Sergey Burladyan <eshkinkot@gmail.com> writes: >> show maintenance_work_mem ; >> maintenance_work_mem >> ---------------------- >> 128MB > >> create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int); >> insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n; >> create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) ); > > [ takes forever ] > > Seems the reason this is so awful is that the incoming data is exactly > presorted, meaning that the tree structure that ginInsertEntry() is > trying to build degenerates to a linear list (ie, every new element > becomes the right child of the prior one). So the processing becomes > O(N^2) up till you reach maintenance_work_mem and flush the tree. With > a large setting for maintenance_work_mem it gets spectacularly slow. Yes, this is probably the same issue I bumped into a while ago: http://archives.postgresql.org/message-id/49350A13.3020105@enterprisedb.com > I think a reasonable solution for this might be to keep an eye on > maxdepth and force a flush if that gets too large (more than a few > hundred, perhaps?). Something like this: > > /* If we've maxed out our available memory, dump everything to the index */ > + /* Also dump if the tree seems to be getting too unbalanced */ > - if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L) > + if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L || > + buildstate->accum.maxdepth > DEPTH_LIMIT) > { > > The new fast-insert code likely will need a similar defense. I fooled around with a balanced tree, which solved the problem but unfortunately made the unsorted case slower. Limiting the depth like that should avoid that so it's worth trying. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Why creating GIN table index is so slow than inserting data into empty table with the same index?
From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Tom Lane wrote: >> I think a reasonable solution for this might be to keep an eye on >> maxdepth and force a flush if that gets too large (more than a few >> hundred, perhaps?). Something like this: > I fooled around with a balanced tree, which solved the problem but > unfortunately made the unsorted case slower. Yeah, rebalancing the search tree would fix that, but every balanced tree algorithm I know about is complicated, slow, and needs extra memory. It's really unclear that it'd be worth the trouble for a transient data structure like this one. regards, tom lane