Thread: Insert only table and size of GIN index JSONB field.
Hi,
I started with empty table with index over
custom_fields | jsonb
custom_fields | jsonb
field
defined as:
"idx_learners_custom_fields" gin (custom_fields)
"idx_learners_custom_fields" gin (custom_fields)
Globally gin_pending_list_limit set to 2MB.
Database version is 9.5.2.
Now question:
If table populated with 1M records in single transaction then the final size of the GIN index is:
4265 MB
If table populated with 1M records in single transaction then the final size of the GIN index is:
4265 MB
but after I performed reindex index idx_learners_custom_fields;
the index size had been reduced 15x to 295 MB.
Is this behavior expected?
--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
Senior Postgresql DBA
http://www.postgresql-consulting.com/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
On Tue, May 3, 2016 at 8:09 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
Hi,I started with empty table with index over
custom_fields | jsonbfielddefined as:
"idx_learners_custom_fields" gin (custom_fields)Globally gin_pending_list_limit set to 2MB.Database version is 9.5.2.Now question:
If table populated with 1M records in single transaction then the final size of the GIN index is:
4265 MBbut after I performed reindex index idx_learners_custom_fields;the index size had been reduced 15x to 295 MB.Is this behavior expected?
It's look easy to reproduce.
I can send a sample dataset for analyze.
I can send a sample dataset for analyze.
drop table if exists test;
create table test (custom_fields jsonb);
create index test_gin_key on test USING GIN(custom_fields);
insert into test select custom_fields from public.learners;
INSERT 0 1000000
\di+ test_gin_key
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+----------+-------+---------+-------------
public | test_gin_key | index | postgres | test | 4211 MB |
reindex index test_gin_key;
\di+ test_gin_key
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+----------+-------+--------+-------------
public | test_gin_key | index | postgres | test | 295 MB |
create table test (custom_fields jsonb);
create index test_gin_key on test USING GIN(custom_fields);
insert into test select custom_fields from public.learners;
INSERT 0 1000000
\di+ test_gin_key
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+----------+-------+---------+-------------
public | test_gin_key | index | postgres | test | 4211 MB |
reindex index test_gin_key;
\di+ test_gin_key
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+----------+-------+--------+-------------
public | test_gin_key | index | postgres | test | 295 MB |
--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
Senior Postgresql DBA
http://www.postgresql-consulting.com/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
On Tue, May 3, 2016 at 1:35 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
On Tue, May 3, 2016 at 8:09 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:Hi,I started with empty table with index over
custom_fields | jsonbfielddefined as:
"idx_learners_custom_fields" gin (custom_fields)Globally gin_pending_list_limit set to 2MB.Database version is 9.5.2.Now question:
If table populated with 1M records in single transaction then the final size of the GIN index is:
4265 MBbut after I performed reindex index idx_learners_custom_fields;the index size had been reduced 15x to 295 MB.Is this behavior expected?It's look easy to reproduce.
I can send a sample dataset for analyze.
sample data are always welcome !
drop table if exists test;
create table test (custom_fields jsonb);
create index test_gin_key on test USING GIN(custom_fields);
insert into test select custom_fields from public.learners;
INSERT 0 1000000
\di+ test_gin_key
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+----------+-------+---------+-------------
public | test_gin_key | index | postgres | test | 4211 MB |
reindex index test_gin_key;
\di+ test_gin_key
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+----------+-------+--------+-------------
public | test_gin_key | index | postgres | test | 295 MB |
--Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
On Tue, May 3, 2016 at 3:09 AM, Maxim Boguk <maxim.boguk@gmail.com> wrote: > Hi, > > I started with empty table with index over > custom_fields | jsonb > field > defined as: > "idx_learners_custom_fields" gin (custom_fields) > Globally gin_pending_list_limit set to 2MB. > Database version is 9.5.2. > > Now question: > If table populated with 1M records in single transaction then the final size > of the GIN index is: > 4265 MB > but after I performed reindex index idx_learners_custom_fields; > the index size had been reduced 15x to 295 MB. > > Is this behavior expected? This sounds like a known issue, fixed in 9.6, where the pages used for the pending list do not eligible for recycling until the table is vacuumed. Autovacuum does not run on insert only tables, so they just accumulate as empty pages in the index. Cheers, Jeff
On Wed, May 4, 2016 at 3:45 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, May 3, 2016 at 3:09 AM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
> Hi,
>
> I started with empty table with index over
> custom_fields | jsonb
> field
> defined as:
> "idx_learners_custom_fields" gin (custom_fields)
> Globally gin_pending_list_limit set to 2MB.
> Database version is 9.5.2.
>
> Now question:
> If table populated with 1M records in single transaction then the final size
> of the GIN index is:
> 4265 MB
> but after I performed reindex index idx_learners_custom_fields;
> the index size had been reduced 15x to 295 MB.
>
> Is this behavior expected?
This sounds like a known issue, fixed in 9.6, where the pages used for
the pending list do not eligible for recycling until the table is
vacuumed. Autovacuum does not run on insert only tables, so they just
accumulate as empty pages in the index.
Cheers,
Jeff
Hi Jeff,
Yes it's look like a cause.
Is there any workaround possible for 9.5.2?
At this moment I see palliative (because it doesn't reduce already bloated index) cure via performing manual vacuum on the table after each batch insert
or very slow (for the large table) full scale fix via create new index/drop old index.
--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
Senior Postgresql DBA
http://www.postgresql-consulting.com/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
On Wed, May 4, 2016 at 1:39 AM, Maxim Boguk <maxim.boguk@gmail.com> wrote: >> This sounds like a known issue, fixed in 9.6, where the pages used for >> the pending list do not eligible for recycling until the table is >> vacuumed. Autovacuum does not run on insert only tables, so they just >> accumulate as empty pages in the index. >> >> Cheers, >> >> Jeff > > > > Hi Jeff, > > Yes it's look like a cause. > Is there any workaround possible for 9.5.2? > At this moment I see palliative (because it doesn't reduce already bloated > index) cure via performing manual vacuum on the table after each batch > insert > or very slow (for the large table) full scale fix via create new index/drop > old index. A full workaround would be turn off "fastupdate" on the index. In my workloads, doing that doesn't even slow inserts down by a meaningful amount, so now I just habitually turn it off. If you can't do that, you could run manual vacuum repeatedly during the bulk load so the pages can be recycled within the same batch, although that might slow things down more than just turning fastupdate off does. Or, you could just live with the bloat. It is 15x when starting from an empty table. But if you are doing repeated batch inserts which don't each start with an empty table, it won't be nearly as bad (as long as you vacuum in between) on a ratio basis. (And if you do always start out with an empty table, you should build the index at the end, not the beginning, of the inserts) Cheers, Jeff