Thread: Insert only table and size of GIN index JSONB field.

Insert only table and size of GIN index JSONB field.

From
Maxim Boguk
Date:
Hi,

I started with empty table with index over
 custom_fields | jsonb
field
defined as:
    "idx_learners_custom_fields" gin (custom_fields)
Globally gin_pending_list_limit set to 2MB.
Database version is 9.5.2.

Now question:
If table populated with 1M records in single transaction then the final size of the GIN index is:
4265 MB
but after I performed reindex index idx_learners_custom_fields;
the index size had been reduced 15x to 295 MB.

Is this behavior expected?

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: Insert only table and size of GIN index JSONB field.

From
Maxim Boguk
Date:


On Tue, May 3, 2016 at 8:09 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
Hi,

I started with empty table with index over
 custom_fields | jsonb
field
defined as:
    "idx_learners_custom_fields" gin (custom_fields)
Globally gin_pending_list_limit set to 2MB.
Database version is 9.5.2.

Now question:
If table populated with 1M records in single transaction then the final size of the GIN index is:
4265 MB
but after I performed reindex index idx_learners_custom_fields;
the index size had been reduced 15x to 295 MB.

Is this behavior expected?

It's look easy to reproduce.
I can send a sample dataset for analyze.

drop table if exists test;
create table test (custom_fields jsonb);
create index test_gin_key on test USING GIN(custom_fields);
insert into test select custom_fields from public.learners;
INSERT 0 1000000
\di+ test_gin_key
                            List of relations
 Schema |     Name     | Type  |  Owner   | Table |  Size   | Description
--------+--------------+-------+----------+-------+---------+-------------
 public | test_gin_key | index | postgres | test  | 4211 MB |

reindex index test_gin_key;

\di+ test_gin_key
                            List of relations
 Schema |     Name     | Type  |  Owner   | Table |  Size  | Description
--------+--------------+-------+----------+-------+--------+-------------
 public | test_gin_key | index | postgres | test  | 295 MB |


 

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: Insert only table and size of GIN index JSONB field.

From
Oleg Bartunov
Date:


On Tue, May 3, 2016 at 1:35 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:


On Tue, May 3, 2016 at 8:09 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
Hi,

I started with empty table with index over
 custom_fields | jsonb
field
defined as:
    "idx_learners_custom_fields" gin (custom_fields)
Globally gin_pending_list_limit set to 2MB.
Database version is 9.5.2.

Now question:
If table populated with 1M records in single transaction then the final size of the GIN index is:
4265 MB
but after I performed reindex index idx_learners_custom_fields;
the index size had been reduced 15x to 295 MB.

Is this behavior expected?

It's look easy to reproduce.
I can send a sample dataset for analyze.

sample data are always welcome !

 

drop table if exists test;
create table test (custom_fields jsonb);
create index test_gin_key on test USING GIN(custom_fields);
insert into test select custom_fields from public.learners;
INSERT 0 1000000
\di+ test_gin_key
                            List of relations
 Schema |     Name     | Type  |  Owner   | Table |  Size   | Description
--------+--------------+-------+----------+-------+---------+-------------
 public | test_gin_key | index | postgres | test  | 4211 MB |

reindex index test_gin_key;

\di+ test_gin_key
                            List of relations
 Schema |     Name     | Type  |  Owner   | Table |  Size  | Description
--------+--------------+-------+----------+-------+--------+-------------
 public | test_gin_key | index | postgres | test  | 295 MB |


 

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."



Re: Insert only table and size of GIN index JSONB field.

From
Jeff Janes
Date:
On Tue, May 3, 2016 at 3:09 AM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
> Hi,
>
> I started with empty table with index over
>  custom_fields | jsonb
> field
> defined as:
>     "idx_learners_custom_fields" gin (custom_fields)
> Globally gin_pending_list_limit set to 2MB.
> Database version is 9.5.2.
>
> Now question:
> If table populated with 1M records in single transaction then the final size
> of the GIN index is:
> 4265 MB
> but after I performed reindex index idx_learners_custom_fields;
> the index size had been reduced 15x to 295 MB.
>
> Is this behavior expected?

This sounds like a known issue, fixed in 9.6, where the pages used for
the pending list do not eligible for recycling until the table is
vacuumed.  Autovacuum does not run on insert only tables, so they just
accumulate as empty pages in the index.

Cheers,

Jeff


Re: Insert only table and size of GIN index JSONB field.

From
Maxim Boguk
Date:


On Wed, May 4, 2016 at 3:45 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, May 3, 2016 at 3:09 AM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
> Hi,
>
> I started with empty table with index over
>  custom_fields | jsonb
> field
> defined as:
>     "idx_learners_custom_fields" gin (custom_fields)
> Globally gin_pending_list_limit set to 2MB.
> Database version is 9.5.2.
>
> Now question:
> If table populated with 1M records in single transaction then the final size
> of the GIN index is:
> 4265 MB
> but after I performed reindex index idx_learners_custom_fields;
> the index size had been reduced 15x to 295 MB.
>
> Is this behavior expected?

This sounds like a known issue, fixed in 9.6, where the pages used for
the pending list do not eligible for recycling until the table is
vacuumed.  Autovacuum does not run on insert only tables, so they just
accumulate as empty pages in the index.

Cheers,

Jeff


​Hi Jeff,

Yes it's look like a cause.
Is there any workaround possible for 9.5.2?
At this moment I see palliative (because it doesn't reduce already bloated index) cure via performing manual vacuum on the table after each batch insert
or very slow (for the large table) full scale fix via create new index/drop old index.


--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: Insert only table and size of GIN index JSONB field.

From
Jeff Janes
Date:
On Wed, May 4, 2016 at 1:39 AM, Maxim Boguk <maxim.boguk@gmail.com> wrote:

>> This sounds like a known issue, fixed in 9.6, where the pages used for
>> the pending list do not eligible for recycling until the table is
>> vacuumed.  Autovacuum does not run on insert only tables, so they just
>> accumulate as empty pages in the index.
>>
>> Cheers,
>>
>> Jeff
>
>
>
> Hi Jeff,
>
> Yes it's look like a cause.
> Is there any workaround possible for 9.5.2?
> At this moment I see palliative (because it doesn't reduce already bloated
> index) cure via performing manual vacuum on the table after each batch
> insert
> or very slow (for the large table) full scale fix via create new index/drop
> old index.

A full workaround would be turn off "fastupdate" on the index.

In my workloads, doing that doesn't even slow inserts down by a
meaningful amount, so now I just habitually turn it off.

If you can't do that, you could run manual vacuum repeatedly during
the bulk load so the pages can be recycled within the same batch,
although that might slow things down more than just turning fastupdate
off does.

Or, you could just live with the bloat.  It is 15x when starting from
an empty table.  But if you are doing repeated batch inserts which
don't each start with an empty table, it won't be nearly as bad (as
long as you vacuum in between) on a ratio basis.  (And if you do
always start out with an empty table, you should build the index at
the end, not the beginning, of the inserts)

Cheers,

Jeff