Re: Fillfactor for GIN indexes - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: Fillfactor for GIN indexes |
Date | |
Msg-id | CAPpHfdvb_MsCa8Qtcj_+APDiDBE-von3AN3LeV9yrj_v9VCZYQ@mail.gmail.com Whole thread Raw |
In response to | Re: Fillfactor for GIN indexes (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Fillfactor for GIN indexes
Re: Fillfactor for GIN indexes |
List | pgsql-hackers |
On Fri, Jan 16, 2015 at 8:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
------
With best regards,
Alexander Korotkov.
I'm struggling to understand why we shouldn't just reject this patch.On Thu, Jan 15, 2015 at 7:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Alexander Korotkov wrote:
>> I'm not sure. On the one hand it's unclear why fillfactor should be
>> different from 9.4.
>> On the other hand it's unclear why it should be different from btree.
>> I propose marking this "ready for committer". So, committer can make a final
>> decision.
> OK let's do so then. My preference is to fully pack the index at
> build. GIN compression has been one of the headlines of 9.4.
On November 27th, Cedric said:
"what are the benefits of this patch ? (maybe you had some test case
or a benchmark ?)"
Nobody replied. On January 15th, you (Michael) hypothesized that
"this patch has value to control random updates on GIN indexes" but
there seem to be absolutely no test results showing that any such
value exists.
There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?
I already wrote quite detailed explanation of subject. Let mel try to explain in shortly. GIN is two level nested btree. Thus, GIN would have absolutely same benefits from fillfactor as btree. Lack of tests showing it is, for sure, fault.
However, GIN posting trees are ordered by ItemPointer and this makes some specific. If you have freshly created table and do inserts/updates they would use the end of heap. Thus, inserts would go to the end of GIN posting tree and fillfactor wouldn't affect anything. Fillfactor would give benefits on HOT or heap space re-usage.
In the following example you can see that index size was increased greatly while updating every 20th row. It's because every update causes page split in index.
# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100, fastupdate=off);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ 3184 kB │
# update test set v = array[1,2] where id%20 = 0;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ 5264 kB │
(1 row)
But if we create index with fillfactor=90, index size would remain the same: no page splits.
# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);
# create index test_idx90 on test using gin(v) with (fillfactor=90, fastupdate=off);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx90 │ index │ smagen │ test │ 3520 kB │
# update test set v = array[1,2] where id%20 = 0;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx90 │ index │ smagen │ test │ 3520 kB │
(1 row)
Similar situation would be if we use fastupdate. But fastupdate takes some space for pending lists which is independent from fillfactor.
# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100, fastupdate=on);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ 3184 kB │
# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ 7256 kB │
# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=90, fastupdate=on);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ 3520 kB │
# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ 5512 kB │
BTW, previous version of patch contained some bugs. Revised version is attached.
With best regards,
Alexander Korotkov.
Attachment
pgsql-hackers by date: