Thread: FILLFACTOR and increasing index

FILLFACTOR and increasing index

From
Leonardo Francalanci
Date:
Hi,


the doc pages are somehow "cryptic" regarding FILLFACTOR.
(well, at least they're cryptic to me, since I don't know a lot
of btree stuff...)

I have an index on a timestamp value that is inserted, for 90%
of the inserts, in increasing order. No updates, no deletes on the
table (appends only).

What would be the "right" or "sensible" FILLFACTOR in this case?

I guess the docs need some example of "sensible" values for
some of the "classic" cases? (I'm sure I'm not the only one
storing a timestamp index in time-order...)



Leonardo

Re: FILLFACTOR and increasing index

From
Leonardo Francalanci
Date:
> > I  have an index on a timestamp value that is inserted, for 90%
> > of the  inserts, in increasing order. No updates, no deletes on the
> > table  (appends only).
>
> The bit about "increasing order" is a red herring  here.  If you have
> no updates, then you can leave the FILLFACTOR  alone.
>
> FILLFACTOR controls how much extra room there is in the way the  table
> is stored, so that if a row is UPDATEd it might be possible to  store
> the row in the same disk page.  This alleviates certain  pathological
> conditions with high-UPDATE tables and the way Postgres stores  the
> data (the non-overwriting storage manager).


(please add the list when replying to emails)

I'm talking about the index fillfactor, not the table fillfactor...

Re: FILLFACTOR and increasing index

From
Simon Riggs
Date:
On Mon, May 9, 2011 at 3:32 PM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> > I  have an index on a timestamp value that is inserted, for 90%
>> > of the  inserts, in increasing order. No updates, no deletes on the
>> > table  (appends only).
>>
>> The bit about "increasing order" is a red herring  here.  If you have
>> no updates, then you can leave the FILLFACTOR  alone.
>>
>> FILLFACTOR controls how much extra room there is in the way the  table
>> is stored, so that if a row is UPDATEd it might be possible to  store
>> the row in the same disk page.  This alleviates certain  pathological
>> conditions with high-UPDATE tables and the way Postgres stores  the
>> data (the non-overwriting storage manager).
>
>
> (please add the list when replying to emails)
>
> I'm talking about the index fillfactor, not the table fillfactor...

It will be really useful to see some test results where you alter the
fillfactor and report various measurables.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: FILLFACTOR and increasing index

From
Leonardo Francalanci
Date:
> It will be really useful to see some test  results where you alter the
> fillfactor and report various  measurables.


It's not that easy... stressing "only" the index insertion
speed won't be simple. I would have liked some "theory"...
The docs seem to imply there are some guidelines, it's
just that it's too cryptic:

"for heavily updated tables a smaller fillfactor is better
to minimize the need for page splits"


  "heavily updated" -> does it mean tables that are inserted/updated
or only "updated"???

"leaf pages are filled to this percentage [...] when extending the index
at the right (adding new largest key values)."


Does it mean that since I will (almost) always add new largest key
values, I should have a big or small FILLFACTOR???


I know that theory is one thing and real testing another; but I can't
test everything; if there are some (proved?) guidelines I'd like to
use them (example: I'm not going to test that fillfactor in table creation
in my case won't make any difference in   performance; I trust the
docs and the fact that "it makes sense").


Re: FILLFACTOR and increasing index

From
Tomas Vondra
Date:
Dne 9.5.2011 17:25, Leonardo Francalanci napsal(a):
>> It will be really useful to see some test  results where you alter the
>> fillfactor and report various  measurables.
>
>
> It's not that easy... stressing "only" the index insertion
> speed won't be simple. I would have liked some "theory"...
> The docs seem to imply there are some guidelines, it's
> just that it's too cryptic:
>
> "for heavily updated tables a smaller fillfactor is better
> to minimize the need for page splits"
>
>
>   "heavily updated" -> does it mean tables that are inserted/updated
> or only "updated"???

Well, an UPDATE is actually DELETE+INSERT (that's how PostgreSQL MVCC
works). It may be a bit more complicated with HOT, but that's not your
case, as you're only inserting data.

> "leaf pages are filled to this percentage [...] when extending the index
> at the right (adding new largest key values)."

Hmmm, not sure how exactly this works, but I guess that if you're only
inserting data then fillfactor=100 is the right thing. I believe it
kicks in only when you need to insert data into an 'old' leaf page. If
the page is full, then it needs to be split but if you reserve some free
space (using e.g. fillfactor=80) then the split is not needed.

> Does it mean that since I will (almost) always add new largest key
> values, I should have a big or small FILLFACTOR???

I'd go with the fillfactor=100.

> I know that theory is one thing and real testing another; but I can't
> test everything; if there are some (proved?) guidelines I'd like to
> use them (example: I'm not going to test that fillfactor in table creation
> in my case won't make any difference in   performance; I trust the
> docs and the fact that "it makes sense").

Yes, I use the same approach, but I'm not aware of any such guideline
related to fillfactor with indexes. Anyway those guidelines need to be
written by someone, so you have a great opportunity ;-)

regards
Tomas

Re: FILLFACTOR and increasing index

From
Tomas Vondra
Date:
Dne 9.5.2011 17:25, Leonardo Francalanci napsal(a):
> I know that theory is one thing and real testing another; but I can't
> test everything; if there are some (proved?) guidelines I'd like to
> use them (example: I'm not going to test that fillfactor in table creation
> in my case won't make any difference in   performance; I trust the
> docs and the fact that "it makes sense").
>

Anyway testing this (with the 'insert only' workload) may be quite simple:

================= fillfactor = 100 ====================

testdb=# create table test_fill (id int);
CREATE TABLE
Time: 2,515 ms

testdb=# create index test_fill_idx on test_fill(id) with (fillfactor=100);
CREATE INDEX
Time: 10,331 ms

testdb=# insert into test_fill select i from generate_series(1,1000000)
s(i);
INSERT 0 1000000
Time: 11542,512 ms

testdb=# select relpages from pg_class where relname = 'test_fill_idx';
 relpages
----------
     1977
(1 row)

==================== fillfactor = 70 ========================

testdb=# create table test_fill (id int);
CREATE TABLE
Time: 1,382 ms

testdb=# create index test_fill_idx on test_fill(id) with (fillfactor=70);
CREATE INDEX
Time: 10,296 ms

testdb=# insert into test_fill select i from generate_series(1,1000000)
s(i);
INSERT 0 1000000
Time: 11117,398 ms

testdb=# select relpages from pg_class where relname = 'test_fill_idx';
 relpages
----------
     2819
(1 row)

============================================================

So there seems to be no difference in insert performance (the INSERT
takes about 11s in both cases), but the size of the index with
fillfactor=70 needs much more space.

So with the insert only (in ascending order) workload, I'd go with
fillfactor=100 (or you may leave it at 90, which is the default value,
the difference will be negligible).

regards
Tomas


Re: FILLFACTOR and increasing index

From
Leonardo Francalanci
Date:
> Yes, I use the same approach, but  I'm not aware of any such guideline
> related to fillfactor with indexes.  Anyway those guidelines need to be
> written by someone, so you have a great  opportunity ;-)


I did a quick test using your example. As in your test, "increasing"
values don't get any gain from a different  fillfactor.
I tried a random index:

create table test_fill (id int);
create index test_fill_idx on test_fill(id) with (fillfactor=100);
insert into test_fill select (random()*100000)::integer from
generate_series(1,10000000) i;


time: 373936.724

drop table test_fill;
create table test_fill (id int);
create index test_fill_idx on test_fill(id) with (fillfactor=50);

insert into test_fill select (random()*100000)::integer from
generate_series(1,10000000) i;
time: 393209.911


not much difference...

Now I'm getting confused... is which cases fillfactor makes a difference???

Re: FILLFACTOR and increasing index

From
tv@fuzzy.cz
Date:
>> Yes, I use the same approach, but  I'm not aware of any such guideline
>> related to fillfactor with indexes.  Anyway those guidelines need to be
>> written by someone, so you have a great  opportunity ;-)
>
>
> I did a quick test using your example. As in your test, "increasing"
> values don't get any gain from a different  fillfactor.
> I tried a random index:
>
> create table test_fill (id int);
> create index test_fill_idx on test_fill(id) with (fillfactor=100);
> insert into test_fill select (random()*100000)::integer from
> generate_series(1,10000000) i;
>
>
> time: 373936.724
>
> drop table test_fill;
> create table test_fill (id int);
> create index test_fill_idx on test_fill(id) with (fillfactor=50);
>
> insert into test_fill select (random()*100000)::integer from
> generate_series(1,10000000) i;
> time: 393209.911
>
>
> not much difference...
>
> Now I'm getting confused... is which cases fillfactor makes a
> difference???

What about the index size? How much space do they occupy? Analyze the
table and do this

  SELECT relpages FROM pg_class WHERE relname = 'indexname';

and I believe you'll see the difference - the former index
(fillfactor=100) should grow much larger that the latter one.

The minimal performance difference is probably caused by the fact that
we're dealing with int4 column (and you've used just 100000 rows, i.e.
about 0.5MB of data) so the index is going to be tiny anyway.

Let's try to do that with varchar(32) column, just do something like this

db=# create table test_fill (id varchar(32));
db=# create index test_fill_idx on test_fill(id) with (fillfactor=100);
db=# insert into test_fill select md5(round(random()*100000)::text) from
generate_series(1,10000000) i;

I believe this might make a difference ...

regards
Tomas


Re: FILLFACTOR and increasing index

From
Leonardo Francalanci
Date:
> What about the index size?  How much space do they occupy? Analyze the
> table and do this


Of course space is different. That's not the point. The point is: I'm willing
to pay the price for another HD, if that helps with performance. But it doesn't.

>
> The minimal performance difference is  probably caused by the fact that
> we're dealing with int4 column (and you've  used just 100000 rows, i.e.
> about 0.5MB of data) so the index is going to be  tiny anyway.


I've used 10M rows, not 100000.

> Let's try to do that with varchar(32) column, just do  something like this


Did it with 5M rows. Still no difference.

Re: FILLFACTOR and increasing index

From
tv@fuzzy.cz
Date:
>
>> What about the index size?  How much space do they occupy? Analyze the
>> table and do this
>
>
> Of course space is different. That's not the point. The point is: I'm
> willing
> to pay the price for another HD, if that helps with performance. But it
> doesn't.
>
>>
>> The minimal performance difference is  probably caused by the fact that
>> we're dealing with int4 column (and you've  used just 100000 rows, i.e.
>> about 0.5MB of data) so the index is going to be  tiny anyway.
>
> I've used 10M rows, not 100000.

OK, I've misread the query - still, it's just 50MB of data.

>> Let's try to do that with varchar(32) column, just do  something like
>> this
>
>
> Did it with 5M rows. Still no difference.

Hm, so the page splits probably are not that expensive to affect this.

I wonder whether this would be true with multiple processes inserting data
into the index concurrently. I guess the process needs to obtain a lock to
do the page split, and that might make them much more expensive due to
contention.

But maybe I'm completely wrong - I really am not that familiar with btree
internals yet and didn't have to investigate this.

Anyway in your case (insert only, single process) I'd probably go with the
default value (fillfactor=90).

regards
Tomas


Re: FILLFACTOR and increasing index

From
Tomas Vondra
Date:
Hi,

I've studied the implementation of the btree indexes and how exactly the
fillfactor is used, and in general

- when a page split happens, the process needs to obtain more locks
  than with simple insert, which may result in contention with other
  processes that modify the index (the same pages)

- the fillfactor is used only for the leaf pages, the rest of the index
  does not use it (directly)

So lowering the number of page splits might remove some contention when
there's a lot of processes accessing the same pages.

But that's the theory - I really was not able to come up with a test
that benefits from lower fillfactor. Actually the lower the fillfactor,
the larger the index, which may be a significant issue with huge indexes.

So in your case, I'd probably go with the default fillfactor (90), and
maybe I'd consider even using fillfactor=100, to make the index as dense
as possible.

Anyway I guess the impact of this will be negligible, compared to other
parameters (shared buffers, work_mem, etc.).

regards
Tomas

Re: FILLFACTOR and increasing index

From
Boszormenyi Zoltan
Date:
Hi,

2011-05-12 00:28 keltezéssel, Tomas Vondra írta:
> Hi,
>
> I've studied the implementation of the btree indexes and how exactly the
> fillfactor is used, and in general
>
> - when a page split happens, the process needs to obtain more locks
>   than with simple insert, which may result in contention with other
>   processes that modify the index (the same pages)
>
> - the fillfactor is used only for the leaf pages, the rest of the index
>   does not use it (directly)
>
> So lowering the number of page splits might remove some contention when
> there's a lot of processes accessing the same pages.
>
> But that's the theory - I really was not able to come up with a test
> that benefits from lower fillfactor. Actually the lower the fillfactor,
> the larger the index, which may be a significant issue with huge indexes.

We recently had a testcase for exercising FILLFACTOR on indexes.
Several (15+) GB raw data arrives daily and must be imported into
the database for analytic purposes, the table is heavily partitioned
and each partition has 5 or 6 indexes. The importer is heavily threaded
and uses COPY to insert the data. This is strictly an INSERT-only scenario,
the analysis comes later. This is where FILLFACTOR=70 helped to
reduce the index contention.

With the default 90% (and 100% on the table itself) when a lot of data
arrived in burst that were in the same time interval so 4-8 threads tried
to push data into the same partition, individual data chunks (about 10-15MB
each, about 63500 pieces) were imported in about 1.5-2.5 minutes, with
a few spikes over 4 minutes. This was the result of a few days long
network outage, the data collectors sent their backlogs.

When all indexes were re-created with FILLFACTOR=70, the import times
went down considerably. Out of about 63000 raw data chunks, there were
only 6 or 8 where the import time fell in the 10-25 seconds range, the rest
of them were imported under 10 seconds and the majority of them (65%)
were imported under 3 seconds. The system has 24 cores, so we can use
8 of them dedicated to importing. This almost 1TB data was imported
in about 3.5-4 hours after the FILLFACTOR change. Without it, it didn't
finish in 2 days.

Best regards,
Zoltán Böszörményi

> So in your case, I'd probably go with the default fillfactor (90), and
> maybe I'd consider even using fillfactor=100, to make the index as dense
> as possible.
>
> Anyway I guess the impact of this will be negligible, compared to other
> parameters (shared buffers, work_mem, etc.).
>
> regards
> Tomas
>


--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/


Re: FILLFACTOR and increasing index

From
Tomas Vondra
Date:
Dne 12.6.2011 21:14, Boszormenyi Zoltan napsal(a):
> We recently had a testcase for exercising FILLFACTOR on indexes.
> Several (15+) GB raw data arrives daily and must be imported into
> the database for analytic purposes, the table is heavily partitioned
> and each partition has 5 or 6 indexes. The importer is heavily threaded
> and uses COPY to insert the data. This is strictly an INSERT-only scenario,
> the analysis comes later. This is where FILLFACTOR=70 helped to
> reduce the index contention.
>
> With the default 90% (and 100% on the table itself) when a lot of data
> arrived in burst that were in the same time interval so 4-8 threads tried
> to push data into the same partition, individual data chunks (about 10-15MB
> each, about 63500 pieces) were imported in about 1.5-2.5 minutes, with
> a few spikes over 4 minutes. This was the result of a few days long
> network outage, the data collectors sent their backlogs.
>
> When all indexes were re-created with FILLFACTOR=70, the import times
> went down considerably. Out of about 63000 raw data chunks, there were
> only 6 or 8 where the import time fell in the 10-25 seconds range, the rest
> of them were imported under 10 seconds and the majority of them (65%)
> were imported under 3 seconds. The system has 24 cores, so we can use
> 8 of them dedicated to importing. This almost 1TB data was imported
> in about 3.5-4 hours after the FILLFACTOR change. Without it, it didn't
> finish in 2 days.
>
> Best regards,
> Zoltán Böszörményi

Great, thanks for this info! So I guess it basically confirms my theory
that lowering fillfactor only helps with a lot of processes accessing
the same pages. Can you report how did the index size change?

regards
Tomas