Thread: index bloat question

index bloat question

From
Szymon Guz
Date:
Hi,
just a couple of questions:

will there be an index bloat if I have:
- a serial column and only add rows to the table?
- a text column and I only add rows to the table?

For the serial column the numbers are only incremented, for the text column I add random strings.

regards
Szymon

Re: index bloat question

From
Scott Marlowe
Date:
On Sun, Oct 16, 2011 at 2:41 PM, Szymon Guz <mabewlun@gmail.com> wrote:
> Hi,
> just a couple of questions:
> will there be an index bloat if I have:
> - a serial column and only add rows to the table?
> - a text column and I only add rows to the table?
> For the serial column the numbers are only incremented, for the text column
> I add random strings.

With no deletes or updates, the only bloat will be from a non 100% fill factor.

Re: index bloat question

From
Rob Sargent
Date:

Scott Marlowe wrote:
> On Sun, Oct 16, 2011 at 2:41 PM, Szymon Guz <mabewlun@gmail.com> wrote:
>
>> Hi,
>> just a couple of questions:
>> will there be an index bloat if I have:
>> - a serial column and only add rows to the table?
>> - a text column and I only add rows to the table?
>> For the serial column the numbers are only incremented, for the text column
>> I add random strings.
>>
>
> With no deletes or updates, the only bloat will be from a non 100% fill factor.
>
>
Is there no index rebalancing overhead on monotonically increasing keys
compare to (decently) random.  Seems one would continuously write to the
"right-most" sub-leaf node.

Re: index bloat question

From
Szymon Guz
Date:


On 17 October 2011 02:01, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Sun, Oct 16, 2011 at 2:41 PM, Szymon Guz <mabewlun@gmail.com> wrote:
> Hi,
> just a couple of questions:
> will there be an index bloat if I have:
> - a serial column and only add rows to the table?
> - a text column and I only add rows to the table?
> For the serial column the numbers are only incremented, for the text column
> I add random strings.

With no deletes or updates, the only bloat will be from a non 100% fill factor.

Hi Scott,

if there is no bloat, how could you explain this:

Simple test:

CREATE TABLE test (
id text primary key,
category_id text not null
);

CREATE INDEX i_category ON test (category_id);


I make 500k inserts in one transaction using a python script.
For the random text I use random uuid from the function:

uuid.uuid4()

After those inserts I create another index:

CREATE INDEX i_new ON test (category_id);


select
pg_size_pretty(pg_relation_size('i_category')),
pg_size_pretty(pg_relation_size('i_new'))
;

Results:

'37 MB';'28 MB'

regards
Szymon

Re: index bloat question

From
Merlin Moncure
Date:
On Mon, Oct 17, 2011 at 2:15 AM, Szymon Guz <mabewlun@gmail.com> wrote:
>
>
> On 17 October 2011 02:01, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>
>> On Sun, Oct 16, 2011 at 2:41 PM, Szymon Guz <mabewlun@gmail.com> wrote:
>> > Hi,
>> > just a couple of questions:
>> > will there be an index bloat if I have:
>> > - a serial column and only add rows to the table?
>> > - a text column and I only add rows to the table?
>> > For the serial column the numbers are only incremented, for the text
>> > column
>> > I add random strings.
>>
>> With no deletes or updates, the only bloat will be from a non 100% fill
>> factor.
>
> Hi Scott,
>
> if there is no bloat, how could you explain this:
>
> Simple test:
>
> CREATE TABLE test (
> id text primary key,
> category_id text not null
> );
>
> CREATE INDEX i_category ON test (category_id);
>
>
> I make 500k inserts in one transaction using a python script.
> For the random text I use random uuid from the function:
>
> uuid.uuid4()
>
> After those inserts I create another index:
>
> CREATE INDEX i_new ON test (category_id);
>
>
> select
> pg_size_pretty(pg_relation_size('i_category')),
> pg_size_pretty(pg_relation_size('i_new'))
> ;
>
> Results:
>
> '37 MB';'28 MB'

You didn't post your insertion script.  btree indexes are always going
to have extra space in them due to pages splitting and being only
partially filled -- insertion order over the range of your datum plays
into this (you'll get different index arrangements from random vs
ordered insertion).  for kicks, try reindexing both indexes and see
what the size is afterwords.

merlin

Re: index bloat question

From
Szymon Guz
Date:


On 17 October 2011 15:42, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Oct 17, 2011 at 2:15 AM, Szymon Guz <mabewlun@gmail.com> wrote:
>
>
> On 17 October 2011 02:01, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>
>> On Sun, Oct 16, 2011 at 2:41 PM, Szymon Guz <mabewlun@gmail.com> wrote:
>> > Hi,
>> > just a couple of questions:
>> > will there be an index bloat if I have:
>> > - a serial column and only add rows to the table?
>> > - a text column and I only add rows to the table?
>> > For the serial column the numbers are only incremented, for the text
>> > column
>> > I add random strings.
>>
>> With no deletes or updates, the only bloat will be from a non 100% fill
>> factor.
>
> Hi Scott,
>
> if there is no bloat, how could you explain this:
>
> Simple test:
>
> CREATE TABLE test (
> id text primary key,
> category_id text not null
> );
>
> CREATE INDEX i_category ON test (category_id);
>
>
> I make 500k inserts in one transaction using a python script.
> For the random text I use random uuid from the function:
>
> uuid.uuid4()
>
> After those inserts I create another index:
>
> CREATE INDEX i_new ON test (category_id);
>
>
> select
> pg_size_pretty(pg_relation_size('i_category')),
> pg_size_pretty(pg_relation_size('i_new'))
> ;
>
> Results:
>
> '37 MB';'28 MB'

You didn't post your insertion script.  btree indexes are always going
to have extra space in them due to pages splitting and being only
partially filled -- insertion order over the range of your datum plays
into this (you'll get different index arrangements from random vs
ordered insertion).  for kicks, try reindexing both indexes and see
what the size is afterwords.

merlin


Yep, after reindexing the sizes are the same, I even get it why sizes were different, thanks for the info.

regards
Szymon