Thread: Index Bloat Problem

Index Bloat Problem

From
Strahinja Kustudić
Date:
We have PostgreSQL 9.1 running on CentOS 5 on two SSDs, one for indices and one for data. The database is extremely active with reads and writes. We have autovacuum enabled, but we didn't tweak its aggressiveness. The problem is that after some time the database grows by more than 100% on the file system, and most of the growth is because the indices are a few times bigger than they should be; when this happens, the performance of the DB drops.

For example, yesterday when I checked the database size on the production server it was 30GB, while the restored dump of that database was only 17GB. The most interesting thing is that the data wasn't bloated that much, but the indices were. Some of them were a few times bigger than they should be. For example, an index on the production DB is 440MB, while that same index after dump/restore is 17MB, and many indices show a similarly large difference. We could fix the problem if we reindexed the DB, but that takes the DB offline, which is not possible in the production environment.
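
For reference, something like the following generic catalog query shows the largest indexes by on-disk size (this is just a sketch; it is not tied to our schema):

```sql
-- Largest indexes by on-disk size; run in the affected database
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS on_disk_size
FROM pg_class
WHERE relkind = 'i'
ORDER BY pg_relation_size(oid) DESC
LIMIT 10;
```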

Is there a way to make the autovacuum daemon more aggressive? I'm not exactly sure how to do that in this case. Would that even help? Is there another way to remove this index bloat?

Thanks in advance,
Strahinja

Re: Index Bloat Problem

From
hubert depesz lubaczewski
Date:
On Sat, Aug 11, 2012 at 12:15:11AM +0200, Strahinja Kustudić wrote:
> Is there a way to make the autovacuum daemon more aggressive? I'm not
> exactly sure how to do that in this case. Would that even help? Is there
> another way to remove this index bloat?

http://www.depesz.com/index.php/2011/07/06/bloat-happens/

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/


Re: Index Bloat Problem

From
Mark Kirkwood
Date:
On 11/08/12 10:15, Strahinja Kustudić wrote:
> We have PostgreSQL 9.1 running on CentOS 5 on two SSDs, one for indices and
> one for data. The database is extremely active with reads and writes. We
> have autovacuum enabled, but we didn't tweak its aggressiveness. The
> problem is that after some time the database grows by more than 100% on
> the file system, and most of the growth is because the indices are a few
> times bigger than they should be; when this happens, the performance of
> the DB drops.
>
> For example, yesterday when I checked the database size on the production
> server it was 30GB, while the restored dump of that database was only 17GB.
> The most interesting thing is that the data wasn't bloated that much, but
> the indices were. Some of them were a few times bigger than they should be.
> For example, an index on the production DB is 440MB, while that same index
> after dump/restore is 17MB, and many indices show a similarly large
> difference. We could fix the problem if we reindexed the DB, but that takes
> the DB offline, which is not possible in the production environment.
>
> Is there a way to make the autovacuum daemon more aggressive? I'm not
> exactly sure how to do that in this case. Would that even help? Is there
> another way to remove this index bloat?
>
>

Some workloads can be difficult to tame. However, I would try something
like this in postgresql.conf:

autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.1

and maybe set log_autovacuum_min_duration so you can see what autovacuum
is doing.
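
If only a handful of tables are churning, you could also set these per table rather than globally (the table name and values below are just placeholders):

```sql
-- Per-table autovacuum override; "hot_table" is a hypothetical name
ALTER TABLE hot_table SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_cost_delay = 10
);
```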

If the above settings don't help, you could monitor growth and schedule
regular REINDEXes on the tables concerned (at some suitably quiet time).
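
If the exclusive lock taken by REINDEX is unacceptable, one common workaround (index, table, and column names here are placeholders) is to build a replacement index concurrently and swap it in:

```sql
-- Builds without blocking writes; can take a while and needs free disk space
CREATE INDEX CONCURRENTLY foo_col_idx_new ON foo (col);
-- The drop and rename take only brief exclusive locks
DROP INDEX foo_col_idx;
ALTER INDEX foo_col_idx_new RENAME TO foo_col_idx;
```

Note this sketch doesn't directly cover indexes backing PRIMARY KEY or UNIQUE constraints, which need extra steps.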

Regards

Mark


Re: Index Bloat Problem

From
Jeff Janes
Date:
On Fri, Aug 10, 2012 at 3:15 PM, Strahinja Kustudić
<strahinjak@nordeus.com> wrote:
>
> For example, yesterday when I checked the database size on the production
> server it was 30GB, while the restored dump of that database was only 17GB.
> The most interesting thing is that the data wasn't bloated that much, but
> the indices were. Some of them were a few times bigger than they should be.
> For example, an index on the production DB is 440MB, while that same index
> after dump/restore is 17MB, and many indices show a similarly large
> difference.

Could your pattern of deletions be leaving sparsely populated, but not
completely empty, index pages that your insertions will then never
reuse, because they never again insert values in that key range?

Cheers,

Jeff


Re: Index Bloat Problem

From
Strahinja Kustudić
Date:
Thanks for the help, everyone, and sorry for not replying sooner; I was on a business trip.

@Hubert pg_reorg looks really interesting, and from a first read it looks like a very good solution for maintenance, but for now I would rather try to slow down or remove this bloat, so that I have to do as little maintenance as possible.

@Mark So basically I should decrease the autovacuum nap time from 60s to 10s and reduce the scale factor from 0.2 to 0.1. log_autovacuum_min_duration is already set to 0, which means everything is logged.

@Jeff I'm not sure I understand what you mean. I know that we never reuse key ranges. Could you be more specific, or give an example, please?

Thanks in advance,
Strahinja



On Tue, Aug 14, 2012 at 6:14 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Fri, Aug 10, 2012 at 3:15 PM, Strahinja Kustudić
<strahinjak@nordeus.com> wrote:
>
> For example, yesterday when I checked the database size on the production
> server it was 30GB, while the restored dump of that database was only 17GB.
> The most interesting thing is that the data wasn't bloated that much, but
> the indices were. Some of them were a few times bigger than they should be.
> For example, an index on the production DB is 440MB, while that same index
> after dump/restore is 17MB, and many indices show a similarly large
> difference.

Could your pattern of deletions be leaving sparsely populated, but not
completely empty, index pages that your insertions will then never
reuse, because they never again insert values in that key range?

Cheers,

Jeff

Re: Index Bloat Problem

From
Jeff Janes
Date:
On Thu, Aug 16, 2012 at 12:57 PM, Strahinja Kustudić
<strahinjak@nordeus.com> wrote:
>
> @Jeff I'm not sure I understand what you mean. I know that we never reuse
> key ranges. Could you be more specific, or give an example, please?

If an index leaf page is completely empty because every entry on it
was deleted, it will get recycled to be used in some other part of
the index.  (Eventually--it can take a while, especially if you have
long-running transactions.)

But if the leaf page is only mostly empty, because only most of the
entries on it were deleted, then it can never be reused, except for
entries that naturally fall into its existing key range (which will
never happen, if you never reuse key ranges).

So if you have a million records with keys 1..1000000 and do a
"delete from foo where key between 1 and 990000", then 99% of those
old index pages will become completely empty and eligible for reuse.
But if you do "delete from foo where key%100>0", then all of the pages
will become 99% empty, and none will be eligible for reuse (except the
very last one, which can still accept 1000001 and so on).

There has been talk of allowing logically adjacent, mostly empty
pages to be merged so that one of them becomes empty, but the way
concurrent access to btree indexes was designed makes this extremely
hard to do safely.
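
As a runnable sketch of the two patterns (the table follows the example above; the index name "foo_pkey" is the default one PostgreSQL would pick):

```sql
CREATE TABLE foo (key integer PRIMARY KEY);
INSERT INTO foo SELECT generate_series(1, 1000000);

-- Pattern 1: contiguous delete -> most leaf pages become completely empty
DELETE FROM foo WHERE key BETWEEN 1 AND 990000;

-- Pattern 2 (try on a fresh copy instead): scattered delete ->
-- every leaf page keeps ~1% of its entries and none can be recycled
-- DELETE FROM foo WHERE key % 100 > 0;

VACUUM foo;
-- The index file won't shrink in either case, but after pattern 1 the
-- emptied pages become eligible for reuse by future inserts
SELECT pg_size_pretty(pg_relation_size('foo_pkey'));
```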

Cheers,

Jeff


Re: Index Bloat Problem

From
Greg Williamson
Date:
Thanks for this description--we have index bloat problems on a massively active (but small) database. This may help shed light on our problems.

Sorry for top-posting--challenged email reader.

Greg W.


From: Jeff Janes <jeff.janes@gmail.com>
To: Strahinja Kustudić <strahinjak@nordeus.com>
Cc: pgsql-performance@postgresql.org
Sent: Friday, August 17, 2012 7:33 PM
Subject: Re: [PERFORM] Index Bloat Problem

On Thu, Aug 16, 2012 at 12:57 PM, Strahinja Kustudić
<strahinjak@nordeus.com> wrote:
>
> @Jeff I'm not sure I understand what you mean. I know that we never reuse
> key ranges. Could you be more specific, or give an example, please?

If an index leaf page is completely empty because every entry on it
was deleted, it will get recycled to be used in some other part of
the index.  (Eventually--it can take a while, especially if you have
long-running transactions.)

But if the leaf page is only mostly empty, because only most of the
entries on it were deleted, then it can never be reused, except for
entries that naturally fall into its existing key range (which will
never happen, if you never reuse key ranges).

So if you have a million records with keys 1..1000000 and do a
"delete from foo where key between 1 and 990000", then 99% of those
old index pages will become completely empty and eligible for reuse.
But if you do "delete from foo where key%100>0", then all of the pages
will become 99% empty, and none will be eligible for reuse (except the
very last one, which can still accept 1000001 and so on).

There has been talk of allowing logically adjacent, mostly empty
pages to be merged so that one of them becomes empty, but the way
concurrent access to btree indexes was designed makes this extremely
hard to do safely.

Cheers,

Jeff


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance