Thread: Vacuums taking forever :(
In my postgresql.conf, autovacuum is on, so the DB should be (or I hope is being) regularly vacuumed. These are my settings:

work_mem = 20MB
temp_buffers = 4096
authentication_timeout = 10s
ssl = off
checkpoint_warning = 3600
random_page_cost = 1
autovacuum = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay = 20
autovacuum_naptime = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_scale_factor = 0.01

The autovacuum was clearly not enough, so we also have a crontab that vacuums the tables every hour. This is PG 8.2.9.

These cron jobs are taking over 35 minutes per vacuum! What's the use of a vacuum if it takes that long, and the DB performance is tragic in the meantime?

I'd truly appreciate some thoughts from people with experience of vacuum management of highly available online databases. This one gets about 10-20 million accesses, most of them SELECTs, plus about 500,000 INSERTs and about 800,000 UPDATEs. Just 11 tables, of which only one is around 10 million rows; two are close to 500,000 rows and the rest are really small. It's that 10 million row table that's the worry.

Thanks!
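A quick way to see whether autovacuum is actually reaching the big table is to look at the statistics views (the stats collector is already on in the settings above). A minimal sketch, assuming the columns available in 8.2; the output simply lists your own tables:

    -- per-table churn and the last time vacuum/autovacuum touched each table
    SELECT relname,
           n_tup_ins,        -- rows inserted since the stats were last reset
           n_tup_upd,        -- rows updated
           n_tup_del,        -- rows deleted
           last_vacuum,      -- last manual VACUUM
           last_autovacuum   -- last autovacuum run
    FROM pg_stat_user_tables
    ORDER BY n_tup_upd + n_tup_del DESC;

If last_autovacuum on the 10 million row table is recent, autovacuum is keeping up and the hourly cron vacuum is extra load; if it is old or NULL, the thresholds above are not firing for that table.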
Phoenix Kiula schrieb:
> In my postgresql.conf, autovacuum is on, so the DB should be (or I hope is
> being) regularly vacuumed.
>
> These are my settings:
> [settings snipped]

Hi,

just a quick thought: what is your maintenance_work_mem parameter set to? I think with that many UPDATEs and INSERTs it should not be too low ...

Cheers

Andy

> The autovacuum was clearly not enough, so we also have a crontab that
> vacuums the tables every hour. This is PG 8.2.9.
>
> These cron jobs are taking over 35 minutes per vacuum! What's the
> use of a vacuum if it takes that long, and the DB performance is
> tragic in the meantime?
>
> I'd truly appreciate some thoughts from people with experience of
> vacuum management of highly available online databases. [...]
>
> Thanks!
On Tue, Feb 03, 2009 at 08:46:37PM +0800, Phoenix Kiula wrote:
> The autovacuum was clearly not enough, so we also have a crontab that
> vacuums the tables every hour. This is PG 8.2.9.

How did you determine this? What was it not enough for? Which tables? Why didn't you tune autovacuum differently? I think the cron jobs are your problem.

> These cron jobs are taking over 35 minutes per vacuum! What's the
> use of a vacuum if it takes that long, and the DB performance is
> tragic in the meantime?

VACUUM uses disk bandwidth. I suspect you are causing part of your problem.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
On Tue, Feb 3, 2009 at 9:09 PM, Andreas Wenk <a.wenk@netzmeister-st-pauli.de> wrote:
> Phoenix Kiula schrieb:
>> In my postgresql.conf, autovacuum is on, so the DB should be (or I hope is
>> being) regularly vacuumed.
>>
>> These are my settings:
>> [settings snipped]
>
> Hi,
>
> just a quick thought: what is your maintenance_work_mem parameter set to? I think with
> that many UPDATEs and INSERTs it should not be too low ...

maintenance_work_mem = 512M
wal_buffers = 64
checkpoint_segments = 128
checkpoint_timeout = 900
fsync = on

Is this enough? My server has 4GB RAM.
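For cross-checking values like these, the live settings can be read straight from the running server instead of the config file. A small sketch; pg_settings is a standard view and the list of names here is just an example:

    -- what the server is actually using right now
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('maintenance_work_mem',
                   'vacuum_cost_delay',
                   'vacuum_cost_limit',
                   'autovacuum_vacuum_cost_delay',
                   'autovacuum_naptime');

(SHOW maintenance_work_mem; does the same thing for a single parameter.)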
Phoenix Kiula schrieb:
> On Tue, Feb 3, 2009 at 9:09 PM, Andreas Wenk
> <a.wenk@netzmeister-st-pauli.de> wrote:
>> just a quick thought: what is your maintenance_work_mem parameter set to? I think with
>> that many UPDATEs and INSERTs it should not be too low ...
>
> maintenance_work_mem = 512M
> wal_buffers = 64
> checkpoint_segments = 128
> checkpoint_timeout = 900
> fsync = on
>
> Is this enough? My server has 4GB RAM.

Try increasing it, but also check what Andrew wrote in the other reply. Maybe that is where the problem is ...

Cheers

Andy
Phoenix Kiula <phoenix.kiula@gmail.com> writes:

> autovacuum_vacuum_cost_delay = 20
> vacuum_cost_delay = 20

These say to sleep 20ms every few pages.

> These cron jobs are taking over 35 minutes per vacuum! What's the
> use of a vacuum if it takes that long, and the DB performance is
> tragic in the meantime?

The fact that vacuum takes 35 minutes really shouldn't be a concern. As long as it isn't making it hard to vacuum frequently enough, what do you care when vacuum finishes? You're not waiting on any results from it. In fact the point of the above parameters is to ensure vacuum goes *slowly* enough to avoid causing i/o slowdowns in the rest of the system.

The real question is why your performance is tragic while vacuum is running. Sleeping 20ms periodically should really be enough to avoid causing any performance impact. Unless your system is running extremely close to its maximum throughput already and the small additional i/o is enough to tip it over the edge?

Another option is to set the delay to 0, which *will* cause performance to be tragic, but for as short a time as possible. I don't recommend this approach.

You could try raising the delay parameters or decreasing the vacuum_cost_limit parameter, which would make the "few pages" fewer. That would lessen the i/o impact at the expense of lengthening vacuum's run time. But if vacuum already takes 35 minutes of the hour between vacuums, that seems like it might not be an option.

Upgrading to 8.3.x would reduce the need for vacuum if your updates qualify for HOT updates. And 8.4 will lessen the impact of vacuums further. But if you're already running that close to the red-line then you're going to have problems soon even with less i/o from vacuum.

It sounds like you need to quantify just how much i/o your system is capable of handling and how close to that level you already are. Keep in mind that random i/o is a *lot* more expensive than sequential i/o. Typical consumer drives can handle 60MB/s+ of sequential i/o but only about 1-2MB/s of random i/o! It's easy to misjudge your capacity by basing it on purely sequential i/o.

(Hmmmm. That makes me think that raising the vacuum_cost_limit parameter dramatically and the vacuum_cost_delay parameter proportionally might make it actually run faster with less i/o impact. The defaults process only a few kilobytes before sleeping, which probably causes a lot of random seeks. If you multiply both by 10 then you'll process close to a megabyte of data and then sleep for a long while. Just a thought -- I haven't tried this on a test box.)

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
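As an illustration of that last idea only (the values and the table name are made up, not from the thread), the cost parameters can be raised for a single session before a manual VACUUM, so the hourly cron job can experiment without touching postgresql.conf:

    -- scale both knobs up together so the overall i/o budget stays similar,
    -- but the work happens in larger, more sequential chunks
    SET vacuum_cost_delay = 50;    -- sleep longer between chunks (was 20ms)
    SET vacuum_cost_limit = 500;   -- allow more work per chunk (default is 200)
    VACUUM ANALYZE bigtable;       -- 'bigtable' is a placeholder for the 10M row table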
On Tue, 2009-02-03 at 20:46 +0800, Phoenix Kiula wrote:
> This is PG 8.2.9

VACUUM spoils the cache in 8.2. What happens is that VACUUM reads through the whole table, knocking other blocks out of cache. These then need to be read in again by other processes, so there is some I/O thrashing. If your bgwriter settings are ineffective then normal users will also need to write the dirty blocks left by VACUUM, and probably flush WAL as well while doing it, using even more I/O.

We fixed this in 8.3 so that VACUUM uses at most 256KB of memory as it goes, which makes it both faster because of CPU L2 cache effects and means it hardly spoils the shared_buffers cache at all. Bgwriter is also better tuned, so it will handle dirty blocks better.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tuesday 03 February 2009, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> The autovacuum was clearly not enough, so we also have a crontab that
> vacuums the tables every hour. This is PG 8.2.9.

How did you determine it wasn't enough? As others have stated, you're causing your own slowdown by running vacuum so often on tables that aren't changing. If you have a particular table that needs more frequent vacuums, like a session table or something, maybe just cron that. But autovacuum should be doing it more often than once an hour anyway in that case ...

--
Alan
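In that spirit, a rough sketch of a targeted hourly vacuum (the database, file path, and table name are placeholders, not from the thread):

    -- vacuum_hot_table.sql, run hourly via cron with something like:
    --   0 * * * *  psql -d mydb -f /path/to/vacuum_hot_table.sql
    VACUUM ANALYZE hot_table;

That leaves autovacuum to look after the other, mostly static tables instead of vacuuming everything every hour.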
Thanks, Gregory and Simon, for the very useful posts.

I have increased the vacuum_cost_limit to 2000 for now, just to see if that has an impact. Hopefully positive.

Next on my list is to be able to easily upgrade to 8.3, but Slony seemed like a daunting task the last time I tried. I am on 8.2.9, on Linux CentOS. Is there a quick and reliable guide to upgrading, without causing downtime in a high production environment?
Phoenix Kiula <phoenix.kiula@gmail.com> writes:

> Thanks, Gregory and Simon, for the very useful posts.
>
> I have increased the vacuum_cost_limit to 2000 for now, just to see if
> that has an impact. Hopefully positive.

Note that that was offhand speculation. Conventional wisdom is that it should make things *worse* -- you're saying to process more pages between sleeps, so it'll use more i/o. I was speculating that if you increased both vacuum_cost_limit and vacuum_cost_delay proportionally it might use the i/o more efficiently even though it's using the same amount of total bandwidth.

The more normal suggestion is to increase *vacuum_cost_delay*, which tells it to sleep longer between bits of work. Don't increase it too much or vacuum will take forever, but if you increase it from 20 to 40 it should use half as much i/o bandwidth as now.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
On Feb 3, 2009, at 7:31 PM, Phoenix Kiula wrote:

> Thanks, Gregory and Simon, for the very useful posts.
>
> I have increased the vacuum_cost_limit to 2000 for now, just to see if
> that has an impact. Hopefully positive.
>
> Next on my list is to be able to easily upgrade to 8.3, but Slony
> seemed like a daunting task the last time I tried. I am on 8.2.9, on
> Linux CentOS. Is there a quick and reliable guide to upgrading,
> without causing downtime in a high production environment?

I don't know whether this exists, but it could be quite useful to have a connection pooler that would send its queries to both a master and a slave database at once and only return the results from the master. The slave could then, for example, be the new version of PG. It receives the same queries, so its data should stay up to date with the master (if no errors occur), and it can be monitored for any problems in the meanwhile without influencing the workings of the master DB. When everything is as it should be, simply tell the pool to switch to the new DB, possibly removing the old master (although keeping it around and upgrading it after a while in preparation for the next major upgrade would probably be smart).

There are a few gotchas there of course:
- How would you create the clone in the first place? The master is being updated while the clone is being created, so all the queries that happen during that time still need to be performed on the slave somehow...
- What to do if the slave fails queries that the master eats just fine? The data shouldn't get out of sync. For example, due to the recent stricter type-casting changes.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.