Thread: Question about VACUUM

Question about VACUUM

From
Ernesto Quiñones
Date:
Hi friends

I want to know if it's possible to predict (calculate) how long a
VACUUM FULL will take on a given table.

Is there some formula I can apply to calculate this?

thanks



--
----------------------------------------------------------
Visit : http://www.eqsoft.net
----------------------------------------------------------
Follow me on Twitter : http://www.twitter.com/ernestoq

Re: Question about VACUUM

From
Scott Marlowe
Date:
On Fri, Dec 2, 2011 at 8:32 PM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
> Hi friends
>
> I want to know if it's possible to predict (calculate) how long a
> VACUUM FULL will take on a given table.
>
> Is there some formula I can apply to calculate this?

If you look at what iostat is doing while the VACUUM FULL is running,
and divide the size of the table by that kB/sec throughput, you can
get a good approximation of how long it will take.  Do you have
naptime set to anything above 0?
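The arithmetic Scott describes can be sketched like this (the 20 GB table size and 40 MB/s throughput below are hypothetical numbers, not measurements from this thread):

```shell
#!/bin/sh
# Back-of-the-envelope VACUUM FULL duration estimate: table size
# divided by the sustained throughput iostat reports while the
# vacuum is running.  Both figures here are made up for illustration.
table_mb=$((20 * 1024))   # table size: 20 GB expressed in MB
rate_mb_s=40              # iostat throughput during the vacuum, MB/s
secs=$((table_mb / rate_mb_s))
echo "estimated duration: ${secs}s (~$((secs / 60)) minutes)"
```

This is only a lower bound; index rebuilds and contention can stretch the real runtime well past it.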

Re: Question about VACUUM

From
Ernesto Quiñones
Date:
Thanks for the answer, Scott.  Actually my autovacuum_naptime is 1h,
but I can't find a naptime parameter for a manual vacuum.

thanks again

2011/12/2 Scott Marlowe <scott.marlowe@gmail.com>:
> On Fri, Dec 2, 2011 at 8:32 PM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
>> Hi friends
>>
>> I want to know if it's possible to predict (calculate) how long a
>> VACUUM FULL will take on a given table.
>>
>> Is there some formula I can apply to calculate this?
>
> If you look at what iostat is doing while the VACUUM FULL is running,
> and divide the size of the table by that kB/sec throughput, you can
> get a good approximation of how long it will take.  Do you have
> naptime set to anything above 0?




Re: Question about VACUUM

From
"Kevin Grittner"
Date:
Ernesto Quiñones wrote:
> Scott Marlowe  wrote:
>> Ernesto Quiñones  wrote:

>>> I want to know if it's possible to predict (calculate) how long
>>> a VACUUM FULL will take on a given table?

I don't think you said what version of PostgreSQL you're using.
VACUUM FULL prior to version 9.0 is not recommended for most
situations, and can take days or weeks to complete where other
methods of achieving the same end may take hours.  If you have
autovacuum properly configured, you will probably never need to run
VACUUM FULL.

>> If you look at what iostat is doing while the vacuum full is
>> running, and divide the size of the table by that k/sec you can
>> get a good approximation of how long it will take. Do you have
>> naptime set to anything above 0?
>
> Thanks for the answer Scott, actually my autovacuum_naptime is 1h

Ah, well that right there is likely to put you into a position where
you need to do painful extraordinary cleanup like VACUUM FULL.  In
most situations the autovacuum defaults are pretty good.  Where they
need to be adjusted, the normal things which are actually beneficial
are to change the thresholds to allow more aggressive cleanup or (on
low-powered hardware) to adjust the cost ratios so that performance
is less affected by the autovacuum runs.  When autovacuum is disabled
or changed to a long interval, it almost always results in bloat
and/or outdated statistics which cause much more pain than a more
aggressive autovacuum regimen does.

> but I don't find naptime parameter for a manual vacuum

I'm guessing that Scott was thinking of the vacuum_cost_delay
setting:

http://www.postgresql.org/docs/current/interactive/runtime-config-resource.html#GUC-VACUUM-COST-DELAY

-Kevin

Re: Question about VACUUM

From
Scott Marlowe
Date:
On Sat, Dec 3, 2011 at 6:11 AM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
> Thanks for the answer, Scott.  Actually my autovacuum_naptime is 1h,
> but I can't find a naptime parameter for a manual vacuum.

That's really high, but what I meant to ask was what your
vacuum_cost_delay is set to.  Also vacuum_cost_limit.

Re: Question about VACUUM

From
Ernesto Quiñones
Date:
Hi Kevin, my comments follow yours

2011/12/3 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Ernesto Quiñones wrote:
>> Scott Marlowe  wrote:
>>> Ernesto Quiñones  wrote:
>
>>>> I want to know if it's possible to predict (calculate) how long
>>>> a VACUUM FULL will take on a given table?
>
> I don't think you said what version of PostgreSQL you're using.
> VACUUM FULL prior to version 9.0 is not recommended for most
> situations, and can take days or weeks to complete where other
> methods of achieving the same end may take hours.  If you have
> autovacuum properly configured, you will probably never need to run
> VACUUM FULL.

I'm working with PostgreSQL 8.3 running on Solaris 10; my autovacuum
parameters are:

autovacuum                        on
autovacuum_analyze_scale_factor   0.5
autovacuum_analyze_threshold      50000
autovacuum_freeze_max_age         200000000
autovacuum_max_workers            3
autovacuum_naptime                1h
autovacuum_vacuum_cost_delay      -1
autovacuum_vacuum_cost_limit      -1
autovacuum_vacuum_scale_factor    0.5
autovacuum_vacuum_threshold       50000

my vacuum parameters are:

vacuum_cost_delay    1s
vacuum_cost_limit    200
vacuum_cost_page_dirty    20
vacuum_cost_page_hit    1
vacuum_cost_page_miss    10
vacuum_freeze_min_age    100000000


> Ah, well that right there is likely to put you into a position where
> you need to do painful extraordinary cleanup like VACUUM FULL.  In
> most situation the autovacuum defaults are pretty good.  Where they
> need to be adjusted, the normal things which are actually beneficial
> are to change the thresholds to allow more aggressive cleanup or (on
> low-powered hardware) to adjust the cost ratios so that performance
> is less affected by the autovacuum runs.

I get good performance from my hard disks and I have a good amount of
memory, but my cores are very slow, only 1 GHz each.

I have some questions here:

1. With autovacuum_max_workers = 3, does each worker process use its
own core, or do all 3 workers share a single core?

2. When I run EXPLAIN ANALYZE on a very big table (30 million rows),
the plan shows an estimate of 32 million rows.  I assume that means
my statistics are off by 2 million rows; is that a significant error,
or is it a normal result?

thanks for your help

Re: Question about VACUUM

From
Scott Marlowe
Date:
On Mon, Dec 5, 2011 at 10:19 AM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
> Hi Kevin, comments after your comments
>
> 2011/12/3 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
>> Ernesto Quiñones wrote:
>>> Scott Marlowe  wrote:
>>>> Ernesto Quiñones  wrote:
>>
>>>>> I want to know if it's possible to predict (calculate) how long
>>>>> a VACUUM FULL will take on a given table?
>>
>> I don't think you said what version of PostgreSQL you're using.
>> VACUUM FULL prior to version 9.0 is not recommended for most
>> situations, and can take days or weeks to complete where other
>> methods of achieving the same end may take hours.  If you have
>> autovacuum properly configured, you will probably never need to run
>> VACUUM FULL.
>
> I'm working with PostgreSQL 8.3 running on Solaris 10; my autovacuum
> parameters are:
>
> autovacuum                        on
> autovacuum_analyze_scale_factor   0.5
> autovacuum_analyze_threshold      50000
> autovacuum_freeze_max_age         200000000
> autovacuum_max_workers            3
> autovacuum_naptime                1h
> autovacuum_vacuum_cost_delay      -1
> autovacuum_vacuum_cost_limit      -1
> autovacuum_vacuum_scale_factor    0.5
> autovacuum_vacuum_threshold       50000
>
> my vacuum parameters are:
>
> vacuum_cost_delay       1s
> vacuum_cost_limit       200

Those are insane settings for vacuum costing, even on a very slow
machine.  Basically you're starving vacuum and autovacuum so much that
they can never keep up.

> I have a good performance in my hard disks, I have a good amount of
> memory, but my cores are very poor, only 1ghz each one.

If so then your settings for vacuum costing are doubly bad.

I'd start by setting the cost_delay to 1ms and raising your cost limit
by a factor of 10 or more.
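In postgresql.conf terms, that change would look roughly like this (the exact cost_limit value is illustrative; Scott only says "a factor of 10 or more"):

```
vacuum_cost_delay = 1ms      # was 1s
vacuum_cost_limit = 2000     # was 200, raised 10x
```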

> I have some questions here:
>
> 1. With autovacuum_max_workers = 3, does each worker process use its
> own core, or do all 3 workers share a single core?

Each worker is a single process, so each can use at most one core.
Right now your vacuum costing is such that it's using 1/100000th or
so of a CPU.

> 2. When I run EXPLAIN ANALYZE on a very big table (30 million
> rows), the plan shows an estimate of 32 million rows.  I assume that
> means my statistics are off by 2 million rows; is that a significant
> error, or is it a normal result?

Look for projections being off by factors of 10 or more before it
starts to make a big difference.  32M versus 30M is no big deal.  30k
versus 30M is a big deal.

Re: Question about VACUUM

From
Scott Marlowe
Date:
On Mon, Dec 5, 2011 at 10:42 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, Dec 5, 2011 at 10:19 AM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
>> vacuum_cost_delay       1s
>> vacuum_cost_limit       200
>
> Those are insane settings for vacuum costing, even on a very slow
> machine.  Basically you're starving vacuum and autovacuum so much that
> they can never keep up.

sorry, the word I meant there was pathological.  No insult intended.

Re: Question about VACUUM

From
Ernesto Quiñones
Date:
no problem Scott, thanks for your comments



2011/12/5 Scott Marlowe <scott.marlowe@gmail.com>:
> On Mon, Dec 5, 2011 at 10:42 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Mon, Dec 5, 2011 at 10:19 AM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
>>> vacuum_cost_delay       1s
>>> vacuum_cost_limit       200
>>
>> Those are insane settings for vacuum costing, even on a very slow
>> machine.  Basically you're starving vacuum and autovacuum so much that
>> they can never keep up.
>
> sorry, the word I meant there was pathological.  No insult intended.




Re: Question about VACUUM

From
"Kevin Grittner"
Date:
Ernesto Quiñones <ernestoq@gmail.com> wrote:

I understand the impulse to run autovacuum less frequently or less
aggressively.  When we first started running PostgreSQL the default
configuration was very cautious.  A lot of bloat would accumulate
before it kicked in, at which point there was a noticeable
performance hit, as it worked through a large number of dead pages.
The first thing I did was to make it run less often, which only made
things worse.  The numbers we settled on through testing as optimal
for us are very close to current default values (for recent major
releases).

Not only do queries run more quickly between autovacuum runs,
because there is less dead space to wade through to get the current
tuples, but the autovacuum runs just don't have the same degree of
impact -- presumably because they find less to do.  Some small,
frequently updated tables went from having hundreds of pages down to
one or two.

> autovacuum_analyze_scale_factor  0,5
> autovacuum_analyze_threshold  50000

We use 0.10 + 10 in production.  Defaults are now 0.10 + 50.  That's
the portion of the table plus a fixed number of rows.  Analyze just
takes a random sample from the table; it doesn't scan the whole table.

> autovacuum_vacuum_scale_factor  0,5
> autovacuum_vacuum_threshold  50000

We use 0.20 + 10 in production.  Defaults are now 0.20 + 50.  Again,
a proportion of the table (in this case what is expected to have
become unusable dead space) plus a number of unusable dead tuples.
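To make that formula concrete, autovacuum kicks in once changed/dead rows exceed roughly scale_factor * reltuples + threshold.  A sketch for a hypothetical 30-million-row table, comparing the settings quoted above with values close to the defaults:

```shell
#!/bin/sh
# Rows that must change before autovacuum touches a 30M-row table.
# Trigger point ~ scale_factor * reltuples + threshold.
reltuples=30000000
# With scale_factor 0.5 and threshold 50000 (settings quoted above);
# dividing by 2 stands in for multiplying by 0.5:
high=$((reltuples / 2 + 50000))
# With scale_factor 0.2 and threshold 50 (close to current defaults):
low=$((reltuples / 5 + 50))
echo "0.5 + 50000 -> $high dead rows before vacuum runs"
echo "0.2 + 50    -> $low dead rows before vacuum runs"
```

With the quoted settings, half the table has to die before any cleanup happens, which is exactly the bloat scenario Kevin describes.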

> autovacuum_naptime  1h

A one-page table could easily bloat to hundreds (or thousands) of
pages within an hour.  You will wonder where all your CPU time is
going because it will constantly be scanning the same (cached) pages
to find the one version of the row which matters.  I recommend 1min.

> vacuum_cost_delay  1s

A vacuum run will never get much done at that rate.  I recommend
10ms.

> vacuum_cost_limit  200

We've boosted this to 600.  Once you're in a "steady state", this is
the setting you might want to adjust up or down as needed to make
cleanup aggressive enough without putting a noticeable dent in
performance while it is running.
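To see why a 1s delay is so crippling, here is a rough best-case throughput bound (8 kB is the default PostgreSQL page size; the other numbers come from the settings quoted in this thread):

```shell
#!/bin/sh
# Vacuum sleeps for vacuum_cost_delay each time its accumulated cost
# reaches vacuum_cost_limit.  The cheapest page (a buffer hit) costs
# vacuum_cost_page_hit = 1, so with limit 200 and a 1s delay the
# vacuum touches at most ~200 cached pages per second.
pages_per_cycle=$((200 / 1))   # vacuum_cost_limit / vacuum_cost_page_hit
page_kb=8                      # default PostgreSQL page size, in kB
kb_per_s=$((pages_per_cycle * page_kb))   # one 1s sleep per cycle
echo "best case: ~${kb_per_s} kB/s of cached pages"
```

Misses and dirtied pages cost 10 and 20 points respectively, so real throughput is far below even this figure.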

On 8.3 I believe you still need to worry about the fsm settings.
Run your regular database vacuum with the VERBOSE option, and check
what the last few lines say.  If you don't have enough memory set
aside to track free space, no vacuum regimen will prevent bloat.
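A sketch of that check on 8.3 (the numbers in the commented output are invented for illustration; the shape of the message is what to look for):

```sql
-- Run a database-wide vacuum in psql; the FSM summary is at the end.
VACUUM VERBOSE;
-- INFO:  free space map contains 10500 pages in 120 relations
-- DETAIL:  A total of 180000 page slots are in use (including overhead).
-- 250000 page slots are required to track all free space.
-- Current limits are:  204800 page slots, 1000 relations, ...
-- Here "required" (250000) exceeds the page-slot limit (204800), so
-- vacuum cannot track all free space and bloat will accumulate;
-- raise max_fsm_pages in postgresql.conf and restart.
```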

-Kevin

Re: Question about VACUUM

From
Scott Marlowe
Date:
On Mon, Dec 5, 2011 at 11:36 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Ernesto Quiñones<ernestoq@gmail.com> wrote:
>> vacuum_cost_limit  200

> We've boosted this to 600.  Once you're in a "steady state", this is
> the setting you might want to adjust up or down as needed to make
> cleanup aggressive enough without putting a noticeable dent in
> performance while it is running.

On the busy production systems I've worked on in the past, we had this
cranked up to several thousand along with 10 or so workers to keep up
on a busy machine.  The more IO your box has, the more you can afford
to make vacuum / autovacuum aggressive.

Re: Question about VACUUM

From
Josh Berkus
Date:
On 12/5/11 1:36 PM, Kevin Grittner wrote:
> I understand the impulse to run autovacuum less frequently or less
> aggressively.  When we first started running PostgreSQL the default
> configuration was very cautious.

The default settings are deliberately cautious, as default settings
should be.

But yes, anyone with a really large/high-traffic database will often
want to make autovac more aggressive.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Question about VACUUM

From
"Kevin Grittner"
Date:
Josh Berkus <josh@agliodbs.com> wrote:
> On 12/5/11 1:36 PM, Kevin Grittner wrote:
>> I understand the impulse to run autovacuum less frequently or
>> less aggressively.  When we first started running PostgreSQL the
>> default configuration was very cautious.
>
> The default settings are deliberately cautious, as default
> settings should be.

I was talking historically, about the defaults in 8.1:

http://www.postgresql.org/docs/8.1/interactive/runtime-config-autovacuum.html

Those defaults were *over*-cautious to the point that we experienced
serious problems.  My point was that many people's first instinct in
that case is to make the setting less aggressive, as I initially did
and the OP has done.  The problem is actually solved by making them
*more* aggressive.  Current defaults are pretty close to what we
found, through experimentation, worked well for us for most
databases.

> But yes, anyone with a really large/high-traffic database will
> often want to make autovac more aggressive.

I think we're in agreement: current defaults are good for a typical
environment; high-end setups still need to tune to more aggressive
settings.  This is an area where incremental changes combined with
monitoring work well.

-Kevin