Thread: how to plan for vacuum?

how to plan for vacuum?

From
Galy Lee
Date:
Hi,

For I can not find too much information about how to use vacuum, I want
to ask some general information about the guideline of vacuum planning.

1.  How do we know if autovacuum is enough for my application, or should
    I setup a vacuum manually from cron for my application?

2. How to set the GUC parameters for autovacuum?
There are two sets of parameters for autovacuum:
    - vacuum threshold and scale factor (500/0.2)
   - analyze threshold and scale factor(250/0.1)
Is there any guideline to set these parameters?  When does it need to
change the default values?
 
3. How to tune cost-based delay vacuum?
I had searched in performance list; it seems that most of the practices
are based on experience / trial-and-error approach to meet the
requirement of disk utilization or CPU utilization. Is there any other
guild line to set them?

For when autovacuum is turned on by default, if the parameters for
vacuum have not been set well, it will make the system rather unstable.
So I just wonder if we should setup a section in the manual about the
tips of vacuum, then many users can easily set the vacuum parameters for
their system.

Best Regards
Galy Lee
NTT OSS Center

Re: how to plan for vacuum?

From
Galy Lee
Date:
Just have one example here:

workload:      run pgbench in 365x24x7
database size: 100GB

the workload distribution:
  06:00-24:00   100tps
  00:00-06:00   20tps

how should we plan vacuum for this situation to get the highest performance?

Best regards
Galy

Galy Lee wrote:
> Hi,
>
> For I can not find too much information about how to use vacuum, I want
> to ask some general information about the guideline of vacuum planning.
>
> 1.  How do we know if autovacuum is enough for my application, or should
>     I setup a vacuum manually from cron for my application?
>
> 2. How to set the GUC parameters for autovacuum?
> There are two sets of parameters for autovacuum:
>     - vacuum threshold and scale factor (500/0.2)
>    - analyze threshold and scale factor(250/0.1)
> Is there any guideline to set these parameters?  When does it need to
> change the default values?
>  
> 3. How to tune cost-based delay vacuum?
> I had searched in performance list; it seems that most of the practices
> are based on experience / trial-and-error approach to meet the
> requirement of disk utilization or CPU utilization. Is there any other
> guild line to set them?
>
> For when autovacuum is turned on by default, if the parameters for
> vacuum have not been set well, it will make the system rather unstable.
> So I just wonder if we should setup a section in the manual about the
> tips of vacuum, then many users can easily set the vacuum parameters for
> their system.

Re: how to plan for vacuum?

From
"Jim C. Nasby"
Date:
On Wed, Jan 24, 2007 at 02:37:44PM +0900, Galy Lee wrote:
> 1.  How do we know if autovacuum is enough for my application, or should
>     I setup a vacuum manually from cron for my application?

Generally I trust autovac unless there's some tables where it's critical
that they be vacuumed frequently, such as a queue table or a web session
table.

> 2. How to set the GUC parameters for autovacuum?
> There are two sets of parameters for autovacuum:
>     - vacuum threshold and scale factor (500/0.2)
> ?$B!!  - analyze threshold and scale factor(250/0.1)
> Is there any guideline to set these parameters?  When does it need to
> change the default values?

I find those are generally pretty good starting points; just bear in
mind that it means 20% dead space.

> 3. How to tune cost-based delay vacuum?
> I had searched in performance list; it seems that most of the practices
> are based on experience / trial-and-error approach to meet the
> requirement of disk utilization or CPU utilization. Is there any other
> guild line to set them?

Unless you have a means for the database to monitor IO usage on it's
own, I don't know that we have a choice...

I'll generally start with a cost delay of 20ms and adjust based on IO
utilization.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: how to plan for vacuum?

From
Alvaro Herrera
Date:
Jim C. Nasby wrote:

> I'll generally start with a cost delay of 20ms and adjust based on IO
> utilization.

I've been considering set a default autovacuum cost delay to 10ms; does
this sound reasonable?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [HACKERS] how to plan for vacuum?

From
"Jim C. Nasby"
Date:
On Thu, Jan 25, 2007 at 12:52:02AM -0300, Alvaro Herrera wrote:
> Jim C. Nasby wrote:
>
> > I'll generally start with a cost delay of 20ms and adjust based on IO
> > utilization.
>
> I've been considering set a default autovacuum cost delay to 10ms; does
> this sound reasonable?

For a lightly loaded system, sure. For a heavier load that might be too
much, but of course that's very dependent on not only your hardware, but
how much you want vacuum to interfere with normal operations. Though,
I'd say as a default it's probably better to be more aggressive rather
than less.

Also, it might be better to only set autovac_cost_delay by default;
presumably if someone's running vacuum by hand they want it done pronto.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: how to plan for vacuum?

From
Galy Lee
Date:

Jim C. Nasby wrote:
> On Wed, Jan 24, 2007 at 02:37:44PM +0900, Galy Lee wrote:
>> 1.  How do we know if autovacuum is enough for my application, or should
>>     I setup a vacuum manually from cron for my application?
>
> Generally I trust autovac unless there's some tables where it's critical
> that they be vacuumed frequently, such as a queue table or a web session
> table.

So how much can we trust autovac? I think at least the following cases
can not be covered by autovac now:
  - small but high update tables which are sensitive to garbage
  - very big tables which need a long time to be vacuumed.
  - when we need to adjust the the max_fsm_page

>> 2. How to set the GUC parameters for autovacuum?
>> There are two sets of parameters for autovacuum:
>>     - vacuum threshold and scale factor (500/0.2)
>> ?$B!!  - analyze threshold and scale factor(250/0.1)
>> Is there any guideline to set these parameters?  When does it need to
>> change the default values?
>
> I find those are generally pretty good starting points; just bear in
> mind that it means 20% dead space.

so what is the principle to set them?
  - keep dead space lower than some disk limit
  - or keep the garbage rate lower than fillfactor
  or any other general principle?


Re: how to plan for vacuum?

From
Ray Stell
Date:
On Thu, Jan 25, 2007 at 07:29:20PM +0900, Galy Lee wrote:
> so what is the principle to set them?
>  - keep dead space lower than some disk limit
>  - or keep the garbage rate lower than fillfactor
>  or any other general principle?


How do you measure "dead space" and "garbage rate?"

I'm a newbe, I don't even know what these terms mean, but if I can measure
them, perhaps it will gel, and really if you can't measure the effect
of a setting change, what have you got?  I would hope any discussion on
autovac parms would include some metric evaluation techniques.  Thanks.

Re: how to plan for vacuum?

From
"Jim C. Nasby"
Date:
Please cc the list so others can reply as well...

On Thu, Jan 25, 2007 at 08:45:50AM +0100, Tomas Vondra wrote:
> > On Wed, Jan 24, 2007 at 02:37:44PM +0900, Galy Lee wrote:
> >> 1.  How do we know if autovacuum is enough for my application, or should
> >>     I setup a vacuum manually from cron for my application?
> >
> > Generally I trust autovac unless there's some tables where it's critical
> > that they be vacuumed frequently, such as a queue table or a web session
> > table.
>
> You can tune thresholds and scale factors for that particular table
> using pg_autovacuum. If you lower them appropriately, the vacuum will be
> fired more often for that table - but don't lower them too much, just go
> step by step until you reach values that are fine for you.

That doesn't work well if autovac gets tied up vacuuming a very large
table. Granted, when that happens there are considerations about the
long-running vacuum transaction (prior to 8.2), but in many systems
you'll still get some use out of other vacuums.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: [HACKERS] how to plan for vacuum?

From
"Joshua D. Drake"
Date:
Alvaro Herrera wrote:
> Jim C. Nasby wrote:
>
>> I'll generally start with a cost delay of 20ms and adjust based on IO
>> utilization.
>
> I've been considering set a default autovacuum cost delay to 10ms; does
> this sound reasonable?

It really depends on the system. Most of our systems run anywhere from
10-25ms. I find that any more than that, Vacuum takes too long.

Joshua D. Drake



--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: [HACKERS] how to plan for vacuum?

From
Ray Stell
Date:
On Thu, Jan 25, 2007 at 08:04:49AM -0800, Joshua D. Drake wrote:
>
> It really depends on the system. Most of our systems run anywhere from
> 10-25ms. I find that any more than that, Vacuum takes too long.


How do you measure the impact of setting it to 12 as opposed to 15?

Re: [HACKERS] how to plan for vacuum?

From
Jim Nasby
Date:
On Jan 25, 2007, at 10:33 AM, Ray Stell wrote:
> On Thu, Jan 25, 2007 at 08:04:49AM -0800, Joshua D. Drake wrote:
>>
>> It really depends on the system. Most of our systems run anywhere
>> from
>> 10-25ms. I find that any more than that, Vacuum takes too long.
>
>
> How do you measure the impact of setting it to 12 as opposed to 15?

If you've got a tool that will report disk utilization as a
percentage it's very easy; I'll decrease the setting until I'm at
about 90% utilization with the system's normal workload (leaving some
room for spikes, etc). Sometimes I'll also tune the costs if reads
vs. writes are a concern.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)