Thread: Still recommending daily vacuum...

Still recommending daily vacuum...

From
"Jim C. Nasby"
Date:
From
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :

"Recommended practice for most sites is to schedule a database-wide
VACUUM once a day at a low-usage time of day, supplemented by more
frequent vacuuming of heavily-updated tables if necessary. (Some
installations with extremely high update rates vacuum their busiest
tables as often as once every few minutes.) If you have multiple
databases in a cluster, don't forget to VACUUM each one; the program
vacuumdb  might be helpful."

Do we still want that to be our formal recommendation? ISTM it would be
more logical to recommend a combination of autovac, daily vacuumdb -a if
you can afford it and have a quiet period, and frequent manual vacuuming
of things like web session tables.

I'm happy to come up with a patch, but I figure there should be
consensus first...
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Still recommending daily vacuum...

From
Robert Treat
Date:
On Monday 02 July 2007 17:52, Jim C. Nasby wrote:
> From
> http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
>
> "Recommended practice for most sites is to schedule a database-wide
> VACUUM once a day at a low-usage time of day, supplemented by more
> frequent vacuuming of heavily-updated tables if necessary. (Some
> installations with extremely high update rates vacuum their busiest
> tables as often as once every few minutes.) If you have multiple
> databases in a cluster, don't forget to VACUUM each one; the program
> vacuumdb  might be helpful."
>
> Do we still want that to be our formal recommendation? ISTM it would be
> more logical to recommend a combination of autovac, daily vacuumdb -a if
> you can afford it and have a quiet period, and frequent manual vacuuming
> of things like web session tables.
>
> I'm happy to come up with a patch, but I figure there should be
> consensus first...

I generally recommend to try autovacuum first, augmented by 
vacuum/analyze/reindex if you find trouble.  I wont say there aren't 
workloads that autvacuum wont handle, but in most cases it does fine, and I 
expect that increase with 8.3. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: Still recommending daily vacuum...

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :

Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.

Is there a reason to say anything beyond "use autovac"?
        regards, tom lane


Re: Still recommending daily vacuum...

From
"Joshua D. Drake"
Date:
Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
>> http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
> 
> Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
> text now.
> 
> Is there a reason to say anything beyond "use autovac"?

Did we change the default autovac parameters for 8.3 (beyond turning it 
on?) because on any reasonably used database, they are way to conservative.

Joshua D. Drake

> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 


-- 
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Still recommending daily vacuum...

From
Alvaro Herrera
Date:
Joshua D. Drake wrote:
> Tom Lane wrote:
> >"Jim C. Nasby" <decibel@decibel.org> writes:
> >>http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
> >
> >Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
> >text now.
> >
> >Is there a reason to say anything beyond "use autovac"?
> 
> Did we change the default autovac parameters for 8.3 (beyond turning it 
> on?) because on any reasonably used database, they are way to conservative.

We're still on time to change them ...  Any concrete proposals?

-- 
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"When the proper man does nothing (wu-wei),
his thought is felt ten thousand miles." (Lao Tse)


Re: Still recommending daily vacuum...

From
"Joshua D. Drake"
Date:
Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>> Tom Lane wrote:
>>> "Jim C. Nasby" <decibel@decibel.org> writes:
>>>> http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
>>> Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
>>> text now.
>>>
>>> Is there a reason to say anything beyond "use autovac"?
>> Did we change the default autovac parameters for 8.3 (beyond turning it 
>> on?) because on any reasonably used database, they are way to conservative.
> 
> We're still on time to change them ...  Any concrete proposals?

I could provide numbers from production high use databases. We could 
probably back those down a little and make more reasonable numbers.

Joshua D. Drake




-- 
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Still recommending daily vacuum...

From
Michael Paesold
Date:
Joshua D. Drake wrote:
> Alvaro Herrera wrote:
>> Joshua D. Drake wrote:
>>> Did we change the default autovac parameters for 8.3 (beyond turning 
>>> it on?) because on any reasonably used database, they are way to 
>>> conservative.
>>
>> We're still on time to change them ...  Any concrete proposals?
> 
> I could provide numbers from production high use databases. We could 
> probably back those down a little and make more reasonable numbers.

Please do so. Perhaps others can also tell their typical settings.

Best Regards
Michael Paesold


Re: Still recommending daily vacuum...

From
"Jim C. Nasby"
Date:
On Tue, Jul 03, 2007 at 11:31:08AM +0200, Michael Paesold wrote:
> Joshua D. Drake wrote:
> >Alvaro Herrera wrote:
> >>Joshua D. Drake wrote:
> >>>Did we change the default autovac parameters for 8.3 (beyond turning
> >>>it on?) because on any reasonably used database, they are way to
> >>>conservative.
> >>
> >>We're still on time to change them ...  Any concrete proposals?
> >
> >I could provide numbers from production high use databases. We could
> >probably back those down a little and make more reasonable numbers.
>
> Please do so. Perhaps others can also tell their typical settings.

FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Still recommending daily vacuum...

From
"Jim C. Nasby"
Date:
On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
>
> Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
> text now.
>
> Is there a reason to say anything beyond "use autovac"?

There is; I know that things like web session tables aren't handled very
well by autovacuum if there are any moderately large tables (anything
that will take more than a few minutes to vacuum). Eventually we should
be able to accommodate that case with multiple workers, but we'll need a
mechanism to ensure that at least one worker doesn't get tied up in
large vacuums.
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Still recommending daily vacuum...

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
>> Is there a reason to say anything beyond "use autovac"?

> There is; I know that things like web session tables aren't handled very
> well by autovacuum if there are any moderately large tables (anything
> that will take more than a few minutes to vacuum). Eventually we should
> be able to accommodate that case with multiple workers, but we'll need a
> mechanism to ensure that at least one worker doesn't get tied up in
> large vacuums.

And which part of that do you think isn't resolved in 8.3?
        regards, tom lane


Re: Still recommending daily vacuum...

From
Alvaro Herrera
Date:
Jim C. Nasby wrote:
> On Tue, Jul 03, 2007 at 11:31:08AM +0200, Michael Paesold wrote:
> > Joshua D. Drake wrote:
> > >Alvaro Herrera wrote:
> > >>Joshua D. Drake wrote:
> > >>>Did we change the default autovac parameters for 8.3 (beyond turning 
> > >>>it on?) because on any reasonably used database, they are way to 
> > >>>conservative.
> > >>
> > >>We're still on time to change them ...  Any concrete proposals?
> > >
> > >I could provide numbers from production high use databases. We could 
> > >probably back those down a little and make more reasonable numbers.
> > 
> > Please do so. Perhaps others can also tell their typical settings.
> 
> FWIW, I normally go with the 8.2 defaults, though I could see dropping
> vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
> could be decreased further, maybe divide by 10.

How about pushing thresholds all the way down to 0?

-- 
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Vivir y dejar de vivir son soluciones imaginarias.
La existencia está en otra parte" (Andre Breton)


Re: Still recommending daily vacuum...

From
"Kevin Grittner"
Date:
>>> On Tue, Jul 3, 2007 at  3:36 PM, in message <13153.1183494983@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
>> On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
>>> Is there a reason to say anything beyond "use autovac"?
>
>> There is; I know that things like web session tables aren't handled very
>> well by autovacuum if there are any moderately large tables (anything
>> that will take more than a few minutes to vacuum). Eventually we should
>> be able to accommodate that case with multiple workers, but we'll need a
>> mechanism to ensure that at least one worker doesn't get tied up in
>> large vacuums.
>
> And which part of that do you think isn't resolved in 8.3?
We have a 406GB table where 304GB is in one table.  The next two tables
are 57GB and 40GB.  Inserts to these three tables are constant during the
business day, along with inserts, updates, and very few deletes to the
other tables.  Database modifications are few and scattered at night and
on weekends.  Virtually all queries are during the business day.  The
large tables are "insert only" except for a weekend delete of the oldest
one week of data, to keep a rolling set of just over a year.  (No, we
really don't want to go to weekly partitions, if it can be avoided.)
Autovacuum is enabled with very aggressive settings, to cover small
tables, including one with about 75 rows that can be updated 100 or more
times per second.  Even with these settings there is zero chance of any
table of even moderate size hitting the autovacuum threshold between our
scheduled vacuums.  When we tried doing a nightly vacuum analyze starting
at the end of business day, it ran well into the next day, and the users
complained of slowness until it stopped.  We changed to a weeknight vacuum
analyze of the volatile tables which aren't in the big three, and a vacuum
analyze of the entire database right after the weekly delete.
Isn't this a use case where we don't want to count on autovacuum, both
from a table bloat perspective and the user impact perspective, even under
8.3?
In terms of our autovacuum settings, we have several different types of
databases, and in all of them we seem to do well with these changes from
the 8.2 defaults, combined with (except for the above configuration) a
nightly database vacuum:
autovacuum_naptime = 10s
autovacuum_vacuum_threshold = 1
autovacuum_analyze_threshold = 1
Oh, the tiny, high-update tables occasionally bloat to hundreds or
thousands of pages because of long-running transactions, so we schedule
a daily cluster on those, just to keep things tidy.
-Kevin



Re: Still recommending daily vacuum...

From
"Kevin Grittner"
Date:
>>> On Tue, Jul 3, 2007 at  5:17 PM, in message
<468A84A1.EE98.0025.0@wicourts.gov>, "Kevin Grittner"
<Kevin.Grittner@wicourts.gov> wrote:
>
> We have a 406GB table where 304GB is in one table.  The next two tables

It's probably obvious, but I meant a 406GB database.  Sorry.




Re: Still recommending daily vacuum...

From
Alvaro Herrera
Date:
Kevin Grittner wrote:

> We have a 406GB table where 304GB is in one table.  The next two tables
> are 57GB and 40GB.  Inserts to these three tables are constant during the
> business day, along with inserts, updates, and very few deletes to the
> other tables.  Database modifications are few and scattered at night and
> on weekends.  Virtually all queries are during the business day.  The
> large tables are "insert only" except for a weekend delete of the oldest
> one week of data, to keep a rolling set of just over a year.  (No, we
> really don't want to go to weekly partitions, if it can be avoided.)
>  
> Autovacuum is enabled with very aggressive settings, to cover small
> tables, including one with about 75 rows that can be updated 100 or more
> times per second.  Even with these settings there is zero chance of any
> table of even moderate size hitting the autovacuum threshold between our
> scheduled vacuums.  When we tried doing a nightly vacuum analyze starting
> at the end of business day, it ran well into the next day, and the users
> complained of slowness until it stopped.  We changed to a weeknight vacuum
> analyze of the volatile tables which aren't in the big three, and a vacuum
> analyze of the entire database right after the weekly delete.

Sounds like you would be served by setting those specific tables to a
lower vacuum scale factor (keeping a more normal default for the rest of
the tables), and having a non-zero vacuum delay setting (to avoid
excessive I/O consumption).  Have you tried that?

The problem you would still have with 8.2 is that while one of these
tables is being vacuumed the rest won't be vacuumed at all.  In 8.3 the
other tables can still be vacuumed regularly with the big vacuum still
running (a feature I dubbed "multiple workers", but we're still waiting
to know what name the marketing guys are gonna use).

> In terms of our autovacuum settings, we have several different types of
> databases, and in all of them we seem to do well with these changes from
> the 8.2 defaults, combined with (except for the above configuration) a
> nightly database vacuum:
>  
> autovacuum_naptime = 10s

Another change in 8.3 is that the naptime is per-database, i.e. the time
between two consecutive autovac runs on a database.  So with a setting
of 10s, if you have 10 database there will be one autovac run per
second, whereas on 8.2 there would be one autovac each 10 seconds
(unless you run out of worker slots).


> Oh, the tiny, high-update tables occasionally bloat to hundreds or
> thousands of pages because of long-running transactions, so we schedule
> a daily cluster on those, just to keep things tidy.

If you can afford the cluster then there's no problem.  I don't expect
that to change in 8.3.

-- 
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers


Re: Still recommending daily vacuum...

From
"Matthew T. O'Connor"
Date:
Alvaro Herrera wrote:
> Jim C. Nasby wrote:
>> FWIW, I normally go with the 8.2 defaults, though I could see dropping
>> vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
>> could be decreased further, maybe divide by 10.
> 
> How about pushing thresholds all the way down to 0?


As long as it handles small (or zero row) tables ok then yes.  The 
base_threshold in the originial contrib autovacuum was just an easy way 
to not vacuum really small tables too often.  If a table has only 10 
rows, it's going to get vacuumed every time one row is updated.  I guess 
that's not a big problem with a table that small but still seems excessive.

If you think this isn't a problem with the current autovacuum, then sure 
turn it down to zero, and perhaps we can even get rid of it altogether 
in another release or two.




Re: Still recommending daily vacuum...

From
Gregory Stark
Date:
"Alvaro Herrera" <alvherre@commandprompt.com> writes:

>> FWIW, I normally go with the 8.2 defaults, though I could see dropping
>> vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
>> could be decreased further, maybe divide by 10.
>
> How about pushing thresholds all the way down to 0?

My intuition is that the thresholds should be lowered to about 5%.

I may be biased by the TPC-C schema where the largest table, stock, gets a
little over 20 records per page so 5% represents an average of one update per
page. But there's nothing unusual about a table like that. Waiting until 20%
of the table is potentially dead --four dead tuples out of 20 per page in the
stock table case-- seems extravagantly wasteful.

I find the idea of lowering the thresholds to 0 sort of intriguing though.
That makes the vacuum delay parameters the primary method to control how
frequently vacuum runs.

Unfortunately vacuum delay settings are hard to get right. The admin needs to
observe how much of an effect the settings have on i/o throughput which varies
from system to system. And using them to control how frequently vacuum runs
would be even harder.

In an ideal world autovacuum would be able to set the delay settings based on
how many updates had happened since the last run started. If more than 5% of
the table was cleaned by vacuum then decrease the delay settings to get this
vacuum to finish sooner and allow fewer updates. If less than 5% of the table
was cleaned by vacuum then increase the delay settings to reduce the
unnecessary impact of vacuum. But that just leaves us back where we started.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Still recommending daily vacuum...

From
Alvaro Herrera
Date:
Gregory Stark wrote:
> 
> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
> 
> >> FWIW, I normally go with the 8.2 defaults, though I could see dropping
> >> vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
> >> could be decreased further, maybe divide by 10.
> >
> > How about pushing thresholds all the way down to 0?
> 
> My intuition is that the thresholds should be lowered to about 5%.
> 
> I may be biased by the TPC-C schema where the largest table, stock, gets a
> little over 20 records per page so 5% represents an average of one update per
> page. But there's nothing unusual about a table like that. Waiting until 20%
> of the table is potentially dead --four dead tuples out of 20 per page in the
> stock table case-- seems extravagantly wasteful.
> 
> I find the idea of lowering the thresholds to 0 sort of intriguing though.
> That makes the vacuum delay parameters the primary method to control how
> frequently vacuum runs.

I think you are mixing thresholds with scale factors.

vacuum tuples = threshold + reltuples * scale factor

If dead tuples are more than vacuum tuples, autovac does a vacuum.

So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations.  In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway.  So why not let the threshold be 0 and be done with it?

I, too, find a 0.2 scale factor a bit high.  But since I don't run any
database, I fear I would be picking numbers out of thin air.

> In an ideal world autovacuum would be able to set the delay settings based on
> how many updates had happened since the last run started. If more than 5% of
> the table was cleaned by vacuum then decrease the delay settings to get this
> vacuum to finish sooner and allow fewer updates. If less than 5% of the table
> was cleaned by vacuum then increase the delay settings to reduce the
> unnecessary impact of vacuum. But that just leaves us back where we started.

Maybe we can construct some smarts based on something like this.  The
equations we currently use are just inherited from contrib autovac,
which didn't have access to much other info.  Integrated autovac can do
much better, I think.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Still recommending daily vacuum...

From
Michael Paesold
Date:
Alvaro Herrera wrote:
> So what you are proposing above amounts to setting scale factor = 0.05.
> The threshold is unimportant -- in the case of a big table it matters
> not if it's 0 or 1000, it will be almost irrelevant in calculations.  In
> the case of small tables, then the table will be vacuumed in almost
> every iteration if the threshold is 0, which is fine because the table
> is small anyway.  So why not let the threshold be 0 and be done with it?

For very small tables, setting a threshold of 0 could mean a vacuum 
after every single row update (or every other row). I think that is just 
burning cycles. What about a threshold of 10 or 50, to have at least 
some sanity limit? Even though the cost of vacuum of a small table is 
low, it is still not free, IMHO, no?

Best Regards
Michael Paesold



Re: Still recommending daily vacuum...

From
"Florian G. Pflug"
Date:
Michael Paesold wrote:
> Alvaro Herrera wrote:
>> So what you are proposing above amounts to setting scale factor = 0.05.
>> The threshold is unimportant -- in the case of a big table it matters
>> not if it's 0 or 1000, it will be almost irrelevant in calculations.  In
>> the case of small tables, then the table will be vacuumed in almost
>> every iteration if the threshold is 0, which is fine because the table
>> is small anyway.  So why not let the threshold be 0 and be done with it?
> 
> For very small tables, setting a threshold of 0 could mean a vacuum 
> after every single row update (or every other row). I think that is just 
> burning cycles. What about a threshold of 10 or 50, to have at least 
> some sanity limit? Even though the cost of vacuum of a small table is 
> low, it is still not free, IMHO, no?

A bit off-topic (because probably not realistic in a 8.3 timeframe) -
but maybe the threshold should be specified in terms of "expected number of
pages to be freed", instead specifing a bias for the number of modified
rows as it is done now. Then "1" would probably be a reasonable default, because
a vacuum that won't free at least one page seems to be not really worth
the effort - it won't safe any future IO bandwith.

Just an idea I got while following this thread...

greetings, Florian Pflug


Re: Still recommending daily vacuum...

From
"Kevin Grittner"
Date:
>>> On Tue, Jul 3, 2007 at  5:34 PM, in message
<20070703223402.GA5491@alvh.no-ip.org>, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Kevin Grittner wrote:
>
>> Autovacuum is enabled with very aggressive settings, to cover small
>> tables, including one with about 75 rows that can be updated 100 or more
>> times per second.  Even with these settings there is zero chance of any
>> table of even moderate size hitting the autovacuum threshold between our
>> scheduled vacuums.
>
> Sounds like you would be served by setting those specific tables to a
> lower vacuum scale factor (keeping a more normal default for the rest of
> the tables), and having a non-zero vacuum delay setting (to avoid
> excessive I/O consumption).  Have you tried that?
I did play with that, but it doens't seem to make sense in our environment.
We have about 100 databases, most of them scattered around the state, and
any extra maintenance like that has a cost, particularly with the daily
cluster changing the oid.  Both from doing the math and from experience,
I can say that the autovacuum only affects the small, frequently updated
tables, so I could see no benefit.  Am I missing somethign?  (I can't see
where this causes any extra I/O.)
Our tables tend to fall into one of four categories, small tables with high
update rates, medium tables (millions or tens of millions of rows) with
thousands or tens of thousands of updates per day, static tables of various
sizes that are only modified as part of a software release, and big honking
tables (100s of GB) which are either insert-only or are insert with
periodic purge of old rows.  Only the first group has a chance of being
autovacuumed in normal operations.  Event he purges don't cause it to kick
in.
>> In terms of our autovacuum settings, we have several different types of
>> databases, and in all of them we seem to do well with these changes from
>> the 8.2 defaults, combined with (except for the above configuration) a
>> nightly database vacuum:
>>
>> autovacuum_naptime = 10s
>
> Another change in 8.3 is that the naptime is per-database, i.e. the time
> between two consecutive autovac runs on a database.  So with a setting
> of 10s, if you have 10 database there will be one autovac run per
> second, whereas on 8.2 there would be one autovac each 10 seconds
> (unless you run out of worker slots).
That's fine.  We actually want it every ten seconds in a production
database.  When you can have more updates per second than there are rows
in a small table, frequent vacuums are good.  As long as the table doesn't
bloat too badly, the vacuum is typically 10 to 20 milliseconds.  I'm sure
that part of it is that the table tends to remain fully cached. When these
tables were vacuumed once per minute, we ran into performance problems.
>> Oh, the tiny, high-update tables occasionally bloat to hundreds or
>> thousands of pages because of long-running transactions, so we schedule
>> a daily cluster on those, just to keep things tidy.
>
> If you can afford the cluster then there's no problem.  I don't expect
> that to change in 8.3.
Here also we're talking 10 to 20 milliseconds.  I understand that in 8.2
that leaves a chance of an error, but we seem to have dodged that bullet
so far.  Has that gotten any safer in 8.3?
-Kevin



Re: Still recommending daily vacuum...

From
Alvaro Herrera
Date:
Matthew T. O'Connor wrote:
> Alvaro Herrera wrote:
> >Jim C. Nasby wrote:
> >>FWIW, I normally go with the 8.2 defaults, though I could see dropping
> >>vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
> >>could be decreased further, maybe divide by 10.
> >
> >How about pushing thresholds all the way down to 0?
> 
> As long as it handles small (or zero row) tables ok then yes.  The 
> base_threshold in the originial contrib autovacuum was just an easy way 
> to not vacuum really small tables too often.  If a table has only 10 
> rows, it's going to get vacuumed every time one row is updated.  I guess 
> that's not a big problem with a table that small but still seems excessive.

Well, if a table has 10 rows, and we keep the current threshold of 1000
rows, then this table must have 1002 dead tuples (99% dead tuples, 1002
dead + 10 live) before being vacuumed.  This seems wasteful because
there are 500 dead tuples on it and only 10 live tuples.  So each scan
must wade through all the dead tuples.

Another small table with 100 tuples will be vacuumed on every iteration
as well, even if there are just two dead tuples.  So you are right --
maybe dropping it all the way to 0 is too much.  But a small value of 10
is reasonable?  That will make the 10 tuple table be vacuumed when there
are 10 dead tuples (50% of dead tuples), and the 100 tuple table when
there are 11 (11% of dead tuples).  It decreases quickly to the scale
factor (2%, or do we want to decrease it to 1%?)

Does this sound acceptable?

-- 
Alvaro Herrera                 http://www.amazon.com/gp/registry/DXLWNGRJD34J
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."                               (Fotis)
(http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)


Re: Still recommending daily vacuum...

From
Alvaro Herrera
Date:
Kevin Grittner wrote:
> >>> On Tue, Jul 3, 2007 at  5:34 PM, in message
> <20070703223402.GA5491@alvh.no-ip.org>, Alvaro Herrera
> <alvherre@commandprompt.com> wrote: 
> > Kevin Grittner wrote:
> > 
> >> Autovacuum is enabled with very aggressive settings, to cover small
> >> tables, including one with about 75 rows that can be updated 100 or more
> >> times per second.  Even with these settings there is zero chance of any
> >> table of even moderate size hitting the autovacuum threshold between our
> >> scheduled vacuums.
> > 
> > Sounds like you would be served by setting those specific tables to a
> > lower vacuum scale factor (keeping a more normal default for the rest of
> > the tables), and having a non-zero vacuum delay setting (to avoid
> > excessive I/O consumption).  Have you tried that?
>  
> I did play with that, but it doens't seem to make sense in our environment.
> We have about 100 databases, most of them scattered around the state, and
> any extra maintenance like that has a cost, particularly with the daily
> cluster changing the oid.  Both from doing the math and from experience,
> I can say that the autovacuum only affects the small, frequently updated
> tables, so I could see no benefit.  Am I missing somethign?  (I can't see
> where this causes any extra I/O.)

There seem to be a misunderstanding here.

1. Cluster does not change the OID.  It only changes the relfilenode.
The file on disk is named something else, but the OID used in the
database remains unchanged.  So if you insert something into
pg_autovacuum it continues to work after a CLUSTER, you don't need to
update the OID.

2. The point of autovacuum is to get rid of maintenance burden, not add
to it.  If you know which tables are small and frequently updated, then
configure those to specific settings that you've found to be optimal,
and then you don't need to worry about vacuuming them any longer.

You already know this but: autovacuum uses a formula to determine which
tables to vacuum.  The formula is based on the number of dead tuples,
the size of the table and two factors that you can configure per table
as well as globally.  If you didn't tune it to match specific tables,
most likely your biggest tables never met the formula's condition, which
is why you were seeing it affecting only the small tables (which met the
condition under the values you configured server-wide).

The extra I/O I was talking about would come from vacuuming one of your
biggest tables, which could cause the amount of I/O to swamp everything
else the server was doing at the time.  Since it never actually touched
the big tables this hasn't happened to you yet.  Do note that autovacuum
uses the vacuum_cost_delay if autovacuum_vacuum_cost_delay is set to the
default value of -1.

> Our tables tend to fall into one of four categories, small tables with high
> update rates, medium tables (millions or tens of millions of rows) with
> thousands or tens of thousands of updates per day, static tables of various
> sizes that are only modified as part of a software release, and big honking
> tables (100s of GB) which are either insert-only or are insert with
> periodic purge of old rows.  Only the first group has a chance of being
> autovacuumed in normal operations.  Event he purges don't cause it to kick
> in.

It could certainly vacuum all your tables.  But one thing to keep in
mind that as of 8.2, only one autovacuum process can be running.  So if
and when it decides to vacuum the big tables, it will be long before it
is able to go back and check the small tables.

This is fixed in 8.3.


> >> Oh, the tiny, high-update tables occasionally bloat to hundreds or
> >> thousands of pages because of long-running transactions, so we schedule
> >> a daily cluster on those, just to keep things tidy.
> > 
> > If you can afford the cluster then there's no problem.  I don't expect
> > that to change in 8.3.
>  
> Here also we're talking 10 to 20 milliseconds.  I understand that in 8.2
> that leaves a chance of an error, but we seem to have dodged that bullet
> so far.  Has that gotten any safer in 8.3?

Yes, it did, assuming I understood what error are you talking about
(cluster not leaving dead tuples possibly seen by concurrent
transactions).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Still recommending daily vacuum...

From
"Matthew T. O'Connor"
Date:
Alvaro Herrera wrote:
> Matthew T. O'Connor wrote:
> Well, if a table has 10 rows, and we keep the current threshold of 1000
> rows, then this table must have 1002 dead tuples (99% dead tuples, 1002
> dead + 10 live) before being vacuumed.  This seems wasteful because
> there are 500 dead tuples on it and only 10 live tuples.  So each scan
> must wade through all the dead tuples.
> 
> Another small table with 100 tuples will be vacuumed on every iteration
> as well, even if there are just two dead tuples.  So you are right --
> maybe dropping it all the way to 0 is too much.  But a small value of 10
> is reasonable?  That will make the 10 tuple table be vacuumed when there
> are 10 dead tuples (50% of dead tuples), and the 100 tuple table when
> there are 11 (11% of dead tuples).  It decreases quickly to the scale
> factor (2%, or do we want to decrease it to 1%?)

I think it's probably fine. I think, that the optimal number for the 
base_threhold is probably dependant on the width of the row, for a very 
narrow row where you might have many on the same page, 20 or 50 might be 
right, but for a very wide table a smaller number might be optimal, 
however I think it probably doesn't matter much anyway.

Reducing the default to 10 seems fine, and perhaps even removing it as a 
tuning knob.  I think there are too many autovacuum knobs and it 
confuses people.  Is it too late to possibly remove this GUC altogether?            


Re: Still recommending daily vacuum...

From
"Kevin Grittner"
Date:
>>> On Fri, Jul 6, 2007 at  2:19 PM, in message
<20070706191912.GC15358@alvh.no-ip.org>, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Kevin Grittner wrote:

> 2. The point of autovacuum is to get rid of maintenance burden, not add
> to it.  If you know which tables are small and frequently updated, then
> configure those to specific settings that you've found to be optimal,
> and then you don't need to worry about vacuuming them any longer.
We have 72 counties using the same schema, which has over 300 tables.
(Each has their own server, located in their county, with their data.)
Rather than trying to fine-tune autovacuum for each table in all locations,
we find it more convenient to use general settings which are aggressive
enough for the small, high-update tables, but lax enough to let the big
ones go until a nightly database vacuum.  (That time is pretty slack
anyway, so why not off-load the overhead of the vacuum to those hours?)
> If you didn't tune it to match specific tables,
> most likely your biggest tables never met the formula's condition, which
> is why you were seeing it affecting only the small tables (which met the
> condition under the values you configured server-wide).
>
> The extra I/O I was talking about would come from vacuuming one of your
> biggest tables, which could cause the amount of I/O to swamp everything
> else the server was doing at the time.
Is there something better about having the autovacuum compete with load
during the week, rather than doing a database vacuum during otherwise
idle weekend hours, immediately after the weekly delete of almost 2% of
the rows?  At the time we run the database vacuum analyze, there is
nothing else running to be swamped.
>> Our tables tend to fall into one of four categories, small tables with high
>> update rates, medium tables (millions or tens of millions of rows) with
>> thousands or tens of thousands of updates per day, static tables of various
>> sizes that are only modified as part of a software release, and big honking
>> tables (100s of GB) which are either insert-only or are insert with
>> periodic purge of old rows.  Only the first group has a chance of being
>> autovacuumed in normal operations.  Event he purges don't cause it to kick
>> in.
>
> It could certainly vacuum all your tables.
Well, sure, if we weren't doing a nightly database vacuum.  (Weekly for the
database containing the largest tables, mentioned above.)
This all started with the question about whether the documentation should
say anything about vacuum schedules other than "enable autovacuum."
My point was that I have a use case where I think that a scheduled vacuum
will be better than leaving everything to autovacuum.  I may not be the only
one, so I'm thinking the documentation should discuss where an explicit
schedule might be useful.
The changes sound good, but I don't see the point of having any vacuum
activity during the work week on the big tables in the database I was
describing.  It seems to me that it would result in at least some
performance degradation for the interactive users, and bloat the table,
since we might start inserting before the post-delete vacuum.
-Kevin



Re: Still recommending daily vacuum...

From
Alvaro Herrera
Date:
Kevin Grittner wrote:

> This all started with the question about whether the documentation should
> say anything about vacuum schedules other than "enable autovacuum."
> My point was that I have a use case where I think that a scheduled vacuum
> will be better than leaving everything to autovacuum.  I may not be the only
> one, so I'm thinking the documentation should discuss where an explicit
> schedule might be useful.

Fair enough, you are correct that this scenario may be a useful one to
describe in the docs.

There are plans to add some sort of scheduling to autovacuum so that it
choses different settings based on time/date.  It may be useful for you
as well (though of course they are 8.4 material at best ...), and we may
consider shrinking the docs (or at least reshaping them) a bit at that
time.


> The changes sound good, but I don't see the point of having any vacuum
> activity during the work week on the big tables in the database I was
> describing.

Of course.

-- 
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"A wizard is never late, Frodo Baggins, nor is he early.He arrives precisely when he means to."  (Gandalf, en LoTR
FoTR)


Re: Still recommending daily vacuum...

From
Jim Nasby
Date:
On Jul 3, 2007, at 3:36 PM, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
>> On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
>>> Is there a reason to say anything beyond "use autovac"?
>> There is; I know that things like web session tables aren't  
>> handled very
>> well by autovacuum if there are any moderately large tables (anything
>> that will take more than a few minutes to vacuum). Eventually we  
>> should
>> be able to accommodate that case with multiple workers, but we'll  
>> need a
>> mechanism to ensure that at least one worker doesn't get tied up in
>> large vacuums.
>
> And which part of that do you think isn't resolved in 8.3?

It's still possible to tie up all autovac workers in large tables,  
though of course it's now far less likely.

BTW, +1 to dropping the thresholds to a very low value. 0 might be  
pushing it, but 10 or 20 certainly doesn't sound absurd.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)