Thread: Default autovacuum settings too conservative

Default autovacuum settings too conservative

From
"Jim C. Nasby"
Date:
As I recall, the idea behind vacuum_threshold was to prevent
too-frequent vacuuming of small tables. I'm beginning to question this
reasoning:

Small tables vacuum very, very quickly, so 'extra' vacuuming is very
unlikely to hurt system performance.

Small tables are most likely to have either very few updates (ie: a
'lookup table') or very frequent updates (ie: a table implementing a
queue). In the former, even with vacuum_threshold = 0 vacuum will be a
very rare occurance. In the later case, a high threshold is likely to
cause a large amount of un-nececcasry bloat.

Also, vacuum_scale_factor of 0.4 seems unreasonably large. It means
tables will be 40% dead space, which seems excessively wasteful.
Something between 0.1 and 0.2 seems much better.

Has anyone looked at how effective these two settings are?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Default autovacuum settings too conservative

From
"Matthew T. O'Connor"
Date:
Jim C. Nasby wrote:
> Small tables are most likely to have either very few updates (ie: a
> 'lookup table') or very frequent updates (ie: a table implementing a
> queue). In the former, even with vacuum_threshold = 0 vacuum will be a
> very rare occurance. In the later case, a high threshold is likely to
> cause a large amount of un-nececcasry bloat.

Well a threshold of 0 won't work because then a 0 tuple table will get
vacuumed every time.  Or at least autovacuum needs to special case this.

> Also, vacuum_scale_factor of 0.4 seems unreasonably large. It means
> tables will be 40% dead space, which seems excessively wasteful.
> Something between 0.1 and 0.2 seems much better.

Depends on the app and the usage patterns as to what too much slack
space is.

> Has anyone looked at how effective these two settings are?

As far I as I know, we are still looking for real world feedback.  8.1
is the first release to have the integrated autovacuum.  The thresholds
in 8.1 are a good bit less conservative than the thresholds in the
contrib version.  The contrib thresholds were universally considered WAY
to conservative, but that was somewhat necessary since you couldn't set
them on a per table basis as you can in 8.1.  If we continue to hear
from people that the current 8.1 default thresholds are still to
conservative we can look into lowering them.

I think the default settings should be designed to minimize the impact
autovacuum has on the system while preventing the system from ever
getting wildly bloated (also protect xid wraparound, but that doesn't
have anything to do with the thresholds).

Matt

Re: Default autovacuum settings too conservative

From
Vivek Khera
Date:
On Feb 1, 2006, at 4:37 PM, Matthew T. O'Connor wrote:

> As far I as I know, we are still looking for real world feedback.
> 8.1 is the first release to have the integrated autovacuum.  The
> thresholds in 8.1 are a good bit less conservative than the
> thresholds in the contrib version.  The contrib thresholds were
> universally considered WAY to conservative, but that was somewhat
> necessary since you couldn't set them on a per table basis as you
> can in 8.1.  If we continue to hear from people that the current
> 8.1 default thresholds are still to conservative we can look into
> lowering them.

I spent the weekend researching and pondering this topic as well.

For me the per-table tuning is vital, since I have some tables that
are very small and implement a queue (ie, update very often several
million times per day and have at most 10 or so rows), some that are
fairly stable with O(10k) rows which update occasionally, and a
couple of tables that are  quite large: 20 million rows which updates
a few million times per day and inserts a few thousand, and another
table with ~275 million rows in which we insert and update roughly 3
million per day.

The 40% overhead would kill these large tables both in terms of
performance and disk usage. I'm pondering a global 10% and having the
big tables at or below 1% based on the rate of change.

Is there a way to make the autovacuum process log more verbosely
while leaving the rest of the logging minimal?  This would help tune it.


Re: Default autovacuum settings too conservative

From
Chris Browne
Date:
matthew@zeut.net ("Matthew T. O'Connor") writes:
> I think the default settings should be designed to minimize the
> impact autovacuum has on the system while preventing the system from
> ever getting wildly bloated (also protect xid wraparound, but that
> doesn't have anything to do with the thresholds).

That would suggest setting the "base threshold"
autovacuum_vacuum_threshold relatively low, and the "scale factor"
autovacuum_vacuum_scale_factor fairly high.
--
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/nonrdbms.html
I think  it may be  possible to simplify  and condense the  content of
this thread somewhat:
 "GX is an ex-API.  It is no longer supported" - The Rest of Us
 "No it isn't.  It's just pining for the fjords!" - Lawson
-- Michael Paquette

Re: Default autovacuum settings too conservative

From
"Jim C. Nasby"
Date:
On Wed, Feb 01, 2006 at 04:37:07PM -0500, Matthew T. O'Connor wrote:
> I think the default settings should be designed to minimize the impact
> autovacuum has on the system while preventing the system from ever
> getting wildly bloated (also protect xid wraparound, but that doesn't
> have anything to do with the thresholds).

I don't really see the logic behind that. Problems caused by inadequate
vacuuming seem to be much more prevalent than problems caused by vacuum
impacting the system. If vacuum impact is a concern I think it more
reasonable to make the default vacuum_cost_delay non-zero instead.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Default autovacuum settings too conservative

From
Christopher Browne
Date:
> On Wed, Feb 01, 2006 at 04:37:07PM -0500, Matthew T. O'Connor wrote:
>> I think the default settings should be designed to minimize the impact
>> autovacuum has on the system while preventing the system from ever
>> getting wildly bloated (also protect xid wraparound, but that doesn't
>> have anything to do with the thresholds).
>
> I don't really see the logic behind that. Problems caused by inadequate
> vacuuming seem to be much more prevalent than problems caused by vacuum
> impacting the system. If vacuum impact is a concern I think it more
> reasonable to make the default vacuum_cost_delay non-zero instead.

That's a good point.

I would not be keen, on the other hand, on having the delays terribly
high.

Big tables, if delayed significantly, will take plenty longer to
vacuum, and I always get paranoid about long running transactions :-).
--
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/postgresql.html
This login session:  $13.99

Re: Default autovacuum settings too conservative

From
"Jim C. Nasby"
Date:
On Mon, Feb 06, 2006 at 10:14:53PM -0800, Christopher Browne wrote:
> > On Wed, Feb 01, 2006 at 04:37:07PM -0500, Matthew T. O'Connor wrote:
> >> I think the default settings should be designed to minimize the impact
> >> autovacuum has on the system while preventing the system from ever
> >> getting wildly bloated (also protect xid wraparound, but that doesn't
> >> have anything to do with the thresholds).
> >
> > I don't really see the logic behind that. Problems caused by inadequate
> > vacuuming seem to be much more prevalent than problems caused by vacuum
> > impacting the system. If vacuum impact is a concern I think it more
> > reasonable to make the default vacuum_cost_delay non-zero instead.
>
> That's a good point.
>
> I would not be keen, on the other hand, on having the delays terribly
> high.
>
> Big tables, if delayed significantly, will take plenty longer to
> vacuum, and I always get paranoid about long running transactions :-).

Very true, but I'd hope anyone running a table large enough for this to
make a difference would have done some tuning of their own...

What we really need is a replacement for vacuum_delay that takes
PostgreSQL generated IO activity into account...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Default autovacuum settings too conservative

From
Markus Schaber
Date:
Hi, Jim,

Jim C. Nasby wrote:

> What we really need is a replacement for vacuum_delay that takes
> PostgreSQL generated IO activity into account...

There are also other ideas which can make vacuum less painfull:

- Use a "delete"-map (like the free space map) so vacuum can quickly
find the pages to look at.

- Have vacuum end its transaction after a certain amount of work, and
restart at the same page later.

- Have vacuum full search good candidates with non-stopping lock (and
usage of delete-map and fsm), then doing {lock, recheck, move, unlock}
in small amounts of data with delay between.

- Introducing some load measurement, and a pressure measurement (number
of deleted rows, TID wraparound etc.). Then start vacuum when load is
low or pressure is very high. Tune other parameters (like "certain
amount of work" depending on those measures.

All of them are a lot of code to hack, but although I'm not a postgresql
core developer, I am keen enough to invite you to send patches. :-)

Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: Default autovacuum settings too conservative

From
Michael Stone
Date:
On Mon, Feb 06, 2006 at 11:05:45PM -0600, Jim C. Nasby wrote:
>I don't really see the logic behind that. Problems caused by inadequate
>vacuuming seem to be much more prevalent than problems caused by vacuum
>impacting the system.

Agreed. If your tables are large enough that a vacuum matters, you
probably shouldn't be blindly running autovacuum anyway.

Mike Stone

Re: Default autovacuum settings too conservative

From
"Jim C. Nasby"
Date:
On Tue, Feb 07, 2006 at 01:39:34PM +0100, Markus Schaber wrote:
> Hi, Jim,
>
> Jim C. Nasby wrote:
>
> > What we really need is a replacement for vacuum_delay that takes
> > PostgreSQL generated IO activity into account...
>
> There are also other ideas which can make vacuum less painfull:
>
> - Use a "delete"-map (like the free space map) so vacuum can quickly
> find the pages to look at.

Already on TODO.

> - Have vacuum end its transaction after a certain amount of work, and
> restart at the same page later.

AFAIK this isn't possible with the current way vacuum works.

> - Have vacuum full search good candidates with non-stopping lock (and
> usage of delete-map and fsm), then doing {lock, recheck, move, unlock}
> in small amounts of data with delay between.

This isn't an issue of locks, it's an issue of long-running
transactions. It *might* be possible for vacuum to break work into
smaller transactions, but I'm pretty sure that would be a non-trivial
amount of hacking.

> - Introducing some load measurement, and a pressure measurement (number
> of deleted rows, TID wraparound etc.). Then start vacuum when load is
> low or pressure is very high. Tune other parameters (like "certain
> amount of work" depending on those measures.

Which is essentially what I was suggesting...

> All of them are a lot of code to hack, but although I'm not a postgresql
> core developer, I am keen enough to invite you to send patches. :-)

Well, if you know C then you're already 1 step closer to being able to
change these kinds of things than I am.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Default autovacuum settings too conservative

From
Russell Smith
Date:
Jim C. Nasby wrote:
> On Tue, Feb 07, 2006 at 01:39:34PM +0100, Markus Schaber wrote:
>
>>Hi, Jim,
>>
>>Jim C. Nasby wrote:
>>
>>
>>>What we really need is a replacement for vacuum_delay that takes
>>>PostgreSQL generated IO activity into account...
>>
>>There are also other ideas which can make vacuum less painfull:
>>
>>- Use a "delete"-map (like the free space map) so vacuum can quickly
>>find the pages to look at.
>
>
> Already on TODO.
>
>
>>- Have vacuum end its transaction after a certain amount of work, and
>>restart at the same page later.
>
>
> AFAIK this isn't possible with the current way vacuum works.

There was a patch posted for this in the 8.0 cycle, but it was said to
be not useful.  I think it's possibly useful for large tables and with
autovac only.

>
>
>>- Have vacuum full search good candidates with non-stopping lock (and
>>usage of delete-map and fsm), then doing {lock, recheck, move, unlock}
>>in small amounts of data with delay between.
>
>
> This isn't an issue of locks, it's an issue of long-running
> transactions. It *might* be possible for vacuum to break work into
> smaller transactions, but I'm pretty sure that would be a non-trivial
> amount of hacking.

When tables are tracked individually for wraparound, the longest
transaction required for vacuuming will be one to vacuum one table.
With delete-map and other functions, the time for that transaction may
be reduced.  Partial vacuum of large tables is an option, but again
requires some real smarts in the autovac code to track wraparound issues.

>
>
>>- Introducing some load measurement, and a pressure measurement (number
>>of deleted rows, TID wraparound etc.). Then start vacuum when load is
>>low or pressure is very high. Tune other parameters (like "certain
>>amount of work" depending on those measures.
>
>
> Which is essentially what I was suggesting...
>
>
>>All of them are a lot of code to hack, but although I'm not a postgresql
>>core developer, I am keen enough to invite you to send patches. :-)
>
>
> Well, if you know C then you're already 1 step closer to being able to
> change these kinds of things than I am.

Regards

Russell Smith

Re: Default autovacuum settings too conservative

From
Christopher Browne
Date:
> Jim C. Nasby wrote:
>> On Tue, Feb 07, 2006 at 01:39:34PM +0100, Markus Schaber wrote:
>>
>>>Hi, Jim,
>>>
>>>Jim C. Nasby wrote:
>>>
>>>
>>>>What we really need is a replacement for vacuum_delay that takes
>>>>PostgreSQL generated IO activity into account...
>>>
>>>There are also other ideas which can make vacuum less painfull:
>>>
>>>- Use a "delete"-map (like the free space map) so vacuum can quickly
>>>find the pages to look at.
>> Already on TODO.
>>
>>>- Have vacuum end its transaction after a certain amount of work, and
>>>restart at the same page later.
>> AFAIK this isn't possible with the current way vacuum works.
>
> There was a patch posted for this in the 8.0 cycle, but it was said to
> be not useful.  I think it's possibly useful for large tables and with
> autovac only.

I could see it being useful in an autovac perspective.  Work on a
table for a while, giving up after some period of time, but without
giving up on having done some work.

>>>- Have vacuum full search good candidates with non-stopping lock (and
>>>usage of delete-map and fsm), then doing {lock, recheck, move, unlock}
>>>in small amounts of data with delay between.
>> This isn't an issue of locks, it's an issue of long-running
>> transactions. It *might* be possible for vacuum to break work into
>> smaller transactions, but I'm pretty sure that would be a non-trivial
>> amount of hacking.

Right.  And part of the trouble is that you lose certainty that you
have covered off transaction wraparound.

> When tables are tracked individually for wraparound, the longest
> transaction required for vacuuming will be one to vacuum one
> table. With delete-map and other functions, the time for that
> transaction may be reduced.  Partial vacuum of large tables is an
> option, but again requires some real smarts in the autovac code to
> track wraparound issues.

Unfortunately, "delete-map" *doesn't* help you with the wraparound
problem.  The point of the "delete map" or "vacuum space map" is to
allow the VACUUM to only touch the pages known to need vacuuming.

At some point, you still need to walk through the whole table (touched
parts and untouched) in order to make sure that the old tuples are
frozen.

Tracking tables individually does indeed help by making the longest
transaction be the one needed for the largest table.  Unfortunately,
that one can't lean on the "delete map"/"vacuum space map" to ignore
parts of the table :-(.
--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://linuxdatabases.info/info/slony.html
"Access to a COFF symbol table via ldtbread is even less abstract,
 really sucks in general, and should be banned from earth."
        -- SCSH 0.5.1 unix.c

Re: Default autovacuum settings too conservative

From
Markus Schaber
Date:
Hi, Christopher,

Christopher Browne wrote:

> Right.  And part of the trouble is that you lose certainty that you
> have covered off transaction wraparound.

Yes. Vacuum (full) serve at least four purposes:

- TID wraparound prevention
- obsolete row removal
- table compaction
- giving space back to the OS by truncating files

While the first one needs full table sweeps, the others don't. And from
my personal experience, at least the obsolete row removal is needed much
more frequently than TID wraparound prevention.

>>When tables are tracked individually for wraparound, the longest
>>transaction required for vacuuming will be one to vacuum one
>>table. With delete-map and other functions, the time for that
>>transaction may be reduced.  Partial vacuum of large tables is an
>>option, but again requires some real smarts in the autovac code to
>>track wraparound issues.
>
> Unfortunately, "delete-map" *doesn't* help you with the wraparound
> problem.  The point of the "delete map" or "vacuum space map" is to
> allow the VACUUM to only touch the pages known to need vacuuming.
>
> At some point, you still need to walk through the whole table (touched
> parts and untouched) in order to make sure that the old tuples are
> frozen.

Preventing transaction ID wraparound needs a guaranteed full table sweep
during a vacuum run, but not necessarily in a single transaction. It
should be possible to divide this full table sweep into smaller chunks,
each of them in its own transaction.

It will certainly be necessary to block e. G. simultaneous VACUUMs,
CLUSTERs or other maintainance commands for the whole VACUUM run, but
normal SELECT, INSERT and UPDATE statement should be able to interleave
with the VACUUM transaction.

Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: Default autovacuum settings too conservative

From
Markus Schaber
Date:
Hi, Mahesh,

Mahesh Shinde wrote:

> Does vacuum improves the performance of the database search.. As if now I
> have a table who is having a records 70 lac and daily appx 10-15 thousand
> rows get added. so please let me know which type of vacuum I should prefer.
> I am accessing a data using  java application which is hosted on the same
> database server.

I don't know what "70 lac" means.

But if you only add to the table, and never update or delete, vacuum
brings nothing for performance. (Although it is necessary for TID
wraparound prevention.)

However, if your often do range queries on an index that does not
correspond to the insertion order, you may benefit from CLUSTERing on
that index from time to time.



Hth,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: Default autovacuum settings too conservative

From
Tim Allen
Date:
Markus Schaber wrote:
>>Does vacuum improves the performance of the database search.. As if now I
>>have a table who is having a records 70 lac and daily appx 10-15 thousand
>>rows get added. so please let me know which type of vacuum I should prefer.
>>I am accessing a data using  java application which is hosted on the same
>>database server.
>
> I don't know what "70 lac" means.

One lac (also spelt "lakh") is one hundred thousand. And one crore is
ten million. Indians count differently from the rest of the world :-).

Tim

--
-----------------------------------------------
Tim Allen          tim@proximity.com.au
Proximity Pty Ltd  http://www.proximity.com.au/

Re: Default autovacuum settings too conservative

From
Markus Schaber
Date:
Hi, Tim,

Tim Allen schrieb:
>> I don't know what "70 lac" means.
> One lac (also spelt "lakh") is one hundred thousand. And one crore is
> ten million. Indians count differently from the rest of the world :-).

Okay, so he talks about 7 million rows.

Thank you.

Markus