Thread: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

[doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
Guillaume Cottenceau
Date:
Dear all,

After some time spent better understanding how the VACUUM process
works, what problems we had in production and how to improve our
maintenance policy[1], I've come up with a little documentation
patch - basically, I think the documentation under estimates (or
sometimes misses) the benefit of VACUUM FULL for scans, and the
needs of VACUUM FULL if the routine VACUUM hasn't been done
properly since the database was put in production. Find the patch
against snapshot attached (text not filled, to ease reading). It
might help others in my situation in the future.


Ref:
[1] http://archives.postgresql.org/pgsql-performance/2006-08/msg00419.php
    http://archives.postgresql.org/pgsql-performance/2007-05/msg00112.php

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Attachment

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
Michael Stone
Date:
On Tue, May 15, 2007 at 06:43:50PM +0200, Guillaume Cottenceau wrote:
>patch - basically, I think the documentation under estimates (or
>sometimes misses) the benefit of VACUUM FULL for scans, and the
>needs of VACUUM FULL if the routine VACUUM hasn't been done
>properly since the database was put in production.

It's also possible to overestimate the benefit of vacuum full, leading
to people vacuum full'ing almost constantly, then complaining about
performance due to the associated overhead. I think there have been more
people on this list whose performance problems were caused by
unnecessary full vacs than by those whose performance problems were
caused by insufficient full vacs.

Mike Stone

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
Guillaume Cottenceau
Date:
Michael Stone <mstone+postgres 'at' mathom.us> writes:

> On Tue, May 15, 2007 at 06:43:50PM +0200, Guillaume Cottenceau wrote:
> >patch - basically, I think the documentation under estimates (or
> >sometimes misses) the benefit of VACUUM FULL for scans, and the
> >needs of VACUUM FULL if the routine VACUUM hasn't been done
> >properly since the database was put in production.
>
> It's also possible to overestimate the benefit of vacuum full, leading
> to people vacuum full'ing almost constantly, then complaining about
> performance due to the associated overhead. I think there have been
> more people on this list whose performance problems were caused by
> unnecessary full vacs than by those whose performance problems were
> caused by insufficient full vacs.

Come on, I don't suggest to remove several bold warnings about
it, the best one being "Therefore, frequently using VACUUM FULL
can have an extremely negative effect on the performance of
concurrent database queries." My point is to add the few
additional mentions; I don't think the claims that VACUUM FULL
physically compacts the data, and might be useful in case of too
long time with infrequent VACUUM are incorrect, are they?

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
"Jim C. Nasby"
Date:
On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote:
> Michael Stone <mstone+postgres 'at' mathom.us> writes:
>
> > On Tue, May 15, 2007 at 06:43:50PM +0200, Guillaume Cottenceau wrote:
> > >patch - basically, I think the documentation under estimates (or
> > >sometimes misses) the benefit of VACUUM FULL for scans, and the
> > >needs of VACUUM FULL if the routine VACUUM hasn't been done
> > >properly since the database was put in production.
> >
> > It's also possible to overestimate the benefit of vacuum full, leading
> > to people vacuum full'ing almost constantly, then complaining about
> > performance due to the associated overhead. I think there have been
> > more people on this list whose performance problems were caused by
> > unnecessary full vacs than by those whose performance problems were
> > caused by insufficient full vacs.
>
> Come on, I don't suggest to remove several bold warnings about
> it, the best one being "Therefore, frequently using VACUUM FULL
> can have an extremely negative effect on the performance of
> concurrent database queries." My point is to add the few
> additional mentions; I don't think the claims that VACUUM FULL
> physically compacts the data, and might be useful in case of too
> long time with infrequent VACUUM are incorrect, are they?

Unfortunately they are, to a degree. VACUUM FULL can create a
substantial amount of churn in the indexes, resulting in bloated
indexes. So often you have to REINDEX after you VACUUM FULL.

Long term I think we should ditch 'VACUUM FULL' altogether and create a
COMPACT command (it's very easy for users to get confused between
"vacuum all the databases in the cluster" or "vacuum the entire
database" and "VACUUM FULL").
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
Guillaume Cottenceau
Date:
"Jim C. Nasby" <decibel 'at' decibel.org> writes:

> On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote:

[...]

> > Come on, I don't suggest to remove several bold warnings about
> > it, the best one being "Therefore, frequently using VACUUM FULL
> > can have an extremely negative effect on the performance of
> > concurrent database queries." My point is to add the few
> > additional mentions; I don't think the claims that VACUUM FULL
> > physically compacts the data, and might be useful in case of too
> > long time with infrequent VACUUM are incorrect, are they?
>
> Unfortunately they are, to a degree. VACUUM FULL can create a
> substantial amount of churn in the indexes, resulting in bloated
> indexes. So often you have to REINDEX after you VACUUM FULL.

Ok, VACUUM FULL does his job (it physically compacts the data and
might be useful in case of too long time with infrequent VACUUM),
but we are going to not talk about it because we often needs a
REINDEX after it? The natural conclusion would rather be to
document the fact than REINDEX is needed after VACUUM FULL, isn't
it?

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
Alvaro Herrera
Date:
Guillaume Cottenceau wrote:
> "Jim C. Nasby" <decibel 'at' decibel.org> writes:
>
> > On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote:
>
> [...]
>
> > > Come on, I don't suggest to remove several bold warnings about
> > > it, the best one being "Therefore, frequently using VACUUM FULL
> > > can have an extremely negative effect on the performance of
> > > concurrent database queries." My point is to add the few
> > > additional mentions; I don't think the claims that VACUUM FULL
> > > physically compacts the data, and might be useful in case of too
> > > long time with infrequent VACUUM are incorrect, are they?
> >
> > Unfortunately they are, to a degree. VACUUM FULL can create a
> > substantial amount of churn in the indexes, resulting in bloated
> > indexes. So often you have to REINDEX after you VACUUM FULL.
>
> Ok, VACUUM FULL does his job (it physically compacts the data and
> might be useful in case of too long time with infrequent VACUUM),
> but we are going to not talk about it because we often needs a
> REINDEX after it? The natural conclusion would rather be to
> document the fact than REINDEX is needed after VACUUM FULL, isn't
> it?

Maybe, but we should also mention that CLUSTER is a likely faster
workaround.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
"Jim C. Nasby"
Date:
On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote:
> Guillaume Cottenceau wrote:
> > "Jim C. Nasby" <decibel 'at' decibel.org> writes:
> >
> > > On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote:
> >
> > [...]
> >
> > > > Come on, I don't suggest to remove several bold warnings about
> > > > it, the best one being "Therefore, frequently using VACUUM FULL
> > > > can have an extremely negative effect on the performance of
> > > > concurrent database queries." My point is to add the few
> > > > additional mentions; I don't think the claims that VACUUM FULL
> > > > physically compacts the data, and might be useful in case of too
> > > > long time with infrequent VACUUM are incorrect, are they?
> > >
> > > Unfortunately they are, to a degree. VACUUM FULL can create a
> > > substantial amount of churn in the indexes, resulting in bloated
> > > indexes. So often you have to REINDEX after you VACUUM FULL.
> >
> > Ok, VACUUM FULL does his job (it physically compacts the data and
> > might be useful in case of too long time with infrequent VACUUM),
> > but we are going to not talk about it because we often needs a
> > REINDEX after it? The natural conclusion would rather be to
> > document the fact than REINDEX is needed after VACUUM FULL, isn't
> > it?
>
> Maybe, but we should also mention that CLUSTER is a likely faster
> workaround.

What this boils down to is that there should probably be a separate
subsection that deals with "Oh noes! My tables are too big!"
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
Michael Stone
Date:
On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote:
>Maybe, but we should also mention that CLUSTER is a likely faster
>workaround.

Unless, of course, you don't particularly care about the order of the
items in your table; you might end up wasting vastly more time rewriting
tables due to unnecessary clustering than for full vacuums on a table
that doesn't need it.

Mike Stone

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
Chris Browne
Date:
mstone+postgres@mathom.us (Michael Stone) writes:
> On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote:
>>Maybe, but we should also mention that CLUSTER is a likely faster
>>workaround.
>
> Unless, of course, you don't particularly care about the order of
> the items in your table; you might end up wasting vastly more time
> rewriting tables due to unnecessary clustering than for full vacuums
> on a table that doesn't need it.

Actually, this is irrelevant.

If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases,
it *frequently* is, and probably will be, nearly always, soon), then
it's a faster workaround.
--
output = ("cbbrowne" "@" "linuxfinances.info")
http://cbbrowne.com/info/oses.html
"What if you slept?  And what if, in your sleep, you dreamed?
 And what if, in your dream, you went to heaven and there
 plucked a strange and beautiful flower?  And what if, when
 you awoke, you had the flower in your hand?  Ah, what then?"
    --Coleridge

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
Michael Stone
Date:
On Wed, May 16, 2007 at 03:34:42PM -0400, Chris Browne wrote:
>mstone+postgres@mathom.us (Michael Stone) writes:
>> Unless, of course, you don't particularly care about the order of
>> the items in your table; you might end up wasting vastly more time
>> rewriting tables due to unnecessary clustering than for full vacuums
>> on a table that doesn't need it.
>
>Actually, this is irrelevant.

I think it's perfectly relevant.

>If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases,
>it *frequently* is, and probably will be, nearly always, soon), then
>it's a faster workaround.

Cluster reorders the table. If a table doesn't have any dead rows and
you tell someone to run cluster or vacuum full, the vaccuum basically
won't do anything and the cluster will reorder the whole table. Cluster
is great for certain access patterns, but I've been noticing this odd
tendency lately to treat it like a silver bullet.

Mike Stone

Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal

From
Alvaro Herrera
Date:
Michael Stone wrote:
> On Wed, May 16, 2007 at 03:34:42PM -0400, Chris Browne wrote:
> >mstone+postgres@mathom.us (Michael Stone) writes:
> >>Unless, of course, you don't particularly care about the order of
> >>the items in your table; you might end up wasting vastly more time
> >>rewriting tables due to unnecessary clustering than for full vacuums
> >>on a table that doesn't need it.
> >
> >Actually, this is irrelevant.
>
> I think it's perfectly relevant.
>
> >If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases,
> >it *frequently* is, and probably will be, nearly always, soon), then
> >it's a faster workaround.
>
> Cluster reorders the table. If a table doesn't have any dead rows and
> you tell someone to run cluster or vacuum full, the vaccuum basically
> won't do anything and the cluster will reorder the whole table. Cluster
> is great for certain access patterns, but I've been noticing this odd
> tendency lately to treat it like a silver bullet.

Well, it's certainly not a silver bullet; you would use VACUUM (not
full) for most of your needs, and CLUSTER for the rare other cases.  Of
course you would not pick an index at random each time, but rather keep
using the same one, which would supposedly be faster.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Michael Stone <mstone+postgres@mathom.us> writes:
> On Wed, May 16, 2007 at 03:34:42PM -0400, Chris Browne wrote:
>> If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases,
>> it *frequently* is, and probably will be, nearly always, soon), then
>> it's a faster workaround.

> Cluster reorders the table. If a table doesn't have any dead rows and
> you tell someone to run cluster or vacuum full, the vaccuum basically
> won't do anything and the cluster will reorder the whole table. Cluster
> is great for certain access patterns, but I've been noticing this odd
> tendency lately to treat it like a silver bullet.

Sure, but VACUUM FULL looks even less like a silver bullet.

There's been talk of providing an operation that uses the same
infrastructure as CLUSTER, but doesn't make any attempt to re-order the
table: just seqscan the old heap, transfer still-live tuples into a new
heap, then rebuild indexes from scratch.  This is clearly going to be a
lot faster than a VACUUM FULL under conditions in which the latter would
have to move most of the tuples.  Heikki just fixed one of the major
objections to it (ie, CLUSTER not being MVCC-safe).  The other objection
is that peak transient disk space usage could be much higher than VACUUM
FULL's, but still for a lot of scenarios this'd be better.

            regards, tom lane