Thread: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal
Dear all, After some time spent better understanding how the VACUUM process works, what problems we had in production and how to improve our maintenance policy[1], I've come up with a little documentation patch - basically, I think the documentation under estimates (or sometimes misses) the benefit of VACUUM FULL for scans, and the needs of VACUUM FULL if the routine VACUUM hasn't been done properly since the database was put in production. Find the patch against snapshot attached (text not filled, to ease reading). It might help others in my situation in the future. Ref: [1] http://archives.postgresql.org/pgsql-performance/2006-08/msg00419.php http://archives.postgresql.org/pgsql-performance/2007-05/msg00112.php -- Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
Attachment
On Tue, May 15, 2007 at 06:43:50PM +0200, Guillaume Cottenceau wrote: >patch - basically, I think the documentation under estimates (or >sometimes misses) the benefit of VACUUM FULL for scans, and the >needs of VACUUM FULL if the routine VACUUM hasn't been done >properly since the database was put in production. It's also possible to overestimate the benefit of vacuum full, leading to people vacuum full'ing almost constantly, then complaining about performance due to the associated overhead. I think there have been more people on this list whose performance problems were caused by unnecessary full vacs than by those whose performance problems were caused by insufficient full vacs. Mike Stone
Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal
From
Guillaume Cottenceau
Date:
Michael Stone <mstone+postgres 'at' mathom.us> writes: > On Tue, May 15, 2007 at 06:43:50PM +0200, Guillaume Cottenceau wrote: > >patch - basically, I think the documentation under estimates (or > >sometimes misses) the benefit of VACUUM FULL for scans, and the > >needs of VACUUM FULL if the routine VACUUM hasn't been done > >properly since the database was put in production. > > It's also possible to overestimate the benefit of vacuum full, leading > to people vacuum full'ing almost constantly, then complaining about > performance due to the associated overhead. I think there have been > more people on this list whose performance problems were caused by > unnecessary full vacs than by those whose performance problems were > caused by insufficient full vacs. Come on, I don't suggest to remove several bold warnings about it, the best one being "Therefore, frequently using VACUUM FULL can have an extremely negative effect on the performance of concurrent database queries." My point is to add the few additional mentions; I don't think the claims that VACUUM FULL physically compacts the data, and might be useful in case of too long time with infrequent VACUUM are incorrect, are they? -- Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote: > Michael Stone <mstone+postgres 'at' mathom.us> writes: > > > On Tue, May 15, 2007 at 06:43:50PM +0200, Guillaume Cottenceau wrote: > > >patch - basically, I think the documentation under estimates (or > > >sometimes misses) the benefit of VACUUM FULL for scans, and the > > >needs of VACUUM FULL if the routine VACUUM hasn't been done > > >properly since the database was put in production. > > > > It's also possible to overestimate the benefit of vacuum full, leading > > to people vacuum full'ing almost constantly, then complaining about > > performance due to the associated overhead. I think there have been > > more people on this list whose performance problems were caused by > > unnecessary full vacs than by those whose performance problems were > > caused by insufficient full vacs. > > Come on, I don't suggest to remove several bold warnings about > it, the best one being "Therefore, frequently using VACUUM FULL > can have an extremely negative effect on the performance of > concurrent database queries." My point is to add the few > additional mentions; I don't think the claims that VACUUM FULL > physically compacts the data, and might be useful in case of too > long time with infrequent VACUUM are incorrect, are they? Unfortunately they are, to a degree. VACUUM FULL can create a substantial amount of churn in the indexes, resulting in bloated indexes. So often you have to REINDEX after you VACUUM FULL. Long term I think we should ditch 'VACUUM FULL' altogether and create a COMPACT command (it's very easy for users to get confused between "vacuum all the databases in the cluster" or "vacuum the entire database" and "VACUUM FULL"). -- Jim Nasby decibel@decibel.org EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal
From
Guillaume Cottenceau
Date:
"Jim C. Nasby" <decibel 'at' decibel.org> writes: > On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote: [...] > > Come on, I don't suggest to remove several bold warnings about > > it, the best one being "Therefore, frequently using VACUUM FULL > > can have an extremely negative effect on the performance of > > concurrent database queries." My point is to add the few > > additional mentions; I don't think the claims that VACUUM FULL > > physically compacts the data, and might be useful in case of too > > long time with infrequent VACUUM are incorrect, are they? > > Unfortunately they are, to a degree. VACUUM FULL can create a > substantial amount of churn in the indexes, resulting in bloated > indexes. So often you have to REINDEX after you VACUUM FULL. Ok, VACUUM FULL does his job (it physically compacts the data and might be useful in case of too long time with infrequent VACUUM), but we are going to not talk about it because we often needs a REINDEX after it? The natural conclusion would rather be to document the fact than REINDEX is needed after VACUUM FULL, isn't it? -- Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
Guillaume Cottenceau wrote: > "Jim C. Nasby" <decibel 'at' decibel.org> writes: > > > On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote: > > [...] > > > > Come on, I don't suggest to remove several bold warnings about > > > it, the best one being "Therefore, frequently using VACUUM FULL > > > can have an extremely negative effect on the performance of > > > concurrent database queries." My point is to add the few > > > additional mentions; I don't think the claims that VACUUM FULL > > > physically compacts the data, and might be useful in case of too > > > long time with infrequent VACUUM are incorrect, are they? > > > > Unfortunately they are, to a degree. VACUUM FULL can create a > > substantial amount of churn in the indexes, resulting in bloated > > indexes. So often you have to REINDEX after you VACUUM FULL. > > Ok, VACUUM FULL does his job (it physically compacts the data and > might be useful in case of too long time with infrequent VACUUM), > but we are going to not talk about it because we often needs a > REINDEX after it? The natural conclusion would rather be to > document the fact than REINDEX is needed after VACUUM FULL, isn't > it? Maybe, but we should also mention that CLUSTER is a likely faster workaround. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote: > Guillaume Cottenceau wrote: > > "Jim C. Nasby" <decibel 'at' decibel.org> writes: > > > > > On Wed, May 16, 2007 at 09:41:46AM +0200, Guillaume Cottenceau wrote: > > > > [...] > > > > > > Come on, I don't suggest to remove several bold warnings about > > > > it, the best one being "Therefore, frequently using VACUUM FULL > > > > can have an extremely negative effect on the performance of > > > > concurrent database queries." My point is to add the few > > > > additional mentions; I don't think the claims that VACUUM FULL > > > > physically compacts the data, and might be useful in case of too > > > > long time with infrequent VACUUM are incorrect, are they? > > > > > > Unfortunately they are, to a degree. VACUUM FULL can create a > > > substantial amount of churn in the indexes, resulting in bloated > > > indexes. So often you have to REINDEX after you VACUUM FULL. > > > > Ok, VACUUM FULL does his job (it physically compacts the data and > > might be useful in case of too long time with infrequent VACUUM), > > but we are going to not talk about it because we often needs a > > REINDEX after it? The natural conclusion would rather be to > > document the fact than REINDEX is needed after VACUUM FULL, isn't > > it? > > Maybe, but we should also mention that CLUSTER is a likely faster > workaround. What this boils down to is that there should probably be a separate subsection that deals with "Oh noes! My tables are too big!" -- Jim Nasby decibel@decibel.org EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote: >Maybe, but we should also mention that CLUSTER is a likely faster >workaround. Unless, of course, you don't particularly care about the order of the items in your table; you might end up wasting vastly more time rewriting tables due to unnecessary clustering than for full vacuums on a table that doesn't need it. Mike Stone
mstone+postgres@mathom.us (Michael Stone) writes: > On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote: >>Maybe, but we should also mention that CLUSTER is a likely faster >>workaround. > > Unless, of course, you don't particularly care about the order of > the items in your table; you might end up wasting vastly more time > rewriting tables due to unnecessary clustering than for full vacuums > on a table that doesn't need it. Actually, this is irrelevant. If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases, it *frequently* is, and probably will be, nearly always, soon), then it's a faster workaround. -- output = ("cbbrowne" "@" "linuxfinances.info") http://cbbrowne.com/info/oses.html "What if you slept? And what if, in your sleep, you dreamed? And what if, in your dream, you went to heaven and there plucked a strange and beautiful flower? And what if, when you awoke, you had the flower in your hand? Ah, what then?" --Coleridge
On Wed, May 16, 2007 at 03:34:42PM -0400, Chris Browne wrote: >mstone+postgres@mathom.us (Michael Stone) writes: >> Unless, of course, you don't particularly care about the order of >> the items in your table; you might end up wasting vastly more time >> rewriting tables due to unnecessary clustering than for full vacuums >> on a table that doesn't need it. > >Actually, this is irrelevant. I think it's perfectly relevant. >If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases, >it *frequently* is, and probably will be, nearly always, soon), then >it's a faster workaround. Cluster reorders the table. If a table doesn't have any dead rows and you tell someone to run cluster or vacuum full, the vaccuum basically won't do anything and the cluster will reorder the whole table. Cluster is great for certain access patterns, but I've been noticing this odd tendency lately to treat it like a silver bullet. Mike Stone
Michael Stone wrote: > On Wed, May 16, 2007 at 03:34:42PM -0400, Chris Browne wrote: > >mstone+postgres@mathom.us (Michael Stone) writes: > >>Unless, of course, you don't particularly care about the order of > >>the items in your table; you might end up wasting vastly more time > >>rewriting tables due to unnecessary clustering than for full vacuums > >>on a table that doesn't need it. > > > >Actually, this is irrelevant. > > I think it's perfectly relevant. > > >If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases, > >it *frequently* is, and probably will be, nearly always, soon), then > >it's a faster workaround. > > Cluster reorders the table. If a table doesn't have any dead rows and > you tell someone to run cluster or vacuum full, the vaccuum basically > won't do anything and the cluster will reorder the whole table. Cluster > is great for certain access patterns, but I've been noticing this odd > tendency lately to treat it like a silver bullet. Well, it's certainly not a silver bullet; you would use VACUUM (not full) for most of your needs, and CLUSTER for the rare other cases. Of course you would not pick an index at random each time, but rather keep using the same one, which would supposedly be faster. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Michael Stone <mstone+postgres@mathom.us> writes: > On Wed, May 16, 2007 at 03:34:42PM -0400, Chris Browne wrote: >> If CLUSTER is faster than VACUUM FULL (and if it isn't, in all cases, >> it *frequently* is, and probably will be, nearly always, soon), then >> it's a faster workaround. > Cluster reorders the table. If a table doesn't have any dead rows and > you tell someone to run cluster or vacuum full, the vaccuum basically > won't do anything and the cluster will reorder the whole table. Cluster > is great for certain access patterns, but I've been noticing this odd > tendency lately to treat it like a silver bullet. Sure, but VACUUM FULL looks even less like a silver bullet. There's been talk of providing an operation that uses the same infrastructure as CLUSTER, but doesn't make any attempt to re-order the table: just seqscan the old heap, transfer still-live tuples into a new heap, then rebuild indexes from scratch. This is clearly going to be a lot faster than a VACUUM FULL under conditions in which the latter would have to move most of the tuples. Heikki just fixed one of the major objections to it (ie, CLUSTER not being MVCC-safe). The other objection is that peak transient disk space usage could be much higher than VACUUM FULL's, but still for a lot of scenarios this'd be better. regards, tom lane