Thread: performance impact of vacuum

performance impact of vacuum

From
"Jay O'Connor"
Date:
All,

What impact on performance does vacuum have on an active database?  I'm
being asked about this... or rather, someone is questioning the use of
PostgreSQL because of it.

Thanks

Jay

Re: performance impact of vacuum

From
"Dan Langille"
Date:
On 14 Jul 2003 at 14:13, Jay O'Connor wrote:

> What impact on performance does vacuum have on an active database?  I'm
> being asked about this... or rather, someone is questioning the use of
> PostgreSQL because of it.

I run it daily at off-peak times and it doesn't seem to affect
performance at all.  If you vacuum regularly, the impact is
minimized.  It really depends on how many updates/deletes your
database has done since the last time you ran vacuum.
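For reference, "daily at off-peak times" here just means a cron entry;
something along these lines (the path, user, and schedule are only
examples, adjust for your installation):

```shell
# Nightly lazy (non-full) vacuum of all databases at 03:00, when
# load is lowest.  vacuumdb ships with PostgreSQL;
# -a = all databases, -z = also run ANALYZE, -q = quiet.
0 3 * * *  postgres  /usr/local/pgsql/bin/vacuumdb -a -z -q
```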

But don't take my word for it. See what others have to say.
--
Dan Langille : http://www.langille.org/


Re: performance impact of vacuum

From
"scott.marlowe"
Date:
On Mon, 14 Jul 2003, Jay O'Connor wrote:

> All,
>
> What impact on performance does vacuum have on an active database?  I'm
> being asked about this... or rather, someone is questioning the use of
> PostgreSQL because of it.

Hey, if they can't win on actual performance, they can always create straw
men, huh?

Seriously, the impact of vacuum tends to be pretty small.  On systems with
good I/O bandwidth, the general performance hit of a regular (non-full)
vacuum is about a 1 to 5% performance loss.

My guess is you could buy a big enough box to offset that for a fraction
of the cost of a commercial license for most databases.

On my dual 750 MHz PIII with 1.5 GB of RAM, it generally grabs about 10%
of one CPU.

The biggest problem with running vacuums while doing other things is that
vacuuming large tables can flush out PostgreSQL's caches by grabbing all
the shared memory.

Just a quick check shows pgbench -c 5 -t 200 running during a lazy
vacuumdb -a (vacuum all databases) is about 75 to 80% as fast as it is
when there is no vacuumdb running on my box.
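That quick check was nothing more elaborate than this (the database name
is made up; use a scratch database you can hammer):

```shell
# Baseline run: 5 concurrent clients, 200 transactions each.
pgbench -c 5 -t 200 benchdb

# Same run again while a lazy vacuum of all databases grinds away
# in the background.
vacuumdb -a &
pgbench -c 5 -t 200 benchdb
wait
```

Compare the transactions-per-second figures from the two pgbench runs to
get the rough percentage hit on your own hardware.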

My box does NOT have a lot of memory allocated for vacuuming, as we do
most of our vacuuming of whole databases at night.

Note that you can vacuum individual tables / indexes, so if the problem is
with one frequently changing table, the cost can be kept down by vacuuming
just that one problem table.
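For example (the database and table names are made up):

```shell
# Lazy (non-full) vacuum plus a statistics refresh for one busy
# table; the rest of the database is left alone.
psql -d mydb -c "VACUUM ANALYZE hot_table;"
```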

Good performance tuning is far more important to PostgreSQL performance
than whether or not you're vacuuming midday.


Re: performance impact of vacuum

From
Tom Lane
Date:
"Jay O'Connor" <joconnor@cybermesa.com> writes:
> What impact on performance does vacuum have on an active database?

I've seen varying reports on this; some say it's not a problem, and some
say it is.  I suspect (without real proof) that the critical issue is
whether you have adequate I/O bandwidth.  If your disk is near
saturation already, the extra I/O load for vacuum can drive you over the
knee of the performance curve and cause serious degradation.  But if
you've got spare I/O bandwidth then it's not a big problem.

You can reduce the I/O needs for routine vacuums by only vacuuming the
tables you need to, and selecting an appropriate vacuum frequency for
each one.  (You might want to try the "autovacuum" daemon that's being
worked on, as a substitute for doing this tuning by hand.  Not sure if
the daemon is ready for prime time though.)

BTW, I suggest moving any followup discussion to pgsql-performance.

            regards, tom lane

Re: performance impact of vacuum

From
"Matthew T. O'Connor"
Date:
On Mon, 2003-07-14 at 17:13, Jay O'Connor wrote:
> What impact on performance does vacuum have on an active database?  I'm
> being asked about this... or rather, someone is questioning the use of
> PostgreSQL because of it.

There is no easy answer to this question, other than that if you vacuum
appropriately (not too often, nor too infrequently), the net impact on
system performance is a win.  A database system that doesn't require a
vacuum-type process still has to do the same work; it just does it at
transaction time.  Postgres defers this work until vacuum is run, which
is typically done at off-peak times.

As the author of the pg_autovacuum daemon in 7.4 contrib, I would
suggest you try it :-)  The primary advantage of the daemon is that it
monitors activity and, when needed, vacuums specific tables rather than
whole databases.  The thresholds that dictate when vacuums occur can be
set at runtime and should allow you to get it performing the way you
want.  If you have any questions or problems, let me know.  I am always
interested in hearing about others' real-world experiences.



Re: performance impact of vacuum

From
"Jay O'Connor"
Date:
On 2003.07.14 20:41 Matthew T. O'Connor wrote:
> On Mon, 2003-07-14 at 17:13, Jay O'Connor wrote:
> > What impact on performance does vacuum have on an active database?  I'm
> > being asked about this... or rather, someone is questioning the use of
> > PostgreSQL because of it.
>
> There is no easy answer to this question, other than that if you vacuum
> appropriately (not too often, nor too infrequently), the net impact on
> system performance is a win.  A database system that doesn't require a
> vacuum-type process still has to do the same work; it just does it at
> transaction time.  Postgres defers this work until vacuum is run, which
> is typically done at off-peak times.

Thanks.

Just curious, but is the length of time to vacuum influenced more by the
size of the table or the number of dead tuples?

Take care,
Jay

Re: performance impact of vacuum

From
Bruno Wolff III
Date:
On Tue, Jul 15, 2003 at 10:37:28 -0700,
  Jay O'Connor <joconnor@cybermesa.com> wrote:
>
> Just curious but is the length of time to vacuum influenced more by the
> size of the table or the number of dead tuples?

I think the fraction of the table that is dead tuples is probably the best
thing to look at for deciding when to vacuum (unless the tables are very
small in which case checking for the number of blocks used may be better).
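If you want to check that fraction yourself, the contrib/pgstattuple
module reports it per table (assuming you have the contrib module
installed; the database and table names are examples):

```shell
# pgstattuple scans the table and reports, among other things,
# dead_tuple_count and dead_tuple_percent.
psql -d mydb -c "SELECT * FROM pgstattuple('mytable');"
```

Note that pgstattuple itself reads the whole table, so it's a diagnostic
tool rather than something to run constantly.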

Re: performance impact of vacuum

From
"Jay O'Connor"
Date:
On 2003.07.15 09:53 Bruno Wolff III wrote:
> On Tue, Jul 15, 2003 at 10:37:28 -0700,
>   Jay O'Connor <joconnor@cybermesa.com> wrote:
> >
> > Just curious but is the length of time to vacuum influenced more by the
> > size of the table or the number of dead tuples?
>
> I think the fraction of the table that is dead tuples is probably the best
> thing to look at for deciding when to vacuum (unless the tables are very
> small in which case checking for the number of blocks used may be better).
>

Actually, what I meant is how long the vacuum runs.  We're going to have a
big database (a few TB projected, but I don't know where those numbers come
from) and I'm trying to assuage concerns that vacuuming will impact
performance significantly.

Thanks
Jay

Re: performance impact of vacuum

From
Andrew Sullivan
Date:
On Tue, Jul 15, 2003 at 11:04:53AM -0700, Jay O'Connor wrote:
> Actually, what I meant is how long the vacuum runs.  We're going to have a
> big database (a few TB projected, but I don't know where those numbers come
> from) and I'm trying to assuage concerns that vacuuming will impact
> performance significantly.

It depends very heavily on your expired-tuple percentage.  But it is
still not free to vacuum a large table.  And vacuum full always scans
the whole table.

Remember that vacuum operates on tables, which automatically means
that it does nasty things to your cache.

The stand-alone ANALYZE can be helpful here.  It only samples the tables
under analysis, so you don't face the same I/O load.  If all you're doing
is adding to a table, it may be worth investigating.  Keep in mind,
though, you still need to vacuum every 2 billion transactions.
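Concretely, something like this for an append-mostly table (the names
are examples):

```shell
# ANALYZE samples the table instead of reading all of it, so it is
# much cheaper than VACUUM on a table that only ever grows.
psql -d mydb -c "ANALYZE append_mostly_table;"
```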

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: performance impact of vacuum

From
"scott.marlowe"
Date:
On Tue, 15 Jul 2003, Andrew Sullivan wrote:

> On Tue, Jul 15, 2003 at 11:04:53AM -0700, Jay O'Connor wrote:
> > Actually, what I meant is how long the vacuum runs.  We're going to have a
> > big database (a few TB projected, but I don't know where those numbers come
> > from) and I'm trying to assuage concerns that vacuuming will impact
> > performance significantly.
>
> It depends very heavily on your expired-tuple percentage.  But it is
> still not free to vacuum a large table.  And vacuum full always scans
> the whole table.
>
> Remember that vacuum operates on tables, which automatically means
> that it does nasty things to your cache.
>
> The stand-alone ANALYZE can be helpful here.  It only samples the tables
> under analysis, so you don't face the same I/O load.  If all you're doing
> is adding to a table, it may be worth investigating.  Keep in mind,
> though, you still need to vacuum every 2 billion transactions.

This sounds like one of those places where the ability of a file system to
be told not to cache the accesses of a certain child process would be a
big win.

Wasn't there some discussion on BSD's ability to do this recently, and
whether it was a win to port it into PostgreSQL?  I'd say that for large
databases being vacuumed midday it would be a great win.


Re: performance impact of vacuum

From
Martijn van Oosterhout
Date:
On Tue, Jul 15, 2003 at 11:53:50AM -0500, Bruno Wolff III wrote:
> On Tue, Jul 15, 2003 at 10:37:28 -0700,
>   Jay O'Connor <joconnor@cybermesa.com> wrote:
> >
> > Just curious but is the length of time to vacuum influenced more by the
> > size of the table or the number of dead tuples?
>
> I think the fraction of the table that is dead tuples is probably the best
> thing to look at for deciding when to vacuum (unless the tables are very
> small in which case checking for the number of blocks used may be better).

Actually, for a full vacuum, where the dead tuples are located also has a
significant effect.  If you have a large table and you do a large update,
so that the table looks as follows:

<------------------ data ------------------><deleted-tuples><updated-tuples>

The vacuum full will then move the updated tuples over the deleted ones.
But it does this by scanning backwards through the table, copying tuples
forward over the deleted ones.  I have a sneaking suspicion that Linux's
caching algorithm in some versions doesn't handle this very well.  I'm
going to see if inserting into another table, deleting, vacuuming, and
then inserting back is faster.

Or don't do large updates on large tables.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato


Re: performance impact of vacuum

From
Bruce Momjian
Date:
scott.marlowe wrote:
> > The stand-alone ANALYZE can be helpful here.  It only samples the tables
> > under analysis, so you don't face the same I/O load.  If all you're doing
> > is adding to a table, it may be worth investigating.  Keep in mind,
> > though, you still need to vacuum every 2 billion transactions.
>
> This sounds like one of those places where the ability of a file system to
> be told not to cache the accesses of a certain child process would be a
> big win.
>
> Wasn't there some discussion on BSD's ability to do this recently, and
> whether it was a win to port it into PostgreSQL?  I'd say that for large
> databases being vacuumed midday it would be a great win.

It was Solaris with free-behind.  I hope new caching rules will fix this
soon --- a few people are working on it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073