Thread: performance impact of vacuum
All, What impact on performance does vacuum have on an active database? I'm being asked about this...or rather...someone is questioning the use of postgresql because of it. Thanks, Jay
On 14 Jul 2003 at 14:13, Jay O'Connor wrote: > What impact on performance does vacuum have on an active database? I'm > being asked about this...or rather...someone is questioning the use of > postgresql because of it I run it daily at off-peak times and it doesn't seem to affect performance at all. If you vacuum regularly, the impact is minimized. It really depends upon how many updates/deletes your database has done since the last time you ran vacuum. But don't take my word for it. See what others have to say. -- Dan Langille : http://www.langille.org/
On Mon, 14 Jul 2003, Jay O'Connor wrote: > All, > > What impact on performance does vacuum have on an active database? I'm > being asked about this...or rather...someone is questioning the use of > postgresql because of it Hey, if they can't win on actual performance, they can always create straw men, huh? Seriously, the impact of vacuum tends to be pretty small. On systems with good I/O bandwidth the general performance hit of a regular (non-full) vacuum is about 1 to 5% performance loss. My guess is you could buy a big enough box to offset that for a fraction of the cost of a commercial license for most databases. On my dual 750MHz PIII with 1.5 gig it generally grabs about 10% of one CPU. The biggest problem with running vacuums while doing other things is that sometimes vacuuming flushes out postgresql's caches by grabbing all the shared memory when vacuuming large tables. Just a quick check shows pgbench -c 5 -t 200 running during a lazy vacuumdb -a (vacuum all databases) is about 75 to 80% as fast as it is when there is no vacuumdb running on my box. My box does NOT have a lot of memory allocated for vacuuming, as we do most of our vacuuming of whole databases at night. Note that you can vacuum individual tables / indexes, so if the problem is with one frequently changing table, the cost can be kept down by vacuuming just that one problem table. Good performance tuning is far more important to postgresql performance than whether or not you're vacuuming midday.
"Jay O'Connor" <joconnor@cybermesa.com> writes: > What impact in performance does vacuum have on an active database? I've seen varying reports on this; some say it's not a problem, and some say it is. I suspect (without real proof) that the critical issue is whether you have adequate I/O bandwidth. If your disk is near saturation already, the extra I/O load for vacuum can drive you over the knee of the performance curve and cause serious degradation. But if you've got spare I/O bandwidth then it's not a big problem. You can reduce the I/O needs for routine vacuums by only vacuuming the tables you need to, and selecting an appropriate vacuum frequency for each one. (You might want to try the "autovacuum" daemon that's being worked on, as a substitute for doing this tuning by hand. Not sure if the daemon is ready for prime time though.) BTW, I suggest moving any followup discussion to pgsql-performance. regards, tom lane
On Mon, 2003-07-14 at 17:13, Jay O'Connor wrote: > What impact on performance does vacuum have on an active database? I'm > being asked about this...or rather...someone is questioning the use of > postgresql because of it There is no easy answer to this question, other than that if you vacuum appropriately (not too often, nor too infrequently) the net impact on system performance is a win. A database system that doesn't require a vacuum-type process still has to do the same work; it just does it at transaction time. Postgres defers this work until vacuum is run, which is typically done at off-peak times. As the author of the pg_autovacuum daemon in 7.4 contrib, I would suggest you try it :-) The primary advantage of the daemon is that it monitors activity and, when needed, vacuums specific tables, not whole databases. The thresholds that dictate when vacuums occur can be set at runtime, and should allow you to get it performing the way you want. If you have any questions or problems let me know. I am always interested in hearing about others' real-world experiences.
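Matthew's description of the daemon's per-table thresholds can be sketched in a few lines. This is a toy model: the function name, parameter names, and default numbers below are illustrative assumptions, not the actual pg_autovacuum contrib code.

```python
# Hypothetical sketch of pg_autovacuum-style threshold logic.
# The names and default values here are illustrative assumptions,
# not the real 7.4 contrib implementation.

def needs_vacuum(reltuples, n_dead_tuples,
                 base_threshold=1000, scale_factor=2.0):
    """Vacuum a table once its dead tuples exceed a fixed base
    plus a multiple of the table's (estimated) row count."""
    threshold = base_threshold + scale_factor * reltuples
    return n_dead_tuples > threshold

# A small, busy table crosses its threshold quickly...
print(needs_vacuum(reltuples=500, n_dead_tuples=2500))        # True
# ...while the same churn on a large table does not.
print(needs_vacuum(reltuples=1_000_000, n_dead_tuples=2500))  # False
```

The scale factor is what makes the decision per-table rather than global: small hot tables get vacuumed often and cheaply, while large, mostly-static tables are left alone.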
On 2003.07.14 20:41 Matthew T. O'Connor wrote: > On Mon, 2003-07-14 at 17:13, Jay O'Connor wrote: > > What impact on performance does vacuum have on an active database? I'm > > being asked about this...or rather...someone is questioning the use of > > postgresql because of it > > There is no easy answer to this question, other than that if you vacuum > appropriately (not too often, nor too infrequently) the net impact on > system performance is a win. A database system that doesn't require a > vacuum-type process still has to do the same work; it just does it at > transaction time. Postgres defers this work until vacuum is run, which > is typically done at off-peak times. Thanks. Just curious, but is the length of time to vacuum influenced more by the size of the table or the number of dead tuples? Take care, Jay
On Tue, Jul 15, 2003 at 10:37:28 -0700, Jay O'Connor <joconnor@cybermesa.com> wrote: > > Just curious but is the length of time to vacuum influenced more by the > size of the table or the number of dead tuples? I think the fraction of the table that is dead tuples is probably the best thing to look at for deciding when to vacuum (unless the tables are very small in which case checking for the number of blocks used may be better).
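Bruno's rule of thumb reads naturally as a small decision function. A minimal sketch, assuming an example cutoff of 20% dead tuples; the cutoff value is an assumption for illustration, not a figure from the thread.

```python
# Sketch of Bruno's "fraction of the table that is dead tuples"
# heuristic. The 20% cutoff is an assumed example value.

def dead_fraction(live_tuples, dead_tuples):
    """Fraction of the table's tuples that are dead."""
    total = live_tuples + dead_tuples
    return dead_tuples / total if total else 0.0

def should_vacuum(live_tuples, dead_tuples, cutoff=0.2):
    """Vacuum once dead tuples are a large enough share of the table."""
    return dead_fraction(live_tuples, dead_tuples) > cutoff

print(should_vacuum(8000, 2500))  # True  (~24% dead)
print(should_vacuum(8000, 1000))  # False (~11% dead)
```

As Bruno notes, for very small tables this fraction is noisy, and checking the number of blocks the table occupies is likely a better signal.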
On 2003.07.15 09:53 Bruno Wolff III wrote: > On Tue, Jul 15, 2003 at 10:37:28 -0700, > Jay O'Connor <joconnor@cybermesa.com> wrote: > > > > Just curious but is the length of time to vacuum influenced more by the > > size of the table or the number of dead tuples? > > I think the fraction of the table that is dead tuples is probably the best > thing to look at for deciding when to vacuum (unless the tables are very > small in which case checking for the number of blocks used may be better). Actually what I meant is how long the vacuum runs. We're going to have a big database (a few TB projected, but I don't know where those numbers come from) and I'm trying to assuage concerns that vacuuming will impact performance significantly. Thanks Jay
On Tue, Jul 15, 2003 at 11:04:53AM -0700, Jay O'Connor wrote: > Actually what I meant is how long the vacuum runs. We're going to have a > big database (a few TB projected, but I don't know where those numbers come > from) and I'm trying to assuage concerns that vacuuming will impact > performance significantly. It depends very heavily on your expired-tuple percentage. But it is still not free to vacuum a large table. And vacuum full always scans the whole table. Remember that vacuum operates on whole tables, which automatically means that it does nasty things to your cache. The stand-alone ANALYZE can be helpful here. It only samples the tables under analysis, so you don't face the same I/O load. If all you're doing is adding to a table, it may be worth investigating. Keep in mind, though, you still need to vacuum every 2 billion transactions. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
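Andrew's wraparound point can be put in numbers. A simplified sketch of the headroom calculation, using the 2-billion-transaction budget he mentions; real XID comparison is circular modulo 2^32, so this linear version is only illustrative.

```python
# Simplified sketch of transaction-ID wraparound headroom.
# Real XID arithmetic is circular; this linear version just
# illustrates the ~2 billion transaction budget from the thread.

XID_WRAP_BUDGET = 2_000_000_000

def xids_until_wraparound(current_xid, last_vacuum_xid):
    """Transactions remaining before the table must be vacuumed again
    to avoid transaction-ID wraparound."""
    consumed = current_xid - last_vacuum_xid
    return max(XID_WRAP_BUDGET - consumed, 0)

print(xids_until_wraparound(1_500_000_000, 100_000_000))  # 600000000
```

The practical takeaway is the same as Andrew's: even an append-only table that never needs dead-tuple cleanup still needs a vacuum periodically for wraparound protection.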
On Tue, 15 Jul 2003, Andrew Sullivan wrote: > On Tue, Jul 15, 2003 at 11:04:53AM -0700, Jay O'Connor wrote: > > Actually what I meant is how long the vacuum runs. We're going to have a > > big database (a few TB projected, but I don't know where those numbers come > > from) and I'm trying to assuage concerns that vacuuming will impact > > performance significantly. > > It depends very heavily on your expired-tuple percentage. But it is > still not free to vacuum a large table. And vacuum full always scans > the whole table. > > Remember that vacuum operates on whole tables, which automatically means > that it does nasty things to your cache. > > The stand-alone ANALYZE can be helpful here. It only samples > the tables under analysis, so you don't face the same I/O > load. If all you're doing is adding to a table, it may be worth > investigating. Keep in mind, though, you still need to vacuum every > 2 billion transactions. This sounds like one of those places where the ability of a file system to be told not to cache the accesses of a certain child process would be a big win. Wasn't there some discussion of BSD's ability to do this recently, and whether it was a win to port it into postgresql? I'd say that for large databases being vacuumed midday it would be a great win.
On Tue, Jul 15, 2003 at 11:53:50AM -0500, Bruno Wolff III wrote: > On Tue, Jul 15, 2003 at 10:37:28 -0700, > Jay O'Connor <joconnor@cybermesa.com> wrote: > > > > Just curious but is the length of time to vacuum influenced more by the > > size of the table or the number of dead tuples? > > I think the fraction of the table that is dead tuples is probably the best > thing to look at for deciding when to vacuum (unless the tables are very > small in which case checking for the number of blocks used may be better). Actually, for a full vacuum, where the tuples are located also has a significant effect. If you have a large table and you do a large update so that the table is as follows: <------------------ data ------------------><deleted-tuples><updated-tuples> The vacuum full will then move the updated tuples over the deleted ones. But it does this by scanning backwards through the table, copying tuples forward over the deleted ones. I have a sneaking suspicion Linux's caching algorithm in some versions doesn't handle this very well. I'm going to see if inserting into another table, deleting, vacuuming and then inserting back is faster. Or don't do large updates on large tables. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ "All that is needed for the forces of evil to triumph is for enough good men to do nothing." - Edmund Burke "The penalty good people pay for not being interested in politics is to be governed by people worse than themselves." - Plato
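Martijn's description of the VACUUM FULL backward scan can be simulated on a list. A toy model, assuming one tuple per slot with None marking a dead tuple; it is not the actual server code, just the movement pattern he describes.

```python
# Toy model of VACUUM FULL compaction: scan backward from the end
# of the table, moving live tuples into the earliest free (deleted)
# slots, then truncate the now-empty tail. None marks a dead tuple.

def compact(table):
    table = list(table)
    free = [i for i, t in enumerate(table) if t is None]
    src = len(table) - 1
    for dst in free:
        # Skip dead tuples at the tail; stop once the pointers meet.
        while src > dst and table[src] is None:
            src -= 1
        if src <= dst:
            break
        table[dst] = table[src]
        table[src] = None
        src -= 1
    # Truncate the trailing dead space.
    while table and table[-1] is None:
        table.pop()
    return table

print(compact(["a", "b", None, None, "e", "f"]))  # ['a', 'b', 'f', 'e']
```

Note how the backward scan visits the newest (updated) tuples first, which is why a big update followed by VACUUM FULL touches both ends of a large table at once and interacts badly with a simple read-ahead cache.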
scott.marlowe wrote: > > The stand-alone ANALYZE can be helpful here. It only samples > > the tables under analysis, so you don't face the same I/O > > load. If all you're doing is adding to a table, it may be worth > > investigating. Keep in mind, though, you still need to vacuum every > > 2 billion transactions. > > This sounds like one of those places where the ability of a file system to > be told not to cache the accesses of a certain child process would be a > big win. > > Wasn't there some discussion of BSD's ability to do this recently, and > whether it was a win to port it into postgresql? I'd say that for large > databases being vacuumed midday it would be a great win. It was Solaris with free-behind. I hope new caching rules will fix this soon --- a few people are working on it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073