Thread: Question about VACUUM
Hi friends,

I want to know if it's possible to predict (calculate) how long a VACUUM FULL on a table will take. Is there a formula I can apply to calculate this?

Thanks
On Fri, Dec 2, 2011 at 8:32 PM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
> Hi friends,
>
> I want to know if it's possible to predict (calculate) how long a
> VACUUM FULL on a table will take. Is there a formula I can apply to
> calculate this?

If you look at what iostat is doing while the VACUUM FULL is running, and divide the size of the table by that throughput (kB/sec), you can get a good approximation of how long it will take. Do you have naptime set to anything above 0?
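For example, as a back-of-the-envelope sketch (the table name and the throughput figure here are made up; plug in your own table and your own iostat numbers):

    -- how big is the table on disk?
    SELECT pg_size_pretty(pg_relation_size('my_big_table'));
    -- Say that reports 20 GB, and iostat shows the VACUUM FULL
    -- sustaining about 40 MB/s of combined reads and writes:
    --   20480 MB / 40 MB per second = ~512 seconds, call it 8-9 minutes.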
Thanks for the answer, Scott. Actually my autovacuum_naptime is 1h, but I can't find a naptime parameter for a manual vacuum.

Thanks again

2011/12/2 Scott Marlowe <scott.marlowe@gmail.com>:
> On Fri, Dec 2, 2011 at 8:32 PM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
>> Hi friends,
>>
>> I want to know if it's possible to predict (calculate) how long a
>> VACUUM FULL on a table will take. Is there a formula I can apply to
>> calculate this?
>
> If you look at what iostat is doing while the VACUUM FULL is running,
> and divide the size of the table by that throughput (kB/sec), you can
> get a good approximation of how long it will take. Do you have
> naptime set to anything above 0?
Ernesto Quiñones wrote:
> Scott Marlowe wrote:
>> Ernesto Quiñones wrote:
>>> I want to know if it's possible to predict (calculate) how long
>>> a VACUUM FULL on a table will take.

I don't think you said what version of PostgreSQL you're using. VACUUM FULL prior to version 9.0 is not recommended for most situations, and can take days or weeks to complete where other methods of achieving the same end may take hours. If you have autovacuum properly configured, you will probably never need to run VACUUM FULL.

>> If you look at what iostat is doing while the VACUUM FULL is
>> running, and divide the size of the table by that throughput
>> (kB/sec), you can get a good approximation of how long it will
>> take. Do you have naptime set to anything above 0?
>
> Thanks for the answer, Scott. Actually my autovacuum_naptime is 1h

Ah, well, that right there is likely to put you into a position where you need to do painful extraordinary cleanup like VACUUM FULL. In most situations the autovacuum defaults are pretty good. Where they need to be adjusted, the changes which are actually beneficial are usually to lower the thresholds to allow more aggressive cleanup or (on low-powered hardware) to adjust the cost ratios so that performance is less affected by the autovacuum runs. When autovacuum is disabled or changed to a long interval, it almost always results in bloat and/or outdated statistics which cause much more pain than a more aggressive autovacuum regimen does.

> but I can't find a naptime parameter for a manual vacuum

I'm guessing that Scott was thinking of the vacuum_cost_delay setting:

http://www.postgresql.org/docs/current/interactive/runtime-config-resource.html#GUC-VACUUM-COST-DELAY

-Kevin
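P.S. A quick way to see what a manual VACUUM will use in your current session (a minimal psql check; your values will differ):

    SHOW vacuum_cost_delay;
    SHOW vacuum_cost_limit;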
On Sat, Dec 3, 2011 at 6:11 AM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
> Thanks for the answer, Scott. Actually my autovacuum_naptime is 1h,
> but I can't find a naptime parameter for a manual vacuum.

That's really high, but what I meant to ask was what your vacuum_cost_delay is set to. Also vacuum_cost_limit.
Hi Kevin, comments after your comments.

2011/12/3 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Ernesto Quiñones wrote:
>> Scott Marlowe wrote:
>>> Ernesto Quiñones wrote:
>>>> I want to know if it's possible to predict (calculate) how long
>>>> a VACUUM FULL on a table will take.
>
> I don't think you said what version of PostgreSQL you're using.
> VACUUM FULL prior to version 9.0 is not recommended for most
> situations, and can take days or weeks to complete where other
> methods of achieving the same end may take hours. If you have
> autovacuum properly configured, you will probably never need to run
> VACUUM FULL.

I'm working with PostgreSQL 8.3 running on Solaris 10. My autovacuum parameters are:

    autovacuum                      on
    autovacuum_analyze_scale_factor 0.5
    autovacuum_analyze_threshold    50000
    autovacuum_freeze_max_age       200000000
    autovacuum_max_workers          3
    autovacuum_naptime              1h
    autovacuum_vacuum_cost_delay    -1
    autovacuum_vacuum_cost_limit    -1
    autovacuum_vacuum_scale_factor  0.5
    autovacuum_vacuum_threshold     50000

My vacuum parameters are:

    vacuum_cost_delay      1s
    vacuum_cost_limit      200
    vacuum_cost_page_dirty 20
    vacuum_cost_page_hit   1
    vacuum_cost_page_miss  10
    vacuum_freeze_min_age  100000000

> Ah, well, that right there is likely to put you into a position where
> you need to do painful extraordinary cleanup like VACUUM FULL. In
> most situations the autovacuum defaults are pretty good. Where they
> need to be adjusted, the changes which are actually beneficial are
> usually to lower the thresholds to allow more aggressive cleanup or
> (on low-powered hardware) to adjust the cost ratios so that
> performance is less affected by the autovacuum runs.

I have good performance from my hard disks and a good amount of memory, but my cores are very weak, only 1 GHz each. I have some questions here:

1. With autovacuum_max_workers = 3, does each worker process use its own core, or is one core shared by the 3 workers?

2. When I run an EXPLAIN ANALYZE on a very big table (30 million rows), the plan's row estimate is about 32 million, so I assume my statistics are off by about 2 million rows. Is that a significant discrepancy, or a normal result? (The exact query is in the P.S. below.)

Thanks for your help!
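P.S. For question 2, what I'm running looks like this (the table name is changed):

    EXPLAIN ANALYZE SELECT count(*) FROM my_big_table;
    -- I'm comparing the planner's estimated "rows=" on the scan node
    -- (about 32 million) against the actual "rows=" reported next to
    -- it (about 30 million).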
On Mon, Dec 5, 2011 at 10:19 AM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
> I'm working with PostgreSQL 8.3 running on Solaris 10. My autovacuum
> parameters are:
>
>     autovacuum                      on
>     autovacuum_analyze_scale_factor 0.5
>     autovacuum_analyze_threshold    50000
>     autovacuum_freeze_max_age       200000000
>     autovacuum_max_workers          3
>     autovacuum_naptime              1h
>     autovacuum_vacuum_cost_delay    -1
>     autovacuum_vacuum_cost_limit    -1
>     autovacuum_vacuum_scale_factor  0.5
>     autovacuum_vacuum_threshold     50000
>
> My vacuum parameters are:
>
>     vacuum_cost_delay      1s
>     vacuum_cost_limit      200

Those are insane settings for vacuum costing, even on a very slow machine. Basically you're starving vacuum and autovacuum so much that they can never keep up.

> I have good performance from my hard disks and a good amount of
> memory, but my cores are very weak, only 1 GHz each.

If so, then your settings for vacuum costing are doubly bad. I'd start by setting the cost_delay to 1ms and raising your cost limit by a factor of 10 or more.

> 1. With autovacuum_max_workers = 3, does each worker process use its
> own core, or is one core shared by the 3 workers?

Each worker is a single process and can use one core, basically. Right now your vacuum costing is such that it's using 1/100000th or so of a CPU.

> 2. When I run an EXPLAIN ANALYZE on a very big table (30 million
> rows), the plan's row estimate is about 32 million. Is that a
> significant discrepancy, or a normal result?

Look for projections being off by factors of 10 or more before it starts to make a big difference. 32M versus 30M is no big deal. 30k versus 30M is a big deal.
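In postgresql.conf that would look something like this (a starting sketch, not tuned values; adjust from there):

    vacuum_cost_delay = 1ms    # was 1s; vacuum sleeps 1000x less per pause
    vacuum_cost_limit = 2000   # was 200; ~10x more work between pauses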
On Mon, Dec 5, 2011 at 10:42 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, Dec 5, 2011 at 10:19 AM, Ernesto Quiñones <ernestoq@gmail.com> wrote:
>>     vacuum_cost_delay      1s
>>     vacuum_cost_limit      200
>
> Those are insane settings for vacuum costing, even on a very slow
> machine. Basically you're starving vacuum and autovacuum so much that
> they can never keep up.

Sorry, the word I meant there was "pathological". No insult intended.
No problem, Scott. Thanks for your input.

2011/12/5 Scott Marlowe <scott.marlowe@gmail.com>:
> On Mon, Dec 5, 2011 at 10:42 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> Those are insane settings for vacuum costing, even on a very slow
>> machine. Basically you're starving vacuum and autovacuum so much that
>> they can never keep up.
>
> Sorry, the word I meant there was "pathological". No insult intended.
Ernesto Quiñones <ernestoq@gmail.com> wrote:

I understand the impulse to run autovacuum less frequently or less aggressively. When we first started running PostgreSQL, the default configuration was very cautious. A lot of bloat would accumulate before it kicked in, at which point there was a noticeable performance hit as it worked through a large number of dead pages. The first thing I did was to make it run less often, which only made things worse. The numbers we settled on through testing as optimal for us are very close to the current default values (for recent major releases). Not only do queries run more quickly between autovacuum runs, because there is less dead space to wade through to get to the current tuples, but the autovacuum runs themselves don't have the same degree of impact -- presumably because they find less to do. Some small, frequently updated tables went from having hundreds of pages down to one or two.

> autovacuum_analyze_scale_factor 0.5
> autovacuum_analyze_threshold    50000

We use 0.10 + 10 in production. Defaults are now 0.10 + 50. That's a proportion of the table plus a number of rows. ANALYZE just takes a random sample from the table; it doesn't scan the whole table.

> autovacuum_vacuum_scale_factor 0.5
> autovacuum_vacuum_threshold    50000

We use 0.20 + 10 in production. Defaults are now 0.20 + 50. Again, a proportion of the table (in this case what is expected to have become unusable dead space) plus a number of unusable dead tuples.

> autovacuum_naptime 1h

A one-page table could easily bloat to hundreds (or thousands) of pages within an hour. You will wonder where all your CPU time is going, because it will constantly be scanning the same (cached) pages to find the one version of the row which matters. I recommend 1min.

> vacuum_cost_delay 1s

A vacuum run will never get much done at that rate. I recommend 10ms.

> vacuum_cost_limit 200

We've boosted this to 600. Once you're in a "steady state", this is the setting you might want to adjust up or down as needed to make cleanup aggressive enough without putting a noticeable dent in performance while it is running.

On 8.3 I believe you still need to worry about the fsm settings. Run your regular database vacuum with the VERBOSE option and check what the last few lines say. If you don't have enough memory set aside to track free space, no vacuum regimen will prevent bloat.

-Kevin
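P.S. Pulling those suggestions together as a postgresql.conf sketch (starting points, not drop-in tuned values):

    autovacuum_naptime              = 1min
    autovacuum_analyze_scale_factor = 0.10
    autovacuum_analyze_threshold    = 10
    autovacuum_vacuum_scale_factor  = 0.20
    autovacuum_vacuum_threshold     = 10
    vacuum_cost_delay               = 10ms
    vacuum_cost_limit               = 600

And the fsm check on 8.3 looks something like this (output shape from memory; your numbers will differ):

    VACUUM VERBOSE;
    ...
    INFO:  free space map contains 33582 pages in 74 relations
    DETAIL:  A total of 33792 page slots are in use (including overhead).
    33792 page slots are required to track all free space.
    Current limits are:  204800 page slots, 1000 relations, using 1265 kB.

If the "required" number approaches or exceeds the page-slot limit, raise max_fsm_pages.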
On Mon, Dec 5, 2011 at 11:36 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Ernesto Quiñones <ernestoq@gmail.com> wrote:
>> vacuum_cost_limit 200
>
> We've boosted this to 600. Once you're in a "steady state", this is
> the setting you might want to adjust up or down as needed to make
> cleanup aggressive enough without putting a noticeable dent in
> performance while it is running.

On the busy production systems I've worked on in the past, we had this cranked up to several thousand, along with 10 or so workers, to keep up. The more IO your box has, the more you can afford to make vacuum / autovacuum aggressive.
On 12/5/11 1:36 PM, Kevin Grittner wrote:
> I understand the impulse to run autovacuum less frequently or less
> aggressively. When we first started running PostgreSQL, the default
> configuration was very cautious.

The default settings are deliberately cautious, as default settings should be. But yes, anyone with a really large/high-traffic database will often want to make autovac more aggressive.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote:
> On 12/5/11 1:36 PM, Kevin Grittner wrote:
>> I understand the impulse to run autovacuum less frequently or
>> less aggressively. When we first started running PostgreSQL, the
>> default configuration was very cautious.
>
> The default settings are deliberately cautious, as default
> settings should be.

I was talking historically, about the defaults in 8.1:

http://www.postgresql.org/docs/8.1/interactive/runtime-config-autovacuum.html

Those defaults were *over*-cautious to the point that we experienced serious problems. My point was that many people's first instinct in that case is to make the settings less aggressive, as I initially did and the OP has done. The problem is actually solved by making them *more* aggressive. Current defaults are pretty close to what we found, through experimentation, worked well for us for most databases.

> But yes, anyone with a really large/high-traffic database will
> often want to make autovac more aggressive.

I think we're in agreement: current defaults are good for a typical environment; high-end setups still need to tune to more aggressive settings. This is an area where incremental changes with monitoring work well.

-Kevin