Re: Vacuums on large busy databases - Mailing list pgsql-performance
From: Jeff Davis
Subject: Re: Vacuums on large busy databases
Date:
Msg-id: 1158280502.29889.183.camel@dogma.v10.wvs
In response to: Re: Vacuums on large busy databases (Francisco Reyes <lists@stringsutils.com>)
List: pgsql-performance
On Thu, 2006-09-14 at 19:30 -0400, Francisco Reyes wrote:
> Will have to talk to the developers. In particular for every insert there
> are updates. I know they have at least one table that gets updated to have
> summarized totals.
>

If the table being updated is small, you have no problems at all. VACUUM
that table frequently, and the big tables rarely. If the big tables are
only INSERTs and SELECTs, the only reason to VACUUM is to avoid the xid
wraparound. See:

<http://www.postgresql.org/docs/8.1/static/maintenance.html>

See which tables need VACUUM, and how often. Use the statistics to see
if VACUUMing will gain you anything before you do it.

> One of the reasons I was doing the vacuumdb of the entire DB was to get the
> number of shared-buffers. Now that I have an idea of how much I need I will
> likely do something along the lines of what you suggest. One full for
> everything at night and during the days perhaps do the tables that get more
> updated. I also set more aggressive values on autovacuum so that should
> help some too.

Why VACUUM FULL? That is generally not needed. Re-evaluate whether
you're gaining anything with all these VACUUMs.

> > You can run ANALYZE more frequently on all the
> > tables, because it does not have to read the entire table and doesn't
> > interfere with the rest of the operations.
>
> On a related question. Right now I have my autovacuums set as:
> autovacuum_vacuum_threshold = 50000
> autovacuum_analyze_threshold = 100000
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.1
>
> Based on what you described above then I could set my analyze values to the
> same as the vacuum to have something like
> autovacuum_vacuum_threshold = 50000
> autovacuum_analyze_threshold = 50000
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.05
>
> For DBs with hundreds of GBs would it be better to get
> autovacuum_analyze_scale_factor to even 0.01? The permanent DB is over 200GB
> and growing.. the 100GB ones are staging.. By the time we have finished
> migrating all the data from the old system it will be at least 300GB. 0.01
> is still 3GB.. pretty sizable.

Just test how long an ANALYZE takes, and compare that to how quickly
your statistics get out of date. As long as postgres is choosing correct
plans, you are ANALYZE-ing often enough. ANALYZE takes statistical
samples to avoid reading the whole table, so it's really not a major
influence on performance in my experience.

> Do the thresholds take precedence over the scale factors? Is it basically
> if either one of them gets hit that the action will take place?

u = number of tuples UPDATE-ed or DELETE-ed (i.e. dead tuples)
r = the (estimated) number of total live tuples in the relation

In a loop, autovacuum checks to see if
u > (r * autovacuum_vacuum_scale_factor + autovacuum_vacuum_threshold),
and if so, it runs VACUUM. If not, it sleeps. It works the same way for
ANALYZE.

So, in a large table, the scale_factor is the dominant term. In a small
table, the threshold is the dominant term. But both are taken into
account.

Regards,
	Jeff Davis
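
As a rough sketch of "use the statistics to see if VACUUMing will gain you
anything": on 8.1 the per-table statistics views (available when
stats_row_level is on, which autovacuum needs anyway) can be compared
against the trigger point described above. With the values quoted in this
thread (0.05 / 50000), a 1-billion-row table is only vacuumed after roughly
50 million dead tuples, while a 10,000-row table is vacuumed after about
50,500. This is only an approximation: reltuples is the planner's estimate,
and n_tup_upd/n_tup_del accumulate since the last statistics reset rather
than since the last VACUUM.

    -- Approximate dead-tuple activity vs. the autovacuum trigger point
    -- (0.05 and 50000 are the settings quoted in this thread).
    SELECT c.relname,
           s.n_tup_upd + s.n_tup_del         AS dead_tuple_estimate,
           round(c.reltuples * 0.05 + 50000) AS autovacuum_trigger
      FROM pg_stat_user_tables s
      JOIN pg_class c ON c.oid = s.relid
     ORDER BY c.reltuples DESC;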
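
On the xid-wraparound point, the 8.1 maintenance page linked above tracks
wraparound per database, so a quick check is:

    -- age() is measured in transactions; the wraparound horizon is about
    -- 2 billion, so each database needs a plain (not FULL) database-wide
    -- VACUUM well before age() gets anywhere near that - the docs suggest
    -- at least once per billion transactions.
    SELECT datname, age(datfrozenxid) FROM pg_database;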
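
For the "tables that get more updated", 8.1 also allows per-table autovacuum
overrides through the pg_autovacuum system catalog, so a small hot summary
table can be vacuumed much more aggressively than the 200GB+ tables without
touching the global settings. A sketch only: "summary_totals" is a
hypothetical table name, the thresholds are illustrative, and -1 means "use
the global value".

    -- Hypothetical per-table autovacuum override via the 8.1
    -- pg_autovacuum catalog.
    INSERT INTO pg_autovacuum
           (vacrelid, enabled,
            vac_base_thresh, vac_scale_factor,
            anl_base_thresh, anl_scale_factor,
            vac_cost_delay, vac_cost_limit)
    VALUES ('summary_totals'::regclass, true,
            1000, 0.05,
            1000, 0.05,
            -1, -1);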