Re: Still recommending daily vacuum... - Mailing list pgsql-hackers
From: Alvaro Herrera
Subject: Re: Still recommending daily vacuum...
Date:
Msg-id: 20070706191912.GC15358@alvh.no-ip.org
In response to: Re: Still recommending daily vacuum... ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses: Re: Still recommending daily vacuum...
List: pgsql-hackers
Kevin Grittner wrote:
> >>> On Tue, Jul 3, 2007 at 5:34 PM, in message
> <20070703223402.GA5491@alvh.no-ip.org>, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > Kevin Grittner wrote:
> >
> >> Autovacuum is enabled with very aggressive settings, to cover small
> >> tables, including one with about 75 rows that can be updated 100 or more
> >> times per second.  Even with these settings there is zero chance of any
> >> table of even moderate size hitting the autovacuum threshold between our
> >> scheduled vacuums.
> >
> > Sounds like you would be served by setting those specific tables to a
> > lower vacuum scale factor (keeping a more normal default for the rest of
> > the tables), and having a non-zero vacuum delay setting (to avoid
> > excessive I/O consumption).  Have you tried that?
>
> I did play with that, but it doesn't seem to make sense in our environment.
> We have about 100 databases, most of them scattered around the state, and
> any extra maintenance like that has a cost, particularly with the daily
> cluster changing the oid.  Both from doing the math and from experience,
> I can say that the autovacuum only affects the small, frequently updated
> tables, so I could see no benefit.  Am I missing something?  (I can't see
> where this causes any extra I/O.)

There seems to be a misunderstanding here.

1. CLUSTER does not change the OID.  It only changes the relfilenode: the
   file on disk is named something else, but the OID used in the database
   remains unchanged.  So if you insert something into pg_autovacuum it
   continues to work after a CLUSTER; you don't need to update the OID.

2. The point of autovacuum is to get rid of maintenance burden, not add to
   it.  If you know which tables are small and frequently updated, then
   configure those with the specific settings you've found to be optimal,
   and you don't need to worry about vacuuming them any longer.

You already know this, but: autovacuum uses a formula to decide which tables
to vacuum.  The formula is based on the number of dead tuples, the size of
the table, and two factors that you can configure per table as well as
globally.  If you didn't tune it for specific tables, most likely your
biggest tables never met the formula's condition, which is why you were
seeing it affect only the small tables (which did meet the condition under
the values you configured server-wide).

The extra I/O I was talking about would come from vacuuming one of your
biggest tables, which could cause the amount of I/O to swamp everything else
the server was doing at the time.  Since autovacuum never actually touched
the big tables, this hasn't happened to you yet.  Do note that autovacuum
uses vacuum_cost_delay if autovacuum_vacuum_cost_delay is set to the default
value of -1.
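To make that concrete, here is a minimal sketch of such a per-table entry,
assuming a hypothetical table my_small_table (with index my_small_table_pkey)
and the pg_autovacuum columns as documented for 8.2; the numbers are only
illustrative, not recommendations:

    -- A table qualifies for autovacuum roughly when its dead tuples exceed
    --   vac_base_thresh + vac_scale_factor * reltuples
    -- so a small scale factor makes a small, hot table qualify quickly.
    INSERT INTO pg_autovacuum
        (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
         anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit,
         freeze_min_age, freeze_max_age)
    VALUES
        ('my_small_table'::regclass, true, 50, 0.01,
         50, 0.01, 10, -1, -1, -1);

    -- The OID stored in vacrelid survives CLUSTER; only relfilenode changes.
    SELECT oid, relfilenode FROM pg_class WHERE relname = 'my_small_table';
    CLUSTER my_small_table_pkey ON my_small_table;  -- pre-8.3 CLUSTER syntax
    SELECT oid, relfilenode FROM pg_class WHERE relname = 'my_small_table';

(The -1 values mean "use the server-wide default" for that column.)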
> Our tables tend to fall into one of four categories: small tables with high
> update rates, medium tables (millions or tens of millions of rows) with
> thousands or tens of thousands of updates per day, static tables of various
> sizes that are only modified as part of a software release, and big honking
> tables (100s of GB) which are either insert-only or insert with a periodic
> purge of old rows.  Only the first group has a chance of being autovacuumed
> in normal operations.  Even the purges don't cause it to kick in.

It could certainly vacuum all your tables.  But one thing to keep in mind is
that as of 8.2, only one autovacuum process can be running.  So if and when
it decides to vacuum the big tables, it will be a long time before it is
able to go back and check the small tables.  This is fixed in 8.3.

> >> Oh, the tiny, high-update tables occasionally bloat to hundreds or
> >> thousands of pages because of long-running transactions, so we schedule
> >> a daily cluster on those, just to keep things tidy.
> >
> > If you can afford the cluster then there's no problem.  I don't expect
> > that to change in 8.3.
>
> Here also we're talking 10 to 20 milliseconds.  I understand that in 8.2
> that leaves a chance of an error, but we seem to have dodged that bullet
> so far.  Has that gotten any safer in 8.3?

Yes, it has, assuming I understood which error you are talking about
(CLUSTER not keeping dead tuples that may still be visible to concurrent
transactions).
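For reference, the version-dependent behaviour mentioned above maps onto a
handful of settings that can be inspected from SQL; a hedged sketch
(autovacuum_max_workers exists only from 8.3 onwards):

    -- In 8.2 a single autovacuum process serves all databases; 8.3 adds
    -- autovacuum_max_workers so that a long vacuum of a big table no longer
    -- keeps the small, hot tables waiting.  autovacuum_vacuum_cost_delay
    -- set to -1 means "fall back to vacuum_cost_delay".
    SELECT name, setting
      FROM pg_settings
     WHERE name IN ('autovacuum', 'autovacuum_max_workers',
                    'autovacuum_vacuum_cost_delay', 'vacuum_cost_delay');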
-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support