Re: Autovacuum loose ends - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: Autovacuum loose ends |
Date | |
Msg-id | 20050714170620.GD19778@alvh.no-ip.org Whole thread Raw |
In response to | Autovacuum loose ends (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Autovacuum loose ends
Re: Autovacuum loose ends |
List | pgsql-hackers |
On Thu, Jul 14, 2005 at 10:52:56AM -0400, Tom Lane wrote: > I've applied Alvaro's latest integrated-autovacuum patch. There are > still a number of loose ends to be dealt with before beta, though: Thanks, and again sorry for the bugs. The code for shutting the whole thing down was not easy for me to understand -- I think it should be better documented. I can send a src/backend/postmaster/README file if you think it's worth it; I'd document how to create processes, how to handle shutdown, and how does the signalling mechanism work. (It took me at least an hour to figure the signal handling out, and as you see I still had gross misunderstadings.) > * The code does not make a provision to ignore temporary tables. > Although vacuum.c and analyze.c will disregard the request to touch > such tables, it'd probably be better to recognize the situation further > upstream. In particular it seems that autovacuum will continually throw > ANALYZE requests for a temp table due to lack of stats. Oh, is that right? Actually in the end I forgot about temp tables so I didn't handle them specially, but now I remember that when I started looking at Matthew's integration code I thought that temp tables should be analyzed if they happen to have a lot of new tuples, so that the planner would have good stats about them. > * For that matter I'm unconvinced that it's a good idea to try to force > the pgstat DB to pick up every table in every database. If there's no > entry it's because the table is not getting modified, and therefore it > seems to me that we can just leave well enough alone. The code really > is not very good about doing nothing where nothing is called for ;-) Hmm. The problem is that the table may merit a first ANALYZE, and in a second run we need to know that another one is not needed. How would we know that, if we don't keep track on it in the pgstat DB? Keeping no info about a table seems problematic to me. Also, remember that there were mentions of changing wraparound Xid to be kept track of on a per-table basis, instead of per-database (for 8.2 I assume). If this happens we will _need_ to check every table. > * The code ignores datallowconn and therefore will periodically vacuum > template0. I've got mixed emotions about this --- it could save > someone's bacon if they failed to properly VACUUM FREEZE a template > database, but in 99.99% of installations it's just wasted cycles. > Maybe it'd make sense to perform XID-wraparound-prevention vacuuming, > but not anything more, in a template DB. Thoughts? > > * Or actually, it would vacuum template0, except that since no regular > backend ever connects to template0, there will be no stats DB entry for > it and so the loop in AutoVacMain will ignore it. This is definitely > BAD as it means that a database that's not been touched since postmaster > start will never be vacuumed, not even for XID wraparound prevention. > That test needs to be weakened. See, that's what I'm talking about :-) No information about an object is a problem. Now, I don't think it's a problem to periodically vacuum template0, because it will connect to it and quickly realize that no work is needed. OTOH I think you can argue that we document that datallowcon=false databases should be frozen and thus we can just assume that they don't need vacuuming. However this doesn't seem safe -- it'll quickly end up in the "PostgreSQL gotchas" section. Maybe we could pick the first database with no entry in pgstat, and process that. After it's processed, pgstat will have complete data about it. Another idea would be keeping a per-database dead tuple counter, or some other metric, and use that as a parameter in choosing what database to vacuum. The current test (last autovac start time) is certainly very naive. Yet another idea is to keep track of current Xid as of the last autovac start, and compare that with the current Xid, in order to check for wraparound. Not sure if it's possible to check current Xid without connecting to a database first. > * I'm still pretty concerned about the handling of shared catalogs. > AFAICS the current pgstats infrastructure simply gets this wrong, > meaning that shared catalogs will not get adequate vacuuming. We need > to fix that. Maybe we can store them in pgstat in a pseudo-database with Oid=0, and special case them everywhere. Where do we store pg_autovacuum values? Or do we dictate that they can only use default parameters from GUC? > * As Alvaro noted, the default parameter settings need a lookover. > What is in the patch is not what was the default in the contrib module, > but the contrib defaults seem awfully passive. Yeah, the values you saw in the patch were suggested by Matthew. I had the contrib module's values originally. > * The documentation badly needs work. I committed some minimal > additions to runtime.sgml and catalogs.sgml, but the chapter about > routine maintenance needs a section added about how to use autovac. Matthew is on that, I think. -- Alvaro Herrera (<alvherre[a]alvh.no-ip.org>) "Right now the sectors on the hard disk run clockwise, but I heard a rumor that you can squeeze 0.2% more throughput by running them counterclockwise. It's worth the effort. Recommended." (Gerry Pourwelle)
pgsql-hackers by date: