Re: Auto Vacuum - Mailing list pgsql-hackers
From | Gavin Sherry |
---|---|
Subject | Re: Auto Vacuum |
Date | |
Msg-id | Pine.LNX.4.58.0411291340190.23987@linuxworld.com.au Whole thread Raw |
In response to | Re: Auto Vacuum (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
Matthew T. O'Connor looked at this fairly closely leading up to 8.0 feature freeze. There was a long discussion earlier this year with respect to libpq vs. using backend functions directly to vacuum multiple databases. http://archives.postgresql.org/pgsql-hackers/2004-03/msg00931.php This should be mandatory reading before proceeding. Also, you want to talk to Matthew and review his patch (submitted early July, I think). Thanks, Gavin On Sun, 28 Nov 2004, Bruce Momjian wrote: > > I have added an auto-vacuum TODO item: > > * Auto-vacuum > o Move into the backend code > o Scan the buffer cache to find free space or use background writer > o Use free-space map information to guide refilling > > > --------------------------------------------------------------------------- > > Russell Smith wrote: > > Hi All, > > > > I am doing serious thinking about the implementation of Auto Vacuum as part of the backend, Not using libpq, but classinginternal functions directly. > > It appears to me that calling internal functions directly is a better implementation than using the external libraryto do the job. > > > > I know I might be stepping on Matthew's toes, but I don't really want to. I am a complete newbie to the postgresql code,however I am trying. > > Vacuum appears to be one of the bigger saw points with administrator having to configure it via scheduled tasks. > > > > I have outlined things I have thought about below. I've surely missed a lot, and am open to feedback. Others may likethe current tuning used > > by the external autovacuum, however to have stats that are collected at vacuum, and used for later vacuum would meanwe don't need the > > stats collector running. > > > > The major autovacuum issues > > > > 1. Transaction Wraparound > > 2. Vacuum of relations > > 3. Tracking of when to do vacuums > > 4. Where to store information needed by auto vacuum > > > > I would suggest the first step is to replicate the wraparound and relation vacuuming from the current code first. ButI could be wrong. > > Currently there are a lot of tuning options in the external autovacuum, and the best way to run vacuum internally willneed to be thought out. > > > > 1. Transaction Wraparound > > > > It appears from the code, that the best time to do a transaction wraparound db wide vacuum is when the > > frozenXID has wrapped, TransactionIdPrecedes(myXID, dbform->datfrozenxid). And is probably the most > > simple vacuum to implement. > > > > > > 2. Vacuuming of relations > > > > Currently, the entire heap must be vacuumed at one time. I would possible be desireable to have only part of the relationvacuumed at > > a time. If you can find out which parts of the relation have the most slack space. There is a todo item regarding trackingrecent deletions > > so they can be resused. Some form of this would be helpful to work out what to vacuum. Performance issues for thistype of activity > > may be a concern. But I have no experience to be able to make comment on them. So I welcome yours. > > > > 3. Tracking of when to vacuum > > > > Current autovacuum relies the stats collector to be running. I would like to only use the stats if they are available, > > and have an option to be able to vacuum accurately without having to have stats running. > > > > By adding certain histograms, on tuples, filesize and slack space we can guage the time between vacuums > > > > number of tuples will show the inserts as opposed to updates. > > file size will show that the file is growing and by how much between vacuums. > > slack space will show the delete/updated records. > > > > A new guc and relation option would need to be implemented to give a target slack space in a file. > > this will help to reduce growth in relations if vacuum happens to not run frequently enough. This information > > can also inform autovacuum that it should be vacuuming more frequently. The number would be a percentage, > > eg 10% of the total file size is allowed to be unused. Also alter table would allow users to set levels of slackness > > for each relation. If the table gets too much more than the target slack space, a "partial" vacuum full could be run > > to reduce the size of the table by moving tuples at the end of the table to slack space near the beginning and > > shortening the table length. It would require a full table lock, but you may be able to space it out, to only do a page > > at a time. > > > > /* target percentage of slack space */ > > vacuum_default_target_slack_space = 10 > > > > ALTER TABLE SET TARGET SLACK SPACE = 10; > > > > 4. Where to store information required by auto vacuum. > > > > Auto vacuum needs somewhere to stop the information it needs about current and future vacuuming. I am unsure of where > > to put this. It appears as if it will have a number of fields. I feel like pg_class is the best place to put the information,but > > again I have no idea. > > > > That's the best I can do for now. I can clarify things further if required. > > > > Regards > > > > Russell. > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
pgsql-hackers by date: