Re: Resumable vacuum proposal and design overview - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Resumable vacuum proposal and design overview
Date
Msg-id 1172594984.3760.765.camel@silverbirch.site
In response to Re: Resumable vacuum proposal and design overview  ("Jim C. Nasby" <jim@nasby.net>)
Responses Re: Resumable vacuum proposal and design overview  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On Tue, 2007-02-27 at 10:37 -0600, Jim C. Nasby wrote:
> On Tue, Feb 27, 2007 at 11:44:28AM +0900, Galy Lee wrote:
> > For example, there is one table:
> >    - The table is hundreds of GB in size.
> >    - It takes 4-8 hours to vacuum such a large table.
> >    - Enabling cost-based delay may make it last for 24 hours.
> >    - It can be vacuumed during night time for 2-4 hours.
> > 
> > It is true there is no strict requirement that vacuum
> > be interrupted immediately, but it should be stopped in a
> > *predictable way*. In the above example, if we have to wait for the end
> > of one full cycle of cleaning, it may take up to 8 hours for vacuum to
> > stop after it has received a stop request. This seems quite unacceptable.
> 
> Even with very large tables, you could likely still fit things into a
> specific time frame by adjusting how much time is spent scanning for
> dead tuples. The idea would be to give vacuum a target run time, and it
> would monitor how much time it had remaining, taking into account how
> long it should take to scan the indexes based on how long it's been
> taking to scan the heap. When the amount of time left becomes less than
> the estimate of the amount of time required to scan the indexes (and
> clean the heap), you stop the heap scan and start scanning indexes. As
> long as the IO workload on the machine doesn't vary wildly between the
> heap scan and the rest of the vacuum process, I would expect this to
> work out fairly well.
> 
> While not as nice as the ability to 'stop on a dime' as Tom puts it,
> this would be much easier and safer to implement. If there's still a
> need for something better after that we could revisit it at that time.

I do like this idea, but it also seems easy to calculate that bit
yourself. Run VACUUM, after X minutes issue stop_vacuum() and see how
long it takes to finish. Adjust X until you have it right.
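
A rough sketch of that manual calibration, for illustration only. It
assumes the stop_vacuum() call proposed in this thread exists, and that
the wind-down time (index scan plus heap cleanup) stays roughly the same
from one run to the next, which is a simplification. The function name
and its parameters are hypothetical.

```python
def next_stop_after(target_minutes, stop_after_minutes, observed_total_minutes):
    """Suggest the next X (minutes before issuing the stop request).

    observed_total_minutes is the wall-clock time from starting VACUUM
    until it actually finished, after the stop request was issued at
    stop_after_minutes. All names here are hypothetical.
    """
    # Time vacuum kept running after it was asked to stop.
    wind_down = observed_total_minutes - stop_after_minutes
    # Assumption: the next run winds down in about the same time, so
    # leave exactly that much room before the target.
    return max(0.0, target_minutes - wind_down)
```

With a 240 minute window, a stop issued at 180 minutes and a run that
finished at 250 minutes, the next X to try would be 170.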

If we did want to automate it, a vacuum_target_duration userset GUC would
make sense, following Jim's thought. A value of 0 would mean
run-to-completion.
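
A minimal sketch of the check Jim describes, as it might be evaluated
during the heap scan. This is not existing PostgreSQL code: the function,
its parameters, and the page-count based estimate are assumptions, and it
relies on Jim's premise that the I/O rate seen during the heap scan also
holds for the index scan and heap cleanup.

```python
def should_stop_heap_scan(target_duration_s, elapsed_s, heap_pages_scanned,
                          index_pages, cleanup_pages):
    """Return True once the heap scan should end so the rest fits the target.

    target_duration_s plays the role of the proposed vacuum_target_duration;
    0 means run to completion. All names are hypothetical.
    """
    if target_duration_s == 0:
        return False  # no target set: never cut the heap scan short
    if heap_pages_scanned == 0:
        return False  # no rate observed yet, so nothing to estimate from

    # Observed cost per page so far, taken from the heap scan itself.
    secs_per_page = elapsed_s / heap_pages_scanned
    # Estimated time to scan the indexes and clean the heap pages that
    # hold the dead tuples collected so far.
    estimated_finish_s = secs_per_page * (index_pages + cleanup_pages)
    remaining_s = target_duration_s - elapsed_s
    return remaining_s <= estimated_finish_s
```

The check is cheap enough to run every few hundred heap pages, so the
estimate follows the observed I/O rate as the scan proceeds.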

Getting it to work well for VACUUM FULL would be more than a little
interesting though.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



