On Mon, Jan 22, 2007 at 02:51:47PM +0000, Heikki Linnakangas wrote:
> I've been looking at the way we do vacuums.
>
> The fundamental performance issue is that a vacuum generates
> nheapblocks+nindexblocks+ndirtyblocks I/Os. Vacuum cost delay helps to
> spread the cost out over time, like paying in installments, but the
> total is the same. In an I/O bound system, the extra I/O directly leads
> to less throughput.
>
> Therefore, we need to do less I/O. Dead space map helps by allowing us
> to skip blocks that don't need vacuuming, reducing the # of I/Os to
> 2*ndirtyblocks+nindexblocks. That's great, but it doesn't help us if the
> dead tuples are spread uniformly.
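
Just to make those formulas concrete with some made-up numbers: for a
1,000,000-block heap with 100,000 index blocks and 50,000 blocks that
contain dead tuples, a plain vacuum costs roughly 1,000,000 + 100,000 +
50,000 = 1,150,000 I/Os, while a dead-space-map vacuum costs roughly
2*50,000 + 100,000 = 200,000 I/Os. The win depends entirely on how small
ndirtyblocks is relative to the heap.
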
>
> If we could piggyback the vacuum I/Os to the I/Os that we're doing
> anyway, vacuum wouldn't ideally have to issue any I/O of its own. I've
> tried to figure out a way to do that.
>
> Vacuum is done in 3 phases:
>
> 1. Scan heap
> 2. Vacuum index
> 3. Vacuum heap
>
> Instead of doing a sequential scan, we could perform the 1st phase by
> watching the buffer pool, scanning blocks for dead tuples when they're
> in memory and keeping track of which pages we've seen. When all pages
> have been seen, the tid list is sorted and the 1st phase is done.
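
To make the bookkeeping concrete, here's a rough standalone toy of the
state the 1st phase would need if it piggybacked on buffer-pool reads: a
bitmap of heap blocks already inspected plus a growing list of dead-tuple
TIDs. None of this is actual backend code; LazyVacState, vac_notice_page()
and the rest are invented names.

/*
 * Toy sketch only -- not PostgreSQL code.  Bookkeeping for a 1st phase
 * that piggybacks on buffer-pool reads: a bitmap of heap blocks already
 * inspected, plus the dead-tuple TIDs found on them.
 */
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int   BlockNumber;
typedef unsigned short OffsetNumber;

typedef struct { BlockNumber blk; OffsetNumber off; } DeadTid;

typedef struct
{
    BlockNumber    nblocks;   /* relation size when the vacuum started */
    BlockNumber    nseen;     /* distinct heap blocks inspected so far */
    unsigned char *seenmap;   /* one bit per heap block */
    DeadTid       *dead;      /* accumulated dead-tuple TIDs */
    size_t         ndead;
    size_t         cap;
} LazyVacState;

static LazyVacState *vac_begin(BlockNumber nblocks)
{
    LazyVacState *vs = calloc(1, sizeof(*vs));

    vs->nblocks = nblocks;
    vs->seenmap = calloc((nblocks + 7) / 8, 1);
    vs->cap = 64;
    vs->dead = malloc(vs->cap * sizeof(DeadTid));
    return vs;
}

/*
 * Hypothetically called whenever a heap page of the relation passes
 * through the buffer pool; deadoffs[] are the dead line pointers the
 * caller found while the page was in memory anyway.
 */
static void vac_notice_page(LazyVacState *vs, BlockNumber blk,
                            const OffsetNumber *deadoffs, int ndeadoffs)
{
    if (blk >= vs->nblocks)
        return;                 /* block added after the vacuum started */

    if (!(vs->seenmap[blk / 8] & (1 << (blk % 8))))
    {
        vs->seenmap[blk / 8] |= 1 << (blk % 8);
        vs->nseen++;
    }

    for (int i = 0; i < ndeadoffs; i++)
    {
        if (vs->ndead == vs->cap)
        {
            vs->cap *= 2;
            vs->dead = realloc(vs->dead, vs->cap * sizeof(DeadTid));
        }
        vs->dead[vs->ndead].blk = blk;
        vs->dead[vs->ndead].off = deadoffs[i];
        vs->ndead++;
    }
}

/* The 1st phase is complete once every block has been seen at least once. */
static int vac_phase1_done(const LazyVacState *vs)
{
    return vs->nseen == vs->nblocks;
}

int main(void)
{
    LazyVacState *vs = vac_begin(3);
    OffsetNumber d0[] = {2, 5};

    vac_notice_page(vs, 0, d0, 2);
    vac_notice_page(vs, 2, NULL, 0);
    printf("done=%d ndead=%zu\n", vac_phase1_done(vs), vs->ndead); /* done=0 ndead=2 */
    vac_notice_page(vs, 1, NULL, 0);
    printf("done=%d ndead=%zu\n", vac_phase1_done(vs), vs->ndead); /* done=1 ndead=2 */
    return 0;
}

(Sorting the dead-TID list before the index pass is omitted above.)
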
>
> In theory, the index vacuum could also be done that way, but let's
> assume for now that indexes would be scanned like they are currently.
>
> The 3rd phase can be performed similarly to the 1st phase. Whenever a
> page enters the buffer pool, we check the tid list and remove any
> matching tuples from the page. When the list is empty, vacuum is complete.
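
And a correspondingly rough sketch of the 3rd-phase counterpart, again
with invented names and the actual page surgery elided: when a heap page
turns up in the buffer pool, look up the dead TIDs recorded for that
block, prune them, and tick them off; vacuum is finished once nothing is
left to reclaim.

/*
 * Toy sketch only -- not PostgreSQL code.  dead[] is the (sorted) list
 * produced by the 1st phase; nremaining starts out equal to ndead.
 */
#include <stddef.h>

typedef unsigned int   BlockNumber;
typedef unsigned short OffsetNumber;

typedef struct { BlockNumber blk; OffsetNumber off; } DeadTid;

typedef struct
{
    DeadTid       *dead;        /* sorted by (blk, off) after the 1st phase */
    unsigned char *reclaimed;   /* one flag per entry in dead[] */
    size_t         ndead;
    size_t         nremaining;  /* entries not yet removed from their page */
} HeapVacState;

/* Called whenever block "blk" of the relation is resident in the pool. */
void vac_prune_page(HeapVacState *hs, BlockNumber blk)
{
    /* Linear scan for clarity; the real thing would binary-search. */
    for (size_t i = 0; i < hs->ndead; i++)
    {
        if (hs->dead[i].blk == blk && !hs->reclaimed[i])
        {
            /* ... here we would remove line pointer hs->dead[i].off
             * from the page while it sits in the buffer pool ... */
            hs->reclaimed[i] = 1;
            hs->nremaining--;
        }
    }
}

int vac_phase3_done(const HeapVacState *hs)
{
    return hs->nremaining == 0;
}
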

Is there any real reason to demarcate the start and end of a vacuum? Why
not just make it a continuous process? One possibility is to keep a
separate list of TIDs for each phase, though that could prove tricky with
multiple indexes.
> A variation of the scheme would be to keep scanning pages that are in
> cache, until the tid list reaches a predefined size, instead of keeping
> track of which pages have already been seen. That would deal better with
> tables with hot and cold spots, but it couldn't advance relfrozenxid
> because there would be no guarantee that all pages are visited. Also, we
> could start the 1st phase of the next vacuum while we're still in the 3rd
> phase of the previous one.
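
In terms of the toy phase-1 sketch above, that variation would just swap
the all-blocks-seen test for a cap on the accumulated list, something like
(a fragment of the earlier sketch, illustrative only):

/* Stop collecting once the list is "full" rather than when every heap
 * block has been visited.  vs is the LazyVacState from the earlier
 * sketch; the cap is whatever memory budget we pick. */
int vac_phase1_list_full(const LazyVacState *vs, size_t max_dead_tids)
{
    return vs->ndead >= max_dead_tids;
}
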

What if we tracked freeze status on a per-page basis? Perhaps track the
minimum XID that's on each page. That would allow us to ensure that we
freeze pages that are approaching XID wraparound.
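
Very roughly, and purely as a sketch of the bookkeeping (invented names,
and glossing over where such a map would actually live):

/*
 * Toy sketch only: a per-page freeze map that remembers the oldest XID
 * present on each heap page, so that only pages whose minimum XID is
 * nearing the freeze horizon need to be visited.  Real XID comparisons
 * have to use wraparound-aware arithmetic, not plain "<".
 */
#include <stdlib.h>

typedef unsigned int TransactionId;
typedef unsigned int BlockNumber;

#define InvalidTransactionId ((TransactionId) 0)

typedef struct
{
    BlockNumber    nblocks;
    TransactionId *minxid;   /* oldest XID on each page; 0 = fully frozen */
} FreezeMap;

FreezeMap *freezemap_create(BlockNumber nblocks)
{
    FreezeMap *fm = malloc(sizeof(*fm));

    fm->nblocks = nblocks;
    fm->minxid = calloc(nblocks, sizeof(TransactionId));
    return fm;
}

/* Update the map whenever a tuple with xmin "xid" lands on page "blk". */
void freezemap_note_xid(FreezeMap *fm, BlockNumber blk, TransactionId xid)
{
    if (fm->minxid[blk] == InvalidTransactionId || xid < fm->minxid[blk])
        fm->minxid[blk] = xid;
}

/* Does this page hold XIDs older than the freeze limit? */
int freezemap_page_needs_freeze(const FreezeMap *fm, BlockNumber blk,
                                TransactionId freeze_limit)
{
    return fm->minxid[blk] != InvalidTransactionId &&
           fm->minxid[blk] < freeze_limit;
}
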
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)