Proposal: Another attempt at vacuum improvements - Mailing list pgsql-hackers

From	Pavan Deolasee
Subject	Proposal: Another attempt at vacuum improvements
Date	May 24, 2011 06:58:58
Msg-id	BANLkTimiRMwUabpZXk+J3gh5QCLy0qVzVg@mail.gmail.com
Responses	Re: Proposal: Another attempt at vacuum improvements (Robert Haas <robertmhaas@gmail.com>) Re: Proposal: Another attempt at vacuum improvements (Simon Riggs <simon@2ndQuadrant.com>)
List	pgsql-hackers

Hi All,

Some of the ideas regarding vacuum improvements were discussed here:

http://archives.postgresql.org/pgsql-hackers/2008-05/msg00863.php

http://archives.postgresql.org/pgsql-patches/2008-06/msg00059.php

A more recent thread was started by Robert Haas, but I don't know if we brought that one to a logical conclusion either.

http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php

This was once again brought up by Robert Haas in a discussion with Tom and me during PGCon, and we agreed there are a few things we can do to make vacuum more performant. One of the things Tom mentioned is that vacuum today is not aware of the fact that it's a periodic operation, and there might be ways to take advantage of that.

The biggest gripe today is that vacuum needs two heap scans and each scan dirties the buffers. While the visibility map ensures that not all blocks are read and written during the scan, for a very large table even a small percentage of blocks can be significant. Further, post-HOT, the second scan of the heap does not really reclaim any significant space, except for dead line pointers. So there is a good reason to avoid it. I wanted to start a discussion just about that. I am proposing one solution below, but I am not married to the idea.

So the idea is to separate the index vacuum (removing index pointers to dead tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning or using regular vacuum), we can spool the dead line pointers somewhere. To avoid any hot-spots during normal processing, the spooling can be done periodically, like the stats collection. One obvious choice for spooling dead line pointers is to use a relation fork. The index vacuum will be kicked off periodically depending on the number of spooled dead line pointers. When that happens, the index vacuum will remove all index pointers pointing to those dead line pointers and forget the spooled line pointers.
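To make the spooling idea concrete, here is a minimal Python sketch of it as a toy in-memory model, not PostgreSQL code. Everything named here is an assumption for illustration: `DeadPointerSpool` stands in for the relation fork, `threshold` stands in for whatever count would trigger the index vacuum, and the index is reduced to a dictionary mapping a key to a single TID.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# A TID identifies a line pointer: (heap block number, offset within the page).
TID = Tuple[int, int]


@dataclass
class DeadPointerSpool:
    """Hypothetical stand-in for a relation fork holding dead line pointers."""
    threshold: int  # assumed trigger: spool size at which an index vacuum starts
    tids: Set[TID] = field(default_factory=set)

    def add(self, dead: List[TID]) -> None:
        # Called by HOT pruning or heap vacuum, in batches rather than per tuple,
        # so normal processing does not contend on the spool.
        self.tids.update(dead)

    def needs_index_vacuum(self) -> bool:
        return len(self.tids) >= self.threshold


def index_vacuum(index: Dict[str, TID], spool: DeadPointerSpool) -> int:
    """Remove index entries pointing at spooled TIDs, then forget the spool."""
    doomed = [key for key, tid in index.items() if tid in spool.tids]
    for key in doomed:
        del index[key]
    spool.tids.clear()
    return len(doomed)
```

In this model the heap side only ever appends to the spool, and the index side only ever drains it, which is what lets the two run on independent schedules.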

The dead line pointers themselves will be removed whenever a heap page is later vacuumed, either as part of HOT pruning or the next heap vacuum. We would need some mechanism, though, to know that the index pointers to the existing dead line pointers have been vacuumed and it's safe to remove them now. Maybe we can track the last operation that generated a dead line pointer in the page using an LSN in the page header, and also keep track of the LSN of the last successful index vacuum. If the index vacuum LSN is greater than the page header vacuum LSN, we can safely remove the existing dead line pointers. I am deliberately not suggesting how to track the index vacuum LSN since my last proposal to do something similar through a pg_class column was shot down by Tom :-)
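The LSN comparison can be sketched as follows, again as a toy Python model rather than actual PostgreSQL code. The field name `dead_lp_lsn`, the helper names, and the use of plain integers for LSNs are all hypothetical. The sketch also assumes that an index vacuum recorded at a given LSN has covered every dead line pointer generated before that LSN; how that would be guaranteed is exactly the part left open above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HeapPage:
    # Hypothetical page-header field: LSN of the last operation that turned
    # a line pointer on this page into a dead one.
    dead_lp_lsn: int = 0
    dead_line_pointers: List[int] = field(default_factory=list)


def mark_dead(page: HeapPage, offset: int, current_lsn: int) -> None:
    """Record a new dead line pointer and stamp the page with the current LSN."""
    page.dead_line_pointers.append(offset)
    page.dead_lp_lsn = current_lsn


def reclaim_dead_line_pointers(page: HeapPage, last_index_vacuum_lsn: int) -> List[int]:
    """Remove dead line pointers only if the last index vacuum is newer than
    the last operation that created a dead line pointer on this page."""
    if last_index_vacuum_lsn > page.dead_lp_lsn:
        reclaimed = page.dead_line_pointers
        page.dead_line_pointers = []
        return reclaimed
    # An index pointer to one of these line pointers may still exist.
    return []
```

Note that the check is per page and all-or-nothing: one new dead line pointer created after the last index vacuum keeps every older dead line pointer on that page around until the next index vacuum.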

In a nutshell, what I am suggesting is to do heap and index vacuuming independently. The heap will be vacuumed either by HOT pruning or a periodic heap vacuum, and the dead line pointers will be collected. An index vacuum will remove the index pointers to those dead line pointers. And at some later point, the dead line pointers will be removed, either as part of retail or complete heap vacuum. It's not clear if it's useful, but a single index vacuum can follow multiple heap vacuums or vice versa.

Another advantage of this technique would be that we can then support start/stop heap vacuum, or vacuuming a range of blocks at a time, or even vacuuming only those blocks which are already cached in the buffer cache. Just hand-waving at this point, but it seems possible.

Suggestions/comments/criticism all welcome, but please don't shoot down the idea on implementation details, since I have really not spent time on that, so it will be easy to find holes and corner cases. Those can be worked out if we believe something like this will be useful.

Thanks,

Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
