Thread: Proposal for background vacuum full/cluster
I talked to a few people on IRC about this and they didn't think I was nuts, so maybe this is something practical... In a nutshell, my idea is to use the normal transactional/XID code to relocate tuples in the heap. Think of doing an UPDATE field=field if you could tell update what page to put the new tuple on. Using this mechanism, you can move tuples from the end of the heap to pages that have free space on them. The dead tuples at the end of the heap could then be vacuumed conventionally, and completely empty pages removed by that vacuum. Of course, it's not quite that simple. For starters, you'd want to do a conventional vacuum before this, both to free as much space as possible and to update the FSM. It might also be necessary to prevent backends from using the pages at the end of the heap (which you're trying to empty). I'm guessing that could be done just by removing the pages from the FSM. You'd also need to vacuum after emptying these pages to reclaim the disk space. To facilitate these things, it might be useful to be able to vacuum parts of the heap. So as pages are emptied at the end of the heap, they can be vacuumed and reclaimed while the pages are still probably in cache (and without requiring a re-vacuum of the entire table). Taking this technique one step further, it should also be possible to cluster in the background without blocking everything. One way to do this would be to empty the first page in the heap by moving it's tuples elsewhere, and vacuuming that page (but not putting it in the FSM). Once that page is available, you can start reading in from the clustering index and moving those tuples to the first page. One thing that might be an issue for both ideas is index bloat. But since reindex is a non-blocking operation, it doesn't seem unreasonable to either do that automatically or have the user do it. Is this TODOable? -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <decibel@decibel.org> writes: > In a nutshell, my idea is to use the normal transactional/XID code to > relocate tuples in the heap. Think of doing an UPDATE field=field if you > could tell update what page to put the new tuple on. Using this > mechanism, you can move tuples from the end of the heap to pages that > have free space on them. The dead tuples at the end of the heap could > then be vacuumed conventionally, and completely empty pages removed by > that vacuum. How exactly is this different from what happens now, assuming that you didn't run out of FSM? regards, tom lane
Jim C. Nasby wrote: >I talked to a few people on IRC about this and they didn't think I was >nuts, so maybe this is something practical... > >In a nutshell, my idea is to use the normal transactional/XID code to >relocate tuples in the heap. Think of doing an UPDATE field=field if you >could tell update what page to put the new tuple on. > Be careful not to fire UPDATE triggers on the tuple while doing so. >Of course, it's not quite that simple. For starters, you'd want to do a >conventional vacuum before this, both to free as much space as possible >and to update the FSM. It might also be necessary to prevent backends >from using the pages at the end of the heap (which you're trying to >empty). I'm guessing that could be done just by removing the pages from >the FSM. You'd also need to vacuum after emptying these pages to reclaim >the disk space. To facilitate these things, it might be useful to be >able to vacuum parts of the heap. So as pages are emptied at the end of >the heap, they can be vacuumed and reclaimed while the pages are still >probably in cache (and without requiring a re-vacuum of the entire >table). > > > Keep in mind that the transaction that does the update can't also vacuum it's own tuples. You'd have to end one transaction, then wait until every transaction running while the updater ran finishes, then start the transaction that vacuums. Obviously your command would need to be able to start and end transactions. (Meaning that it can't be a user defined function, and it probably can't be a normal self-contained command in postgres.) >Taking this technique one step further, it should also be possible to >cluster in the background without blocking everything. One way to do >this would be to empty the first page in the heap by moving it's tuples >elsewhere, and vacuuming that page (but not putting it in the FSM). Once >that page is available, you can start reading in from the clustering >index and moving those tuples to the first page. > >One thing that might be an issue for both ideas is index bloat. But >since reindex is a non-blocking operation, it doesn't seem unreasonable >to either do that automatically or have the user do it. > >Is this TODOable? > > I asked about something like this on the -hackers list a while back, but didn't get any response from any of the knowledgeable hackers. Are you thinking of coding this, or just suggesting it for others? I was thinking of coding something like this but found that I didn't understand enough of the internals of how the vacuum command actually works to be able to write this. I'd be willing to devote perhaps a few hours a week to it if you want to help me. Regards, Paul Tillotson ----------------- P. S. The last time I thought about it, I decided that the best solution is probably one that works just like vacuum full except that it scans the table in reverse order. It would do something like this: - Wait for exclusive lock. - Start at the end of the table -- call this page I. - If page I is completely empty, shrink the heap and go to step 1 again. (Page I is not empty now.) - *Scan forward in the table until you find a page that is empty. (Call it J) If no such page is found, there is no more free space in the table. Exit. - Move the tuples from page I to page J. - Drop the exclusive lock. and go to step one. *On subsequent iterations of the loop, do not reset J. i.e., start scanning at the last place that free space was known to exist. Presumably no useful amount of free space will get created in the table while this algorithm is running.
On Wed, Apr 20, 2005 at 07:33:54PM -0400, Tom Lane wrote: > "Jim C. Nasby" <decibel@decibel.org> writes: > > In a nutshell, my idea is to use the normal transactional/XID code to > > relocate tuples in the heap. Think of doing an UPDATE field=field if you > > could tell update what page to put the new tuple on. Using this > > mechanism, you can move tuples from the end of the heap to pages that > > have free space on them. The dead tuples at the end of the heap could > > then be vacuumed conventionally, and completely empty pages removed by > > that vacuum. > > How exactly is this different from what happens now, assuming that you > didn't run out of FSM? In the case of cluster I think it's quite different, as cluster currently re-writes the heap from scratch, no? I'm not sure how different it is from vacuum full, though the main idea is that instead of locking the table you instead work in smaller pieces and don't block anything other than other updates. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On Wed, Apr 20, 2005 at 08:10:23PM -0400, Paul Tillotson wrote: > Jim C. Nasby wrote: > > >I talked to a few people on IRC about this and they didn't think I was > >nuts, so maybe this is something practical... > > > >In a nutshell, my idea is to use the normal transactional/XID code to > >relocate tuples in the heap. Think of doing an UPDATE field=field if you > >could tell update what page to put the new tuple on. > > > Be careful not to fire UPDATE triggers on the tuple while doing so. To clarify, I'm not suggesting this actually be coded as a bunch of updates. I used that as an example only. > Keep in mind that the transaction that does the update can't also vacuum > it's own tuples. You'd have to end one transaction, then wait until > every transaction running while the updater ran finishes, then start the > transaction that vacuums. Obviously your command would need to be able > to start and end transactions. (Meaning that it can't be a user defined > function, and it probably can't be a normal self-contained command in > postgres.) Yes. Note how I called it a background vacuum full/cluster. > Are you thinking of coding this, or just suggesting it for others? I > was thinking of coding something like this but found that I didn't > understand enough of the internals of how the vacuum command actually > works to be able to write this. I'd be willing to devote perhaps a few > hours a week to it if you want to help me. I certainly don't have enough knowledge right now to code this, but I'd be willing to help any way I can. > P. S. > > The last time I thought about it, I decided that the best solution is > probably one that works just like vacuum full except that it scans the > table in reverse order. It would do something like this: > > - Wait for exclusive lock. That's exactly what I want to avoid. The reality of cluster and vacuum full is that many (if not most) installs can't use them because of how they disrupt the system. I'd like a version that doesn't do that. > - Start at the end of the table -- call this page I. > - If page I is completely empty, shrink the heap and go to step 1 again. > (Page I is not empty now.) > - *Scan forward in the table until you find a page that is empty. (Call > it J) > If no such page is found, there is no more free space in the > table. Exit. > - Move the tuples from page I to page J. > - Drop the exclusive lock. and go to step one. Same basic idea. I haven't gone into specific details because I want to see how feasable it is. And since I can't code it myself the best I can hope for is a TODO; and IMO I shouldn't try and tell whoever takes that TODO how exactly to make this work. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <decibel@decibel.org> writes: > I'm not sure how different it is from vacuum full, though the main idea > is that instead of locking the table you instead work in smaller pieces > and don't block anything other than other updates. We don't have any support for locking sections of a table larger than a page, so I'm not clear on how the above could be made to work. But in any case, I wasn't talking about vacuum full. I was thinking of the total picture in a normal vacuum cycle: 1. vacuum cleans out dead tuples and records the space in FSM 2. ordinary inserts and updates use the space shown inFSM 3. next vacuum cleans out the space freed, and shortens the table if it can I believe that step 2 preferentially uses space closer to the front of the table, so I think that what you are proposing already happens naturally. regards, tom lane
Tom Lane wrote: >"Jim C. Nasby" <decibel@decibel.org> writes: > > >>I'm not sure how different it is from vacuum full, though the main idea >>is that instead of locking the table you instead work in smaller pieces >>and don't block anything other than other updates. >> >> > >We don't have any support for locking sections of a table larger than >a page, so I'm not clear on how the above could be made to work. > >But in any case, I wasn't talking about vacuum full. I was thinking of >the total picture in a normal vacuum cycle: > > 1. vacuum cleans out dead tuples and records the space in FSM > 2. ordinary inserts and updates use the space shown in FSM > 3. next vacuum cleans out the space freed, and shortens the table > if it can > >I believe that step 2 preferentially uses space closer to the front >of the table, so I think that what you are proposing already happens >naturally. > > > So if we attempted this, then the missing piece of the puzzle is 2b. move the tuples out of the end pages of the tableusing UPDATE, or using a "special" update that changes no values and fires no triggers either. Right? Two questions: 1) Does an update always go to the FSM to find out where to put the new tuple, or does it first try to put it in the current page, and only read the FSM if the current page is already full? 2) Is it possible to write a where clause that can efficiently hit only the tuples in the end of the table? If there is a way, then I could test the idea without writing any code at all. Regards, Paul Tillotson
Jim C. Nasby wrote: >On Wed, Apr 20, 2005 at 08:10:23PM -0400, Paul Tillotson wrote: > > >>P. S. >> >>The last time I thought about it, I decided that the best solution is >>probably one that works just like vacuum full except that it scans the >>table in reverse order. It would do something like this: >> >>- Wait for exclusive lock. >> >> >That's exactly what I want to avoid. The reality of cluster and vacuum >full is that many (if not most) installs can't use them because of how >they disrupt the system. I'd like a version that doesn't do that. > > > The version I outlined releases its exclusive lock every time it successfully moves all the tuples out of a page. This means that it will only hold one long enough to find free space for the tuples in the page that it is currently trying to clear, which should not take long if the table is bloated. After that, it releases it, and then every transaction waiting for that lock gets to go again before it takes an exclusive lock. On a lightly loaded system, this should be unnoticeable. The use-case which I was targeting is when you are trying to shrink a table that is being used for a web application--a wait of 1 second is ok, but wait of 5 minutes isn't. >>- Start at the end of the table -- call this page I. >>- If page I is completely empty, shrink the heap and go to step 1 again. >> (Page I is not empty now.) >>- *Scan forward in the table until you find a page that is empty. (Call >>it J) >> If no such page is found, there is no more free space in the >>table. Exit. >>- Move the tuples from page I to page J. >>- Drop the exclusive lock. and go to step one. >> >> > >Same basic idea. I haven't gone into specific details because I want to >see how feasable it is. And since I can't code it myself the best I can >hope for is a TODO; and IMO I shouldn't try and tell whoever takes that >TODO how exactly to make this work. > > Regards, Paul Tillotson
On Thu, Apr 21, 2005 at 08:53:33PM -0400, Paul Tillotson wrote: > 1) Does an update always go to the FSM to find out where to put the new > tuple, or does it first try to put it in the current page, and only read > the FSM if the current page is already full? The latter. > 2) Is it possible to write a where clause that can efficiently hit only > the tuples in the end of the table? If there is a way, then I could > test the idea without writing any code at all. Not sure, but you could try using the ctid column. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)
> > 2) Is it possible to write a where clause that can efficiently hit only > > the tuples in the end of the table? If there is a way, then I could > > test the idea without writing any code at all. > > Not sure, but you could try using the ctid column. > An alternative is to do a bulky ordered insert. Build an index on the table. And write a where clause which just visit the big-value part of the table. Regards, Qingqing
All the issues brought up are why I'm not in favor of trying to do this outside of the backend. On Fri, Apr 22, 2005 at 11:29:27AM +0800, Qingqing Zhou wrote: > > > > 2) Is it possible to write a where clause that can efficiently hit only > > > the tuples in the end of the table? If there is a way, then I could > > > test the idea without writing any code at all. > > > > Not sure, but you could try using the ctid column. > > > > An alternative is to do a bulky ordered insert. Build an index on the table. > And write a where clause which just visit the big-value part of the table. > > Regards, > Qingqing > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On Thu, Apr 21, 2005 at 01:06:13AM -0400, Tom Lane wrote: > "Jim C. Nasby" <decibel@decibel.org> writes: > > I'm not sure how different it is from vacuum full, though the main idea > > is that instead of locking the table you instead work in smaller pieces > > and don't block anything other than other updates. > > We don't have any support for locking sections of a table larger than > a page, so I'm not clear on how the above could be made to work. Vacuum full and cluster both lock the entire table, no? > But in any case, I wasn't talking about vacuum full. I was thinking of > the total picture in a normal vacuum cycle: > > 1. vacuum cleans out dead tuples and records the space in FSM > 2. ordinary inserts and updates use the space shown in FSM > 3. next vacuum cleans out the space freed, and shortens the table > if it can > > I believe that step 2 preferentially uses space closer to the front > of the table, so I think that what you are proposing already happens > naturally. Don't updates prefer putting the new tuple on the same page if there's room available? If so, 2 doesn't really happen nearly as much as it could. Also, my point was to proactively move tuples, instead of hoping that they eventually get moved by an update. Your suggestion also doesn't allow for a background/non-blocking cluster, unless some code is written that prefferentially puts new tuples in cluster order. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"