Thread: Proposal for background vacuum full/cluster

Proposal for background vacuum full/cluster

From
"Jim C. Nasby"
Date:
I talked to a few people on IRC about this and they didn't think I was
nuts, so maybe this is something practical...

In a nutshell, my idea is to use the normal transactional/XID code to
relocate tuples in the heap. Think of doing an UPDATE field=field if you
could tell update what page to put the new tuple on. Using this
mechanism, you can move tuples from the end of the heap to pages that
have free space on them. The dead tuples at the end of the heap could
then be vacuumed conventionally, and completely empty pages removed by
that vacuum.

Of course, it's not quite that simple. For starters, you'd want to do a
conventional vacuum before this, both to free as much space as possible
and to update the FSM. It might also be necessary to prevent backends
from using the pages at the end of the heap (which you're trying to
empty). I'm guessing that could be done just by removing the pages from
the FSM. You'd also need to vacuum after emptying these pages to reclaim
the disk space. To facilitate these things, it might be useful to be
able to vacuum parts of the heap. So as pages are emptied at the end of
the heap, they can be vacuumed and reclaimed while the pages are still
probably in cache (and without requiring a re-vacuum of the entire
table).

Taking this technique one step further, it should also be possible to
cluster in the background without blocking everything. One way to do
this would be to empty the first page in the heap by moving it's tuples
elsewhere, and vacuuming that page (but not putting it in the FSM). Once
that page is available, you can start reading in from the clustering
index and moving those tuples to the first page.

One thing that might be an issue for both ideas is index bloat. But
since reindex is a non-blocking operation, it doesn't seem unreasonable
to either do that automatically or have the user do it.

Is this TODOable?
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Proposal for background vacuum full/cluster

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> In a nutshell, my idea is to use the normal transactional/XID code to
> relocate tuples in the heap. Think of doing an UPDATE field=field if you
> could tell update what page to put the new tuple on. Using this
> mechanism, you can move tuples from the end of the heap to pages that
> have free space on them. The dead tuples at the end of the heap could
> then be vacuumed conventionally, and completely empty pages removed by
> that vacuum.

How exactly is this different from what happens now, assuming that you
didn't run out of FSM?
        regards, tom lane


Re: Proposal for background vacuum full/cluster

From
Paul Tillotson
Date:
Jim C. Nasby wrote:

>I talked to a few people on IRC about this and they didn't think I was
>nuts, so maybe this is something practical...
>
>In a nutshell, my idea is to use the normal transactional/XID code to
>relocate tuples in the heap. Think of doing an UPDATE field=field if you
>could tell update what page to put the new tuple on. 
>
Be careful not to fire UPDATE triggers on the tuple while doing so.

>Of course, it's not quite that simple. For starters, you'd want to do a
>conventional vacuum before this, both to free as much space as possible
>and to update the FSM. It might also be necessary to prevent backends
>from using the pages at the end of the heap (which you're trying to
>empty). I'm guessing that could be done just by removing the pages from
>the FSM. You'd also need to vacuum after emptying these pages to reclaim
>the disk space. To facilitate these things, it might be useful to be
>able to vacuum parts of the heap. So as pages are emptied at the end of
>the heap, they can be vacuumed and reclaimed while the pages are still
>probably in cache (and without requiring a re-vacuum of the entire
>table).
>
>  
>
Keep in mind that the transaction that does the update can't also vacuum 
it's own tuples.  You'd have to end one transaction, then wait until 
every transaction running while the updater ran finishes, then start the 
transaction that vacuums.  Obviously your command would need to be able 
to start and end transactions.  (Meaning that it can't be a user defined 
function, and it probably can't be a normal self-contained command in 
postgres.)

>Taking this technique one step further, it should also be possible to
>cluster in the background without blocking everything. One way to do
>this would be to empty the first page in the heap by moving it's tuples
>elsewhere, and vacuuming that page (but not putting it in the FSM). Once
>that page is available, you can start reading in from the clustering
>index and moving those tuples to the first page.
>
>One thing that might be an issue for both ideas is index bloat. But
>since reindex is a non-blocking operation, it doesn't seem unreasonable
>to either do that automatically or have the user do it.
>
>Is this TODOable?
>  
>
I asked about something like this on the -hackers list a while back, but 
didn't get any response from any of the knowledgeable hackers. 

Are you thinking of coding this, or just suggesting it for others?  I 
was thinking of coding something like this but found that I didn't 
understand enough of the internals of how the vacuum command actually 
works to be able to write this.  I'd be willing to devote perhaps a few 
hours a week to it if you want to help me.

Regards,

Paul Tillotson

-----------------

P. S.

The last time I thought about it, I decided that the best solution is 
probably one that works just like vacuum full except that it scans the 
table in reverse order.  It would do something like this:

- Wait for exclusive lock.
- Start at the end of the table -- call this page I.
- If page I is completely empty, shrink the heap and go to step 1 again.        (Page I is not empty now.)
- *Scan forward in the table until you find a page that is empty.  (Call 
it J)        If no such page is found, there is no more free space in the 
table.  Exit.
- Move the tuples from page I to page J.
- Drop the exclusive lock. and go to step one. 

*On subsequent iterations of the loop, do not reset J.  i.e., start 
scanning at the last place that free space was known to exist.  
Presumably no useful amount of free space will get created in the table 
while this algorithm is running.



Re: Proposal for background vacuum full/cluster

From
"Jim C. Nasby"
Date:
On Wed, Apr 20, 2005 at 07:33:54PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > In a nutshell, my idea is to use the normal transactional/XID code to
> > relocate tuples in the heap. Think of doing an UPDATE field=field if you
> > could tell update what page to put the new tuple on. Using this
> > mechanism, you can move tuples from the end of the heap to pages that
> > have free space on them. The dead tuples at the end of the heap could
> > then be vacuumed conventionally, and completely empty pages removed by
> > that vacuum.
> 
> How exactly is this different from what happens now, assuming that you
> didn't run out of FSM?

In the case of cluster I think it's quite different, as cluster
currently re-writes the heap from scratch, no?

I'm not sure how different it is from vacuum full, though the main idea
is that instead of locking the table you instead work in smaller pieces
and don't block anything other than other updates.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Proposal for background vacuum full/cluster

From
"Jim C. Nasby"
Date:
On Wed, Apr 20, 2005 at 08:10:23PM -0400, Paul Tillotson wrote:
> Jim C. Nasby wrote:
> 
> >I talked to a few people on IRC about this and they didn't think I was
> >nuts, so maybe this is something practical...
> >
> >In a nutshell, my idea is to use the normal transactional/XID code to
> >relocate tuples in the heap. Think of doing an UPDATE field=field if you
> >could tell update what page to put the new tuple on. 
> >
> Be careful not to fire UPDATE triggers on the tuple while doing so.

To clarify, I'm not suggesting this actually be coded as a bunch of
updates. I used that as an example only.

> Keep in mind that the transaction that does the update can't also vacuum 
> it's own tuples.  You'd have to end one transaction, then wait until 
> every transaction running while the updater ran finishes, then start the 
> transaction that vacuums.  Obviously your command would need to be able 
> to start and end transactions.  (Meaning that it can't be a user defined 
> function, and it probably can't be a normal self-contained command in 
> postgres.)

Yes. Note how I called it a background vacuum full/cluster.

> Are you thinking of coding this, or just suggesting it for others?  I 
> was thinking of coding something like this but found that I didn't 
> understand enough of the internals of how the vacuum command actually 
> works to be able to write this.  I'd be willing to devote perhaps a few 
> hours a week to it if you want to help me.

I certainly don't have enough knowledge right now to code this, but I'd
be willing to help any way I can.

> P. S.
> 
> The last time I thought about it, I decided that the best solution is 
> probably one that works just like vacuum full except that it scans the 
> table in reverse order.  It would do something like this:
> 
> - Wait for exclusive lock.
That's exactly what I want to avoid. The reality of cluster and vacuum
full is that many (if not most) installs can't use them because of how
they disrupt the system. I'd like a version that doesn't do that.

> - Start at the end of the table -- call this page I.
> - If page I is completely empty, shrink the heap and go to step 1 again.
>         (Page I is not empty now.)
> - *Scan forward in the table until you find a page that is empty.  (Call 
> it J)
>         If no such page is found, there is no more free space in the 
> table.  Exit.
> - Move the tuples from page I to page J.
> - Drop the exclusive lock. and go to step one. 

Same basic idea. I haven't gone into specific details because I want to
see how feasable it is. And since I can't code it myself the best I can
hope for is a TODO; and IMO I shouldn't try and tell whoever takes that
TODO how exactly to make this work.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Proposal for background vacuum full/cluster

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> I'm not sure how different it is from vacuum full, though the main idea
> is that instead of locking the table you instead work in smaller pieces
> and don't block anything other than other updates.

We don't have any support for locking sections of a table larger than
a page, so I'm not clear on how the above could be made to work.

But in any case, I wasn't talking about vacuum full.  I was thinking of
the total picture in a normal vacuum cycle:
   1. vacuum cleans out dead tuples and records the space in FSM   2. ordinary inserts and updates use the space shown
inFSM   3. next vacuum cleans out the space freed, and shortens the table      if it can
 

I believe that step 2 preferentially uses space closer to the front
of the table, so I think that what you are proposing already happens
naturally.
        regards, tom lane


Re: Proposal for background vacuum full/cluster

From
Paul Tillotson
Date:
Tom Lane wrote:

>"Jim C. Nasby" <decibel@decibel.org> writes:
>  
>
>>I'm not sure how different it is from vacuum full, though the main idea
>>is that instead of locking the table you instead work in smaller pieces
>>and don't block anything other than other updates.
>>    
>>
>
>We don't have any support for locking sections of a table larger than
>a page, so I'm not clear on how the above could be made to work.
>
>But in any case, I wasn't talking about vacuum full.  I was thinking of
>the total picture in a normal vacuum cycle:
>
>    1. vacuum cleans out dead tuples and records the space in FSM
>    2. ordinary inserts and updates use the space shown in FSM
>    3. next vacuum cleans out the space freed, and shortens the table
>       if it can
>
>I believe that step 2 preferentially uses space closer to the front
>of the table, so I think that what you are proposing already happens
>naturally.
>
>  
>
So if we attempted this, then the missing piece of the puzzle is       2b. move the tuples out of the end pages of the
tableusing
 
UPDATE, or using a "special" update that changes no values and fires no
triggers either.

Right?

Two questions:

1) Does an update always go to the FSM to find out where to put the new
tuple, or does it first try to put it in the current page, and only read
the FSM if the current page is already full?

2) Is it possible to write a where clause that can efficiently hit only
the tuples in the end of the table?  If there is a way, then I could
test the idea without writing any code at all.

Regards,
Paul Tillotson





Re: Proposal for background vacuum full/cluster

From
Paul Tillotson
Date:
Jim C. Nasby wrote:

>On Wed, Apr 20, 2005 at 08:10:23PM -0400, Paul Tillotson wrote:
>  
>
>>P. S.
>>
>>The last time I thought about it, I decided that the best solution is 
>>probably one that works just like vacuum full except that it scans the 
>>table in reverse order.  It would do something like this:
>>
>>- Wait for exclusive lock.
>>    
>>
>That's exactly what I want to avoid. The reality of cluster and vacuum
>full is that many (if not most) installs can't use them because of how
>they disrupt the system. I'd like a version that doesn't do that.
>
>  
>
The version I outlined releases its exclusive lock every time it
successfully moves all the tuples out of a page.  This means that it
will only hold one long enough to find free space for the tuples in the
page that it is currently trying to clear, which should not take long if
the table is bloated.

After that, it releases it, and then every transaction waiting for that
lock gets to go again before it takes an exclusive lock.  On a lightly
loaded system, this should be unnoticeable.

The use-case which I was targeting is when you are trying to shrink a
table that is being used for a web application--a wait of 1 second is
ok, but wait of 5 minutes isn't.

>>- Start at the end of the table -- call this page I.
>>- If page I is completely empty, shrink the heap and go to step 1 again.
>>        (Page I is not empty now.)
>>- *Scan forward in the table until you find a page that is empty.  (Call 
>>it J)
>>        If no such page is found, there is no more free space in the 
>>table.  Exit.
>>- Move the tuples from page I to page J.
>>- Drop the exclusive lock. and go to step one. 
>>    
>>
>
>Same basic idea. I haven't gone into specific details because I want to
>see how feasable it is. And since I can't code it myself the best I can
>hope for is a TODO; and IMO I shouldn't try and tell whoever takes that
>TODO how exactly to make this work.
>  
>
Regards,
Paul Tillotson





Re: Proposal for background vacuum full/cluster

From
Alvaro Herrera
Date:
On Thu, Apr 21, 2005 at 08:53:33PM -0400, Paul Tillotson wrote:

> 1) Does an update always go to the FSM to find out where to put the new
> tuple, or does it first try to put it in the current page, and only read
> the FSM if the current page is already full?

The latter.

> 2) Is it possible to write a where clause that can efficiently hit only
> the tuples in the end of the table?  If there is a way, then I could
> test the idea without writing any code at all.

Not sure, but you could try using the ctid column.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)


Re: Proposal for background vacuum full/cluster

From
"Qingqing Zhou"
Date:
> > 2) Is it possible to write a where clause that can efficiently hit only
> > the tuples in the end of the table?  If there is a way, then I could
> > test the idea without writing any code at all.
>
> Not sure, but you could try using the ctid column.
>

An alternative is to do a bulky ordered insert. Build an index on the table.
And write a where clause which just visit the big-value part of the table.

Regards,
Qingqing




Re: Proposal for background vacuum full/cluster

From
"Jim C. Nasby"
Date:
All the issues brought up are why I'm not in favor of trying to do this
outside of the backend.

On Fri, Apr 22, 2005 at 11:29:27AM +0800, Qingqing Zhou wrote:
> 
> > > 2) Is it possible to write a where clause that can efficiently hit only
> > > the tuples in the end of the table?  If there is a way, then I could
> > > test the idea without writing any code at all.
> >
> > Not sure, but you could try using the ctid column.
> >
> 
> An alternative is to do a bulky ordered insert. Build an index on the table.
> And write a where clause which just visit the big-value part of the table.
> 
> Regards,
> Qingqing
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Proposal for background vacuum full/cluster

From
"Jim C. Nasby"
Date:
On Thu, Apr 21, 2005 at 01:06:13AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > I'm not sure how different it is from vacuum full, though the main idea
> > is that instead of locking the table you instead work in smaller pieces
> > and don't block anything other than other updates.
> 
> We don't have any support for locking sections of a table larger than
> a page, so I'm not clear on how the above could be made to work.

Vacuum full and cluster both lock the entire table, no?

> But in any case, I wasn't talking about vacuum full.  I was thinking of
> the total picture in a normal vacuum cycle:
> 
>     1. vacuum cleans out dead tuples and records the space in FSM
>     2. ordinary inserts and updates use the space shown in FSM
>     3. next vacuum cleans out the space freed, and shortens the table
>        if it can
> 
> I believe that step 2 preferentially uses space closer to the front
> of the table, so I think that what you are proposing already happens
> naturally.

Don't updates prefer putting the new tuple on the same page if there's
room available? If so, 2 doesn't really happen nearly as much as it
could. Also, my point was to proactively move tuples, instead of hoping
that they eventually get moved by an update.

Your suggestion also doesn't allow for a background/non-blocking
cluster, unless some code is written that prefferentially puts new
tuples in cluster order.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"