Re: Proposal for background vacuum full/cluster - Mailing list pgsql-hackers

From Paul Tillotson
Subject Re: Proposal for background vacuum full/cluster
Date
Msg-id 4266EF6F.9020505@shentel.net
Whole thread Raw
In response to Proposal for background vacuum full/cluster  ("Jim C. Nasby" <decibel@decibel.org>)
Responses Re: Proposal for background vacuum full/cluster  ("Jim C. Nasby" <decibel@decibel.org>)
List pgsql-hackers
Jim C. Nasby wrote:

>I talked to a few people on IRC about this and they didn't think I was
>nuts, so maybe this is something practical...
>
>In a nutshell, my idea is to use the normal transactional/XID code to
>relocate tuples in the heap. Think of doing an UPDATE field=field if you
>could tell update what page to put the new tuple on. 
>
Be careful not to fire UPDATE triggers on the tuple while doing so.

>Of course, it's not quite that simple. For starters, you'd want to do a
>conventional vacuum before this, both to free as much space as possible
>and to update the FSM. It might also be necessary to prevent backends
>from using the pages at the end of the heap (which you're trying to
>empty). I'm guessing that could be done just by removing the pages from
>the FSM. You'd also need to vacuum after emptying these pages to reclaim
>the disk space. To facilitate these things, it might be useful to be
>able to vacuum parts of the heap. So as pages are emptied at the end of
>the heap, they can be vacuumed and reclaimed while the pages are still
>probably in cache (and without requiring a re-vacuum of the entire
>table).
>
>  
>
Keep in mind that the transaction that does the update can't also vacuum 
it's own tuples.  You'd have to end one transaction, then wait until 
every transaction running while the updater ran finishes, then start the 
transaction that vacuums.  Obviously your command would need to be able 
to start and end transactions.  (Meaning that it can't be a user defined 
function, and it probably can't be a normal self-contained command in 
postgres.)

>Taking this technique one step further, it should also be possible to
>cluster in the background without blocking everything. One way to do
>this would be to empty the first page in the heap by moving it's tuples
>elsewhere, and vacuuming that page (but not putting it in the FSM). Once
>that page is available, you can start reading in from the clustering
>index and moving those tuples to the first page.
>
>One thing that might be an issue for both ideas is index bloat. But
>since reindex is a non-blocking operation, it doesn't seem unreasonable
>to either do that automatically or have the user do it.
>
>Is this TODOable?
>  
>
I asked about something like this on the -hackers list a while back, but 
didn't get any response from any of the knowledgeable hackers. 

Are you thinking of coding this, or just suggesting it for others?  I 
was thinking of coding something like this but found that I didn't 
understand enough of the internals of how the vacuum command actually 
works to be able to write this.  I'd be willing to devote perhaps a few 
hours a week to it if you want to help me.

Regards,

Paul Tillotson

-----------------

P. S.

The last time I thought about it, I decided that the best solution is 
probably one that works just like vacuum full except that it scans the 
table in reverse order.  It would do something like this:

- Wait for exclusive lock.
- Start at the end of the table -- call this page I.
- If page I is completely empty, shrink the heap and go to step 1 again.        (Page I is not empty now.)
- *Scan forward in the table until you find a page that is empty.  (Call 
it J)        If no such page is found, there is no more free space in the 
table.  Exit.
- Move the tuples from page I to page J.
- Drop the exclusive lock. and go to step one. 

*On subsequent iterations of the loop, do not reset J.  i.e., start 
scanning at the last place that free space was known to exist.  
Presumably no useful amount of free space will get created in the table 
while this algorithm is running.



pgsql-hackers by date:

Previous
From: Gavin Sherry
Date:
Subject: Re: WAL/PITR additional items
Next
From: "David F. Skoll"
Date:
Subject: Re: Postgres: pg_hba.conf, md5, pg_shadow, encrypted passwords