Thread: Real-Time Vacuum Possibility

Real-Time Vacuum Possibility

From: Rod Taylor
Date:
This is a much-discussed topic, which always boils down to having to
grab index pages from disk.

At the moment the bgwriter periodically flushes part of the dirty
buffers to disk. Would it be possible to have the bgwriter take a look
at the pages it holds and see if it can do any vacuum work based on
the pages it is about to send to disk?

That is, quickly check the table pages, prepare a list of tuples to be
cleaned, then look at the other pages it is preparing to write and see
if any of them contain the index tuples for the data in those table
pages?

When an update occurs which does not change the key values (the
primary key, for example), the table page and the index pages all
become dirty at once, so I would expect the bgwriter, without adding
any special knowledge to its algorithm, to push them all to disk within
the same batch most of the time.
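
As a rough standalone sketch of what I mean (plain C, nothing to do
with the actual bgwriter code; the page layout and all the names are
made up for illustration), the batch check might look something like
this:

#include <stdbool.h>
#include <stdio.h>

#define BATCH_SIZE 4
#define MAX_ITEMS  8

typedef enum { HEAP_PAGE, INDEX_PAGE } PageKind;

typedef struct
{
    PageKind kind;
    int      nitems;
    int      tuple_ids[MAX_ITEMS];  /* dead tuples (heap) or referenced tuples (index) */
} DirtyPage;

/* Does any index page in the batch reference this dead heap tuple? */
static bool
index_entry_in_batch(const DirtyPage *batch, int n, int tuple_id)
{
    for (int i = 0; i < n; i++)
    {
        if (batch[i].kind != INDEX_PAGE)
            continue;
        for (int j = 0; j < batch[i].nitems; j++)
            if (batch[i].tuple_ids[j] == tuple_id)
                return true;
    }
    return false;
}

/*
 * Walk the heap pages in the batch; a dead tuple whose index entry is
 * also in the batch could be cleaned "for free", since both pages are
 * about to be written out anyway.
 */
static void
batch_clean(const DirtyPage *batch, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (batch[i].kind != HEAP_PAGE)
            continue;
        for (int j = 0; j < batch[i].nitems; j++)
        {
            int id = batch[i].tuple_ids[j];

            if (index_entry_in_batch(batch, n, id))
                printf("tuple %d: heap and index pages both in batch, clean both\n", id);
            else
                printf("tuple %d: index page not in batch, leave for regular vacuum\n", id);
        }
    }
}

int
main(void)
{
    DirtyPage batch[BATCH_SIZE] = {
        { HEAP_PAGE,  2, {101, 102} },  /* heap page with two dead tuples        */
        { INDEX_PAGE, 2, {101, 102} },  /* index page pointing at both of them   */
        { HEAP_PAGE,  1, {203} },       /* dead tuple whose index page is absent */
        { INDEX_PAGE, 1, {999} },       /* unrelated index page                  */
    };

    batch_clean(batch, BATCH_SIZE);
    return 0;
}

The interesting case is the first one: both pages are being written out
anyway, so the cleanup would cost no extra I/O.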

It's a fairly limited case and by no means removes the requirement for
regular vacuums, but for an update-heavy structure perhaps it would be
worthwhile? Even if it could only keep indexes clean it would help.

Just a thought..

--



Re: Real-Time Vacuum Possibility

From: Christopher Browne
Date:
pg@rbt.ca (Rod Taylor) wrote:
> It's a fairly limited case and by no means removes the requirement for
> regular vacuums, but for an update-heavy structure perhaps it would be
> worthwhile? Even if it could only keep indexes clean it would help.

The problem that persists with this is that it adds extra processing
at exactly the time that the system is the _most_ busy doing updates,
thereby worsening latency at times when the system may already be
reeling under the load.

I think, as a result, that VACUUM will _have_ to be done
asynchronously.

What strikes me as being a useful approach would be to set up an
LRU-ordered (or perhaps unordered) queue of pages that have had tuples
"killed off" by DELETE or UPDATE.

Thus, a DELETE/UPDATE would add the page the tuple is on to the list.

"VACUUM RECENT CHANGES" (or something of the sort) could walk through
just those pages.  Cleaning up indexes would require some further
reads, but that's a given.
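
As a standalone sketch of the queue idea (plain C, not PostgreSQL code;
the names and the fixed-size queue are made up for illustration):

#include <stdbool.h>
#include <stdio.h>

#define QUEUE_SIZE 8

typedef unsigned int BlockNumber;

/*
 * A small, bounded queue of block numbers that have had tuples killed
 * off.  When it overflows, the oldest entry is simply dropped; a
 * regular VACUUM would still pick that page up later.
 */
static BlockNumber queue[QUEUE_SIZE];
static int         queue_len = 0;

static bool
queue_contains(BlockNumber blk)
{
    for (int i = 0; i < queue_len; i++)
        if (queue[i] == blk)
            return true;
    return false;
}

/* Called from the DELETE/UPDATE path: remember the page the dead tuple is on. */
static void
record_killed_tuple(BlockNumber blk)
{
    if (queue_contains(blk))
        return;                 /* page already queued */

    if (queue_len == QUEUE_SIZE)
    {
        /* drop the oldest entry to make room */
        for (int i = 1; i < QUEUE_SIZE; i++)
            queue[i - 1] = queue[i];
        queue_len--;
    }
    queue[queue_len++] = blk;
}

/*
 * "VACUUM RECENT CHANGES": visit only the queued pages instead of
 * scanning the whole table.  Index cleanup would still need further reads.
 */
static void
vacuum_recent_changes(void)
{
    for (int i = 0; i < queue_len; i++)
        printf("vacuuming block %u\n", queue[i]);
    queue_len = 0;
}

int
main(void)
{
    record_killed_tuple(42);
    record_killed_tuple(7);
    record_killed_tuple(42);    /* duplicate page, ignored */

    vacuum_recent_changes();
    return 0;
}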

This "architecture" would be way more supportive than the present way
vacuum works for tables which are large and which have portions that
see heavy update activity.
-- 
(format nil "~S@~S" "cbbrowne" "gmail.com")
http://linuxdatabases.info/info/lisp.html
Rules of the Evil Overlord  #129. "Despite the delicious irony, I will
not force two heroes to fight each other in the arena."
<http://www.eviloverlord.com/>


Re: Real-Time Vacuum Possibility

From: "Jim C. Nasby"
Date:
On Thu, Mar 17, 2005 at 05:22:41AM +0000, Christopher Browne wrote:
> What strikes me as being a useful approach would be to set up an
> LRU-ordered (or perhaps unordered) queue of pages that have had tuples
> "killed off" by DELETE or UPDATE.

From http://www.postgresql.org/docs/faqs.TODO.html :

* Maintain a map of recently-expired rows
 This allows vacuum to reclaim free space without requiring a sequential scan
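
A standalone sketch of roughly what such a map could amount to (plain
C, not from the TODO item or the PostgreSQL source; one bit per heap
block, every name made up):

#include <stdio.h>
#include <string.h>

#define REL_BLOCKS 64                    /* blocks in the (toy) relation */

typedef unsigned int BlockNumber;

/* One bit per heap block: set when the block contains an expired row. */
static unsigned char expired_map[(REL_BLOCKS + 7) / 8];

static void
mark_expired(BlockNumber blk)
{
    expired_map[blk / 8] |= (unsigned char) (1 << (blk % 8));
}

/*
 * Vacuum visits only the marked blocks, so no sequential scan of the
 * heap is needed just to find the reclaimable space.
 */
static void
vacuum_marked_blocks(void)
{
    for (BlockNumber blk = 0; blk < REL_BLOCKS; blk++)
    {
        if (expired_map[blk / 8] & (1 << (blk % 8)))
            printf("reclaiming space in block %u\n", blk);
    }
    memset(expired_map, 0, sizeof(expired_map));
}

int
main(void)
{
    mark_expired(3);
    mark_expired(17);
    mark_expired(17);                    /* setting the bit twice is harmless */

    vacuum_marked_blocks();
    return 0;
}
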
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"