Thread: VACUUM optimization ideas.

VACUUM optimization ideas.

From
Alfred Perlstein
Date:
Here are two ideas I had for optimizing vacuum; I apologize in advance
if the ideas presented here are naive and don't take into account
the actual code that makes up PostgreSQL.

================

#1

Reducing the time vacuum must hold an exclusive lock on a table:

The idea is that since rows are marked deleted it's ok for the
vacuum to fill them with data from the tail of the table as
long as no transaction is in progress that has started before
the row was deleted.

This may allow the vacuum process to copy back all the data without
a lock; when all the copying is done, it then acquires an exclusive lock
and does this:

Acquire an exclusive lock.
Walk all the deleted data marking it as current.
Truncate the table.
Release the lock.

Since the data is still marked invalid (right?) even if valid data
is copied into the space it should be ignored as long as there's no
transaction occurring that started before the data was invalidated.
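The copy-back-then-truncate scheme can be sketched as a toy simulation (all names here are illustrative, not actual PostgreSQL internals): live tuples from the tail are copied into dead slots while readers would still treat the copies as invalid, and only the short final phase, standing in for the exclusive-lock window, marks them current and truncates.

```python
def compact(table):
    """table: list of {'value': ..., 'state': 'live'|'dead'|'moved'}.
    Phase 1 (no table lock needed): copy live tail tuples into dead
    slots; the copies stay in state 'moved', i.e. still invisible."""
    head, tail = 0, len(table) - 1
    while head < tail:
        if table[head]["state"] != "dead":
            head += 1                     # slot already occupied, skip
        elif table[tail]["state"] != "live":
            tail -= 1                     # nothing live here to move
        else:
            table[head] = {"value": table[tail]["value"], "state": "moved"}
            table[tail]["state"] = "dead"
    # Phase 2 (the short exclusive-lock window): mark the moved copies
    # current, then truncate the all-dead tail of the table.
    for slot in table:
        if slot["state"] == "moved":
            slot["state"] = "live"
    while table and table[-1]["state"] == "dead":
        table.pop()
    return table
```

The point of the two phases is that only phase 2 would need the exclusive lock; phase 1 does all the copying.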

================

#2

Reducing the amount of scanning a vacuum must do:

It would make sense that if the position of the earliest deleted chunk
were kept in a table, then vacuum would not have to scan the entire
table in order to work; it would only need to start at the 'earliest'
invalidated row.
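A minimal sketch of the bookkeeping this would need (hypothetical names, not PostgreSQL code): every delete updates a lowest-dead-slot marker, and vacuum starts its scan there instead of at slot zero.

```python
class Table:
    """Toy table that records the lowest dead-slot index so a vacuum-style
    scan can skip the unchanged prefix.  Names are illustrative only."""
    def __init__(self):
        self.slots = []          # each slot: (value, is_dead)
        self.first_dead = None   # lowest index holding a dead tuple

    def insert(self, value):
        self.slots.append((value, False))

    def delete(self, idx):
        value, _ = self.slots[idx]
        self.slots[idx] = (value, True)
        # this is the per-modification bookkeeping cost of the idea
        if self.first_dead is None or idx < self.first_dead:
            self.first_dead = idx

    def vacuum_scan_start(self):
        # nothing recorded: a full scan from slot 0 is the fallback
        return 0 if self.first_dead is None else self.first_dead
```

The trade-off surfaces later in the thread: the marker must be maintained on every delete, which is cheap here but is real per-modification overhead in a database.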

The utility of this (at least for us) is that we have several tables
that will grow to hundreds of megabytes, however changes will only
happen at the tail end (recently added rows).  If we could reduce the
amount of time spent in a vacuum state it would help us a lot.

================

I'm wondering if these ideas make sense and may help at all.

thanks,
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]


Re: VACUUM optimization ideas.

From
hstenger@adinet.com.uy
Date:
Alfred Perlstein wrote:
> #1
> 
> Reducing the time vacuum must hold an exclusive lock on a table:
> 
> The idea is that since rows are marked deleted it's ok for the
> vacuum to fill them with data from the tail of the table as
> long as no transaction is in progress that has started before
> the row was deleted.
> 
> This may allow the vacuum process to copy back all the data without
> a lock; when all the copying is done, it then acquires an exclusive lock
> and does this:
> 
> Acquire an exclusive lock.
> Walk all the deleted data marking it as current.
> Truncate the table.
> Release the lock.
> 
> Since the data is still marked invalid (right?) even if valid data
> is copied into the space it should be ignored as long as there's no
> transaction occurring that started before the data was invalidated.

Yes, but nothing prevents newer transactions from modifying the _origin_ side of
the copied data _after_ it was copied, but before the Lock-Walk-Truncate-Unlock
cycle takes place, and so it seems unsafe. Maybe locking each record before
copying it up ...

Regards,
Haroldo.

-- 
----------------------+------------------------
Haroldo Stenger       | hstenger@ieee.org
Montevideo, Uruguay.  | hstenger@adinet.com.uy
----------------------+------------------------
Visit UYLUG Web Site: http://www.linux.org.uy
-----------------------------------------------


Re: VACUUM optimization ideas.

From
Sevo Stille
Date:
Alfred Perlstein wrote:

> The idea is that since rows are marked deleted it's ok for the
> vacuum to fill them with data from the tail of the table as
> long as no transaction is in progress that has started before
> the row was deleted.

Well, isn't one of the advantages of vacuuming the reordering it
does? With a "fill deleted chunks" logic, we'd have far less order in
the databases. 

> This may allow the vacuum process to copy back all the data without
> a lock, 

Nope. Another process might update the values in between move and mark,
if the record is not locked. We'd either have to write-lock the entire
table for that period, write-lock every item as it is moved, or lock,
move, and mark on a per-record basis. The latter would be slow, but it
could be done in a permanent low-priority background process, utilizing
empty CPU cycles. Besides, it could probably be done not only by simply
filling from the tail, but also by moving the records up in sorted
fashion. 
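The per-record lock-move-mark variant could look roughly like this (purely illustrative names, not PostgreSQL code): each call takes the locks on exactly two slots, moves one tuple, and returns, so a low-priority background loop could invoke it in spare cycles.

```python
import threading

def compact_one(table, locks):
    """table: list of {'value': ..., 'dead': bool}; locks: one Lock per
    slot.  Lock, move, and mark a single tuple per call."""
    dead = next((i for i, s in enumerate(table) if s["dead"]), None)
    live = next((i for i in range(len(table) - 1, -1, -1)
                 if not table[i]["dead"]), None)
    if dead is None or live is None or dead >= live:
        return False              # table is already compact
    # always acquire in index order (dead < live) to avoid deadlock
    with locks[dead], locks[live]:
        table[dead] = {"value": table[live]["value"], "dead": False}
        table[live]["dead"] = True
    return True
```

Holding both record locks for the duration of the move-and-mark is what closes the update-between-move-and-mark race described above.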

> #2
> 
> Reducing the amount of scanning a vacuum must do:
> 
> It would make sense that if the position of the earliest deleted chunk
> were kept in a table, then vacuum would not have to scan the entire
> table in order to work; it would only need to start at the 'earliest'
> invalidated row.

Trivial to do. But of course #1 may imply that the physical ordering is
even less likely to be related to the logical ordering in a way where
this helps. 

> The utility of this (at least for us) is that we have several tables
> that will grow to hundreds of megabytes, however changes will only
> happen at the tail end (recently added rows). 

The tail is a relative position - except for the case where you add
temporary records to a constant default set, everything in the tail will
move, at least relatively, to the head after some time.

> If we could reduce the
> amount of time spent in a vacuum state it would help us a lot.

Rather: if we can reduce the time spent in a locked state while
vacuuming, it would help a lot. Being in a vacuum is not the issue -
even permanent vacuuming need not be an issue, if the locks it uses are
suitably short-lived. 

Sevo

-- 
sevo@ip23.net


Re: VACUUM optimization ideas.

From
"David Lloyd-Jones"
Date:
"Alfred Perlstein" <bright@wintelcom.net> wrote:
> Here are two ideas I had for optimizing vacuum; I apologize in advance
> if the ideas presented here are naive and don't take into account
> the actual code that makes up PostgreSQL.
                                                    * * *

This is the first time I have dared to file in the exalted realm of
[HACKERS]. On the other hand I wrote a memo to Bill Gates a couple of years
ago which apparently resulted in C#, which is really worth a little bit of
attention, given the number of VB writers out there. I'm not quite as stupid
as I look.

Why doesn't `vacuum' happen all the time, instantly?

Like, does everybody feel psychologically more secure if a "commit" is not
really a commit, it's there for some Emergency Refind to find?

(If there are olde hardware reasons, or software -- "Well, uh, back at
BBN..." -- type reasons, I'd be happy to hear them.)

Screw it. "Is that your final answer?" is your final answer. Commit and
rebuild; optimize memory use all the time in the spare milliseconds; no
human is needed to make obvious calls.
                                                    -dlj.

Re: VACUUM optimization ideas.

From
Bruce Momjian
Date:
> Alfred Perlstein wrote:
> > #1
> > 
> > Reducing the time vacuum must hold an exclusive lock on a table:
> > 
> > The idea is that since rows are marked deleted it's ok for the
> > vacuum to fill them with data from the tail of the table as
> > long as no transaction is in progress that has started before
> > the row was deleted.
> > 
> > This may allow the vacuum process to copy back all the data without
> > a lock; when all the copying is done, it then acquires an exclusive lock
> > and does this:
> > 
> > Acquire an exclusive lock.
> > Walk all the deleted data marking it as current.
> > Truncate the table.
> > Release the lock.
> > 
> > Since the data is still marked invalid (right?) even if valid data
> > is copied into the space it should be ignored as long as there's no
> > transaction occurring that started before the data was invalidated.
> 
> Yes, but nothing prevents newer transactions from modifying the _origin_ side of
> the copied data _after_ it was copied, but before the Lock-Walk-Truncate-Unlock
> cycle takes place, and so it seems unsafe. Maybe locking each record before
> copying it up ...

Seems a read-lock would be necessary during the moving, but still a win.

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: VACUUM optimization ideas.

From
Bruce Momjian
Date:
> #2
> 
> Reducing the amount of scanning a vacuum must do:
> 
> It would make sense that if the position of the earliest deleted chunk
> were kept in a table, then vacuum would not have to scan the entire
> table in order to work; it would only need to start at the 'earliest'
> invalidated row.
> 
> The utility of this (at least for us) is that we have several tables
> that will grow to hundreds of megabytes, however changes will only
> happen at the tail end (recently added rows).  If we could reduce the
> amount of time spent in a vacuum state it would help us a lot.

But you have to update that every time a row is modified.  Seems a
sequential scan by vacuum is fast enough.

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: VACUUM optimization ideas.

From
Bruce Momjian
Date:
> Here are two ideas I had for optimizing vacuum; I apologize in advance
> if the ideas presented here are naive and don't take into account
> the actual code that makes up PostgreSQL.
> 
> ================
> 
> #1
> 
> Reducing the time vacuum must hold an exclusive lock on a table:
> 
> The idea is that since rows are marked deleted it's ok for the
> vacuum to fill them with data from the tail of the table as
> long as no transaction is in progress that has started before
> the row was deleted.
> 
> This may allow the vacuum process to copy back all the data without
> a lock; when all the copying is done, it then acquires an exclusive lock
> and does this:
> 
> Acquire an exclusive lock.
> Walk all the deleted data marking it as current.
> Truncate the table.
> Release the lock.
> 
> Since the data is still marked invalid (right?) even if valid data
> is copied into the space it should be ignored as long as there's no
> transaction occurring that started before the data was invalidated.

Added to TODO:

* Reduce VACUUM lock time by moving tuples with read lock, then write
  lock and truncate table [vacuum]

The read-lock is required because other transactions must be prevented
from modifying the rows, and the index is also an issue.

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026