Thread: VACUUM optimization ideas.
Here are two ideas I had for optimizing vacuum. I apologize in advance
if the ideas presented here are naive and don't take into account
the actual code that makes up postgresql.

================

#1

Reducing the time vacuum must hold an exclusive lock on a table:

The idea is that since rows are marked deleted, it's ok for the
vacuum to fill them with data from the tail of the table as
long as no transaction is in progress that started before
the row was deleted.

This may allow the vacuum process to copy back all the data without
a lock; when all the copying is done, it then acquires an exclusive
lock and does this:

  Acquire an exclusive lock.
  Walk all the deleted data marking it as current.
  Truncate the table.
  Release the lock.

Since the data is still marked invalid (right?), even if valid data
is copied into the space it should be ignored as long as there's no
transaction occurring that started before the data was invalidated.

================

#2

Reducing the amount of scanning a vacuum must do:

It would make sense that if a value of the earliest deleted chunk
were kept in a table, then vacuum would not have to scan the entire
table in order to work; it would only need to start at the 'earliest'
invalidated row.

The utility of this (at least for us) is that we have several tables
that will grow to hundreds of megabytes, but changes will only
happen at the tail end (recently added rows). If we could reduce the
amount of time spent in a vacuum state it would help us a lot.

================

I'm wondering if these ideas make sense and may help at all.

thanks,
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
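[Editor's note: a minimal sketch of idea #1, assuming a table is simply a
list of slots that are either live tuples or dead (deleted) holes. The
"copy back from the tail, then truncate" phases map to the two halves of
the function below. This is an illustration of the proposal only, not
PostgreSQL's actual tuple or page layout.]

```python
DEAD = None  # a deleted slot

def compact(table):
    """Fill dead slots from the tail, then truncate; returns (table, moves)."""
    head = 0
    tail = len(table) - 1
    moves = []  # (from_index, to_index) pairs: the unlocked "copy back" phase
    while head < tail:
        if table[head] is not DEAD:
            head += 1
        elif table[tail] is DEAD:
            tail -= 1
        else:
            # copy a live tuple from the tail into the dead slot
            table[head] = table[tail]
            moves.append((tail, head))
            table[tail] = DEAD
    # "exclusive lock" phase: mark copies current and truncate the dead tail
    while table and table[-1] is DEAD:
        table.pop()
    return table, moves

table = ["a", DEAD, "b", DEAD, "c", "d"]
compacted, moves = compact(table)
print(compacted)  # ['a', 'd', 'b', 'c']
print(moves)      # [(5, 1), (4, 3)]
```

As the replies below point out, the unsafe part is everything between a
move and the final lock: without per-record locking, nothing stops a
concurrent update to the tail copy after it has been moved.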
Alfred Perlstein wrote:
> #1
>
> Reducing the time vacuum must hold an exclusive lock on a table:
>
> The idea is that since rows are marked deleted it's ok for the
> vacuum to fill them with data from the tail of the table as
> long as no transaction is in progress that has started before
> the row was deleted.
>
> This may allow the vacuum process to copy back all the data without
> a lock; when all the copying is done it then acquires an exclusive
> lock and does this:
>
> Acquire an exclusive lock.
> Walk all the deleted data marking it as current.
> Truncate the table.
> Release the lock.
>
> Since the data is still marked invalid (right?), even if valid data
> is copied into the space it should be ignored as long as there's no
> transaction occurring that started before the data was invalidated.

Yes, but nothing prevents newer transactions from modifying the
_origin_ side of the copied data _after_ it was copied but before the
Lock-Walk-Truncate-Unlock cycle takes place, so it seems unsafe. Maybe
lock each record before copying it up...

Regards,
Haroldo.

--
----------------------+------------------------
Haroldo Stenger       | hstenger@ieee.org
Montevideo, Uruguay.  | hstenger@adinet.com.uy
----------------------+------------------------
Visit UYLUG Web Site: http://www.linux.org.uy
-----------------------------------------------
Alfred Perlstein wrote:
> The idea is that since rows are marked deleted it's ok for the
> vacuum to fill them with data from the tail of the table as
> long as no transaction is in progress that has started before
> the row was deleted.

Well, isn't one of the advantages of vacuuming the reordering it does?
With a "fill deleted chunks" logic, we'd have far less order in the
databases.

> This may allow the vacuum process to copy back all the data without
> a lock,

Nope. Another process might update the values in between the move and
the mark if the record is not locked. We'd either have to write-lock
the entire table for that period, write-lock every item as it is
moved, or lock, move, and mark on a per-record basis. The latter would
be slow, but it could be done in a permanent low-priority background
process, utilizing idle CPU cycles. Besides, it could probably be done
not only by filling from the tail, but also by moving the records up
in a sorted fashion.

> #2
>
> Reducing the amount of scanning a vacuum must do:
>
> It would make sense that if a value of the earliest deleted chunk
> was kept in a table then vacuum would not have to scan the entire
> table in order to work, it would only need to start at the 'earliest'
> invalidated row.

Trivial to do. But of course #1 may imply that the physical ordering
is even less likely to be related to the logical ordering in a way
where this helps.

> The utility of this (at least for us) is that we have several tables
> that will grow to hundreds of megabytes, however changes will only
> happen at the tail end (recently added rows).

The tail is a relative position - except for the case where you add
temporary records to a constant default set, everything in the tail
will move, at least relatively, toward the head after some time.

> If we could reduce the
> amount of time spent in a vacuum state it would help us a lot.

Rather: if we can reduce the time spent in a locked state while
vacuuming, it would help a lot.
Being in a vacuum is not the issue - even permanent vacuuming need not
be an issue, if the locks it uses are suitably short-lived.

Sevo

--
sevo@ip23.net
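[Editor's note: a minimal sketch of the per-record "lock, move, mark"
variant discussed above, assuming each slot carries its own lock. With
both slot locks held across copy-and-mark, a concurrent update either
lands before the move (and is carried along) or waits and sees the new
location, closing the lost-update window between "move" and "mark".
Names and structure are illustrative, not PostgreSQL's.]

```python
import threading

class Slot:
    def __init__(self, value=None):
        self.value = value            # None means a dead (deleted) slot
        self.lock = threading.Lock()  # per-record lock

def move_record(table, src, dst):
    """Move one live record into a dead slot, holding both slot locks."""
    first, second = sorted((src, dst))  # fixed lock order avoids deadlock
    with table[first].lock, table[second].lock:
        table[dst].value = table[src].value
        table[src].value = None  # origin marked dead only after the copy

table = [Slot("a"), Slot(None), Slot("b")]
move_record(table, src=2, dst=1)
print([s.value for s in table])  # ['a', 'b', None]
```

The cost Sevo notes is visible here: one lock acquisition pair per
moved record, which is why it only makes sense as a low-priority
background activity rather than a fast foreground pass.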
"Alfred Perlstein" <bright@wintelcom.net> wrote:
> Here's two ideas I had for optimizing vacuum, I apologize in advance
> if the ideas presented here are naive and don't take into account
> the actual code that makes up postgresql.

* * *

This is the first time I have dared to post in the exalted realm of
[HACKERS]. On the other hand, I wrote a memo to Bill Gates a couple of
years ago which apparently resulted in C#, which is really worth a
little bit of attention, given the number of VB writers out there. I'm
not quite as stupid as I look.

Why doesn't `vacuum' happen all the time, instantly? Like, does
everybody feel psychologically more secure if a "commit" is not really
a commit, and it's there for some Emergency Refind to find? (If there
are olde hardware reasons, or software -- "Well, uh, back at BBN..." --
type reasons, I'd be happy to hear them.)

Screw it. "Is that your final answer?" is your final answer. Commit
and rebuild; optimize memory use all the time in the spare
milliseconds; no human is needed to make obvious calls.

-dlj.
> Alfred Perlstein wrote:
> > #1
> >
> > Reducing the time vacuum must hold an exclusive lock on a table:
> >
> > The idea is that since rows are marked deleted it's ok for the
> > vacuum to fill them with data from the tail of the table as
> > long as no transaction is in progress that has started before
> > the row was deleted.
> >
> > This may allow the vacuum process to copy back all the data without
> > a lock; when all the copying is done it then acquires an exclusive
> > lock and does this:
> >
> > Acquire an exclusive lock.
> > Walk all the deleted data marking it as current.
> > Truncate the table.
> > Release the lock.
> >
> > Since the data is still marked invalid (right?), even if valid data
> > is copied into the space it should be ignored as long as there's no
> > transaction occurring that started before the data was invalidated.
>
> Yes, but nothing prevents newer transactions from modifying the
> _origin_ side of the copied data _after_ it was copied but before the
> Lock-Walk-Truncate-Unlock cycle takes place, so it seems unsafe.
> Maybe lock each record before copying it up...

Seems a read lock would be necessary during the moving, but still a
win.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> #2
>
> Reducing the amount of scanning a vacuum must do:
>
> It would make sense that if a value of the earliest deleted chunk
> was kept in a table then vacuum would not have to scan the entire
> table in order to work, it would only need to start at the 'earliest'
> invalidated row.
>
> The utility of this (at least for us) is that we have several tables
> that will grow to hundreds of megabytes, however changes will only
> happen at the tail end (recently added rows). If we could reduce the
> amount of time spent in a vacuum state it would help us a lot.

But you have to update that value every time a row is modified. A
sequential scan by vacuum seems fast enough.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
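[Editor's note: a minimal sketch of idea #2 and of Bruce's objection to
it, assuming a per-table "lowest dead slot" watermark. The extra write
in delete() is the maintenance cost he points out; vacuum() shows the
payoff, scanning only from the watermark. Illustrative only; not how
PostgreSQL tracks free space.]

```python
class Table:
    def __init__(self, rows):
        self.rows = list(rows)       # None marks a dead (deleted) slot
        self.first_dead = len(rows)  # watermark: lowest dead offset

    def delete(self, i):
        self.rows[i] = None
        # the extra bookkeeping write on every delete/update:
        self.first_dead = min(self.first_dead, i)

    def vacuum(self):
        """Scan only from the watermark; returns how many slots were scanned."""
        scanned = len(self.rows) - self.first_dead
        live = self.rows[:self.first_dead] + [
            r for r in self.rows[self.first_dead:] if r is not None
        ]
        self.rows = live
        self.first_dead = len(live)
        return scanned

t = Table(["a", "b", "c", "d", "e"])
t.delete(3)
scanned = t.vacuum()
print(t.rows)   # ['a', 'b', 'c', 'e']
print(scanned)  # 2 -- only the tail past the watermark was examined
```

When deletes cluster at the tail (Alfred's workload), the watermark
stays high and the scan stays short; a single early delete drops the
watermark and loses most of the benefit.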
> Here's two ideas I had for optimizing vacuum, I apologize in advance
> if the ideas presented here are naive and don't take into account
> the actual code that makes up postgresql.
>
> ================
>
> #1
>
> Reducing the time vacuum must hold an exclusive lock on a table:
>
> The idea is that since rows are marked deleted it's ok for the
> vacuum to fill them with data from the tail of the table as
> long as no transaction is in progress that has started before
> the row was deleted.
>
> This may allow the vacuum process to copy back all the data without
> a lock; when all the copying is done it then acquires an exclusive
> lock and does this:
>
> Acquire an exclusive lock.
> Walk all the deleted data marking it as current.
> Truncate the table.
> Release the lock.
>
> Since the data is still marked invalid (right?), even if valid data
> is copied into the space it should be ignored as long as there's no
> transaction occurring that started before the data was invalidated.

Added to TODO:

	* Reduce VACUUM lock time by moving tuples with read lock, then
	  write lock and truncate table [vacuum]

The read lock is required because other transactions must be prevented
from modifying the rows, and the index is also an issue.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026