Re: Minimally avoiding Transaction Wraparound in VLDBs - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: Minimally avoiding Transaction Wraparound in VLDBs
Date:
Msg-id: 1125590999.3956.215.camel@localhost.localdomain
In response to: Re: Minimally avoiding Transaction Wraparound in VLDBs (Simon Riggs <simon@2ndquadrant.com>)
Responses: Re: Minimally avoiding Transaction Wraparound in VLDBs
List: pgsql-hackers
On Thu, 2005-09-01 at 10:29 +0100, Simon Riggs wrote:
> On Wed, 2005-08-31 at 22:21 -0400, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > On Wed, 2005-08-31 at 19:24 -0400, Tom Lane wrote:
> > >> If you don't remove any tuples,
> > >> you don't scan the indexes anyway IIRC.
> >
> > > No. Even if you remove *zero* tuples, an index is still scanned twice.
> > > Once to not delete the rows and once to not delete the pages.
> >
> > Yeah? Well, that could probably be improved with a less intrusive fix,
> > that is, one that does it automatically instead of involving the user.
> >
> > I really really do not like proposals to introduce still another kind
> > of VACUUM. We have too many already; any casual glance through the
> > archives will show that most PG users don't have a grip on when to use
> > VACUUM FULL vs VACUUM. Throwing in some more types will make that
> > problem exponentially worse.
>
> I'll post my proposal for changing that, so we can see the two
> alternatives. I'm easy either way at the moment.

Currently, VACUUM always scans each index twice, even if there are no tuples to remove from the index: once in index_bulk_delete() and once in index_vacuum_cleanup() (at least for the nbtree and rtree AMs).

My first proposal is to add an extra parameter, ntuples, to the index_bulk_delete() call. If ntuples == 0, then btbulkdelete() will avoid scanning and return immediately. If a scan does occur, we keep track of how many tuples have been marked deleted and stop the scan once we have reached that number.

This approach means that two use cases will be optimised:
1) the VACUUM does not remove any tuples;
2) all the deletions are on the LHS of the index, as would be the case in a frequently updated table like Shipments, where an index has either a monotonically increasing key or a time-based key.

Optimising index_vacuum_cleanup() is more complex.
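As a rough illustration of the proposed short-circuit, here is a simplified, self-contained model of a bulk-delete pass that skips the scan when ntuples == 0 and stops early once all dead tuples have been found. The names (btbulkdelete_sketch, BulkDeleteStats) and the per-page model are illustrative assumptions, not PostgreSQL source.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stats a bulk-delete pass might report back. */
typedef struct
{
    size_t      pages_scanned;
    size_t      tuples_deleted;
} BulkDeleteStats;

/*
 * Hypothetical sketch of the proposal: the caller passes ntuples, the
 * number of dead tuples collected by the heap scan.  If it is zero, no
 * index scan happens at all; otherwise the scan stops as soon as that
 * many tuples have been marked deleted (the "LHS of the index" case).
 */
static BulkDeleteStats
btbulkdelete_sketch(size_t npages, size_t dead_per_page, size_t ntuples)
{
    BulkDeleteStats stats = {0, 0};

    if (ntuples == 0)
        return stats;           /* nothing to remove: skip the scan */

    for (size_t page = 0; page < npages; page++)
    {
        stats.pages_scanned++;
        stats.tuples_deleted += dead_per_page;  /* model: delete these */
        if (stats.tuples_deleted >= ntuples)
            break;              /* all known dead tuples found: stop */
    }
    return stats;
}
```

With, say, 100 leaf pages and 10 dead tuples per page, asking for 25 dead tuples scans only 3 pages, and asking for 0 scans none.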
btvacuumcleanup() marks pages deleted OR adds pages already deleted onto the freelist. Non-FULL VACUUM never does both in the same pass, for MVCC reasons, so it takes two VACUUMs to actually move a page back onto the freelist. So performing a scan of the index during index_vacuum_cleanup() does have a purpose even when no tuples were deleted during this vacuum: it might be a subsequent VACUUM coming back later to freelist the previously deleted pages. (This is interesting, because autovacuum knows nothing of the deleted pages and may not trigger a second vacuum, even though one would be beneficial.)

index_vacuum_cleanup() knows how many rows were removed, because that count is already part of the info structure passed between index_bulk_delete() and index_vacuum_cleanup(). There are two options:

1) avoid the scan when there are no dead tuples, and ignore the possibility that this VACUUM might be the follow-up scan that recycles previously deleted pages;
2) close that loophole by recording whether a previous VACUUM left any pages marked deleted; if the index is "clean", we would skip the scan.

1) seems the best option, since 2) is a lot of work for little extra gain.

If all of that seems OK, I'll code a patch.

Best Regards, Simon Riggs
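The option-1 skip described above can be sketched as a small predicate: run the cleanup scan only when the bulk-delete pass removed something, accepting that pages deleted by an earlier VACUUM may wait an extra cycle before reaching the freelist. The struct and function names here are illustrative assumptions, not the actual AM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the info structure passed from
 * index_bulk_delete() to index_vacuum_cleanup(); in reality it carries
 * more fields, but the removed-tuple count is what option 1 needs.
 */
typedef struct
{
    size_t      tuples_removed; /* dead tuples removed by bulk delete */
} CleanupInfo;

/*
 * Option 1: skip the cleanup scan entirely when nothing was removed.
 * This deliberately ignores the chance that a previous VACUUM left
 * deleted pages waiting to be put on the freelist.
 */
static bool
btvacuumcleanup_should_scan(const CleanupInfo *info)
{
    return info->tuples_removed > 0;
}
```

Option 2 would instead persist a per-index "has deleted pages" flag and consult it here, which is the extra bookkeeping the post judges not worth the gain.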