Re: Minimally avoiding Transaction Wraparound in VLDBs - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Minimally avoiding Transaction Wraparound in VLDBs
Date
Msg-id 1125590999.3956.215.camel@localhost.localdomain
In response to Re: Minimally avoiding Transaction Wraparound in VLDBs  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Minimally avoiding Transaction Wraparound in VLDBs
List pgsql-hackers
On Thu, 2005-09-01 at 10:29 +0100, Simon Riggs wrote:
> On Wed, 2005-08-31 at 22:21 -0400, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > On Wed, 2005-08-31 at 19:24 -0400, Tom Lane wrote:
> > >> If you don't remove any tuples,
> > >> you don't scan the indexes anyway IIRC.
> > 
> > > No. Even if you remove *zero* tuples, an index is still scanned twice.
> > > Once to not delete the rows and once to not delete the pages.
> > 
> > Yeah?  Well, that could probably be improved with a less intrusive fix,
> > that is, one that does it automatically instead of involving the user.
> > 
> > I really really do not like proposals to introduce still another kind
> > of VACUUM.  We have too many already; any casual glance through the
> > archives will show that most PG users don't have a grip on when to use
> > VACUUM FULL vs VACUUM.  Throwing in some more types will make that
> > problem exponentially worse.
> 
> I'll post my proposal for changing that, so we can see the two
> alternatives. I'm easy either way at the moment.

Currently, VACUUM always scans each index twice, even if there are no
tuples to remove from the index: once in index_bulk_delete() and once in
index_vacuum_cleanup() (at least for the nbtree and rtree AMs).

My first proposal is to add an extra parameter, ntuples, to the
index_bulk_delete() call. If ntuples == 0 then btbulkdelete() will avoid
the scan and return immediately. If a scan does occur, we keep track of
how many tuples have been marked deleted and stop the scan once that
number is reached. This approach optimises two use cases:
1) where the VACUUM does not remove any tuples
2) where all the deletions are on the LHS of the index, as would be the
case in a frequently updated table like Shipments where an index has
either a monotonically increasing key or a time-based key.

Optimising index_vacuum_cleanup() is more complex. btvacuumcleanup()
either marks pages deleted OR adds previously deleted pages onto the
freelist. A non-FULL VACUUM never does both in the same pass, for MVCC
reasons, so it takes two VACUUMs to actually move a page back onto the
freelist. Scanning the index during index_vacuum_cleanup() therefore
serves a purpose even when no tuples were deleted during the vacuum: it
might be the follow-up VACUUM coming back to freelist the previously
deleted pages. (This is interesting, because autovacuum knows nothing of
the deleted pages and may not trigger a second vacuum, even though one
would be beneficial.) index_vacuum_cleanup() knows how many rows were
removed, because that count is already part of the info structure passed
between index_bulk_delete() and index_vacuum_cleanup().

There are two options:
1) avoid the scan when there are no dead tuples, and ignore the
possibility that a VACUUM might be doing a follow-up scan to freelist
previously deleted pages.
2) close the loophole by recording whether a previous VACUUM had set any
pages as deleted, or not. If the index was "clean", then we would skip
the scan.

1) seems the best option, since 2) is a lot of work for less gain.

If all of that seems OK, I'll code a patch.

Best Regards, Simon Riggs
