Re: Single pass vacuum - take 2 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Single pass vacuum - take 2
Date
Msg-id CA+TgmobKRb5jwFOTZYYiDsvkY9dtaZPMqr_BzJ5JWHeBrTwHkQ@mail.gmail.com
In response to Re: Single pass vacuum - take 2  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Single pass vacuum - take 2
Re: Single pass vacuum - take 2
List pgsql-hackers
On Tue, Aug 30, 2011 at 6:38 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> Yeah. If we don't know the status of the vacuum that collected the
> line pointer and marked it vacuum-dead, the next vacuum will pick it
> up again and stamp it with its own generation number.

I'm still not really comfortable with the handling of vacuum
generation numbers.  If we're going to say that 2^30 is large enough
that we don't need to worry about the counter wrapping around, then we
need some justification for that position.  Why can't we have 2^30
consecutive failed vacuums on a single table?  Sure, it would take a
long time, but we guard against many failure conditions that would
take a long time, and the result is that we have fewer corner-case
failures.  I want an explanation of why it's *safe*, and what the
smallest number of vacuum generations that we must support to make it
safe is.  If we blow the handling of this, we are going to eat the
user's data, so we had better have a really convincing argument as to
why what we're doing is OK.

Here's a possible alternative implementation: we allow up to 32 vacuum
generations to exist at once.  We keep a 64-bit integer holding two
bits of state for each vacuum generation: 00 = no line pointers with this
vacuum generation exist in the heap, 01 = some line pointers with this
vacuum generation may exist in the heap, but they are not removable,
11 = some line pointers with this vacuum generation exist in the heap,
and they are removable.  Then, when we start a VACUUM, we look for a
vacuum generation with flags 01.  If we find one, we adopt that as the
generation number for this vacuum.  If not, we look for one with flags
00, and if we find one, we set its flags to 01 and adopt it as the
generation number for this vacuum.  (If this too fails, then all
vacuum generations are in state 11.  There are several ways that could
be handled: either we make a pass over the heap just to free dead line
pointers, or we randomly select a vacuum generation number and push it
back to state 01, or we make all line pointers encountered during the
vacuum merely dead rather than dead-vacuumed; I think I like the last
option best.)  When we complete the heap scan, we set the flags of any vacuum
generation numbers that were previously 11 back to 00 (assuming we've
visited all the not-all-visible pages).  When we complete the index
pass, we set the flags of our chosen vacuum generation number to 11.
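
To make the state transitions above concrete, here is a minimal sketch
of the bookkeeping in C.  Every name in it (vacgen_begin, VACGEN_NONE,
and so on) is made up for illustration and is not an existing
PostgreSQL function or type; the sketch also assumes the 64-bit map is
already protected by whatever locking its real storage would need, and
it handles the all-11 case by returning "no generation" so that the
caller can mark line pointers merely dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VACGEN_COUNT 32     /* 32 generations x 2 bits = 64 bits */
#define VACGEN_NONE  (-1)   /* hypothetical: no generation could be adopted */

/* Two-bit state of one vacuum generation (names are illustrative). */
typedef enum
{
    VACGEN_UNUSED        = 0x0, /* 00: no line pointers with this generation */
    VACGEN_NOT_REMOVABLE = 0x1, /* 01: may exist in the heap, not removable */
    VACGEN_REMOVABLE     = 0x3  /* 11: exist in the heap and are removable */
} VacGenState;

/* Read the two state bits for generation 'gen'. */
VacGenState
vacgen_get(uint64_t map, int gen)
{
    assert(gen >= 0 && gen < VACGEN_COUNT);
    return (VacGenState) ((map >> (2 * gen)) & 0x3);
}

/* Return a copy of 'map' with generation 'gen' set to 'state'. */
uint64_t
vacgen_set(uint64_t map, int gen, VacGenState state)
{
    assert(gen >= 0 && gen < VACGEN_COUNT);
    map &= ~((uint64_t) 0x3 << (2 * gen));
    return map | ((uint64_t) state << (2 * gen));
}

/* Lowest-numbered generation in 'state', or VACGEN_NONE. */
int
vacgen_find(uint64_t map, VacGenState state)
{
    for (int gen = 0; gen < VACGEN_COUNT; gen++)
        if (vacgen_get(map, gen) == state)
            return gen;
    return VACGEN_NONE;
}

/*
 * Start of VACUUM: adopt a generation already in state 01 if there is
 * one; otherwise claim one in state 00 and move it to 01.  Returns
 * VACGEN_NONE when every generation is in state 11.
 */
int
vacgen_begin(uint64_t *map)
{
    int gen = vacgen_find(*map, VACGEN_NOT_REMOVABLE);

    if (gen != VACGEN_NONE)
        return gen;
    gen = vacgen_find(*map, VACGEN_UNUSED);
    if (gen != VACGEN_NONE)
        *map = vacgen_set(*map, gen, VACGEN_NOT_REMOVABLE);
    return gen;
}

/*
 * End of heap scan: generations that were 11 go back to 00, but only if
 * we visited all the not-all-visible pages.
 */
void
vacgen_heap_scan_done(uint64_t *map, bool scanned_all_pages)
{
    if (!scanned_all_pages)
        return;
    for (int gen = 0; gen < VACGEN_COUNT; gen++)
        if (vacgen_get(*map, gen) == VACGEN_REMOVABLE)
            *map = vacgen_set(*map, gen, VACGEN_UNUSED);
}

/* End of index pass: our own generation becomes removable (11). */
void
vacgen_index_pass_done(uint64_t *map, int gen)
{
    if (gen != VACGEN_NONE)
        *map = vacgen_set(*map, gen, VACGEN_REMOVABLE);
}
```

A vacuum that fails before its index pass never calls
vacgen_index_pass_done(), so its generation stays at 01 and the next
vacuum simply adopts it again; nothing here relies on a counter being
too large to wrap.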

There is clearly room for argument about the details here; for
example, as the algorithm is presented, it's hard to see how you would
end up with more than one vacuum generation number in each state, so
maybe you only need three values, not 32.  I suppose it could be
useful to have more values if you want to sometimes vacuum only part
of the heap, because then you'd only get to mark vacuum generation
numbers as unused on those occasions when you actually did scan the
whole heap.  But regardless of that detail, the thing I like about
what I'm proposing here is that it provides a closed loop around the
management of vacuum generation numbers - we always know the exact
state of each vacuum generation number, as opposed to just hoping that
by the billionth vacuum there won't be any leftovers.  Of course, it
may also be that we can convince ourselves that your algorithm as
implemented is safe ... but I'm not convinced, yet.

Another thing I'm not sure whether to worry about is the question of
where we store the vacuum generation information.  I mean, if we store
it in pg_class, then what happens if the user does a manual update of
pg_class just as we're updating the vacuum generation information?  We
had better make sure that there are no cases where we can accidentally
think that it's OK to reclaim dead line pointers that really still
have references, or we're going to end up with some awfully
difficult-to-find bugs...  never mind the possibility of the user
manually updating the value and hosing themselves.  Of course, we
already have some of those issues - relfrozenxid probably has the same
problems - and I'm not 100% sure whether this one is any worse.  It
would be really nice to have those non-transactional tables that
Alvaro keeps mumbling about, though, or some other way to store this
information.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

