Re: Single pass vacuum - take 2 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Single pass vacuum - take 2
Date
Msg-id CA+TgmoZmVt8Em3Y9g3_2FkuWDFGcH_TXojsHXr=W-d7b5o7pwg@mail.gmail.com
Whole thread Raw
In response to Re: Single pass vacuum - take 2  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Sep 23, 2011 at 12:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm still not really comfortable with the handling of vacuum
> generation numbers.

Pavan and I spent a bit of time today talking about how many vacuum
generation numbers we need to have in order for this scheme to work.
Before my memory fades, here are some notes:

- In an ideal world, we'd only need two vacuum generation numbers.
Call them 1 and 2.  We store the vacuum generation number of the last
successful vacuum somewhere.  When the next vacuum starts, any
dead-vacuumed line pointers stamped with the generation number of the
last successful vacuum get marked unused.  The new vacuum uses the
other generation number, stamping any new dead line pointers with that
generation and eventually, after the index vacuum is successfully
completed, storing that value as the last successful vacuum generation
number.  The next vacuum will repeat the whole cycle with the roles of
the two available vacuum generation numbers reversed.  If a vacuum
fails midway through, the last successful vacuum generation number
doesn't get updated; the next vacuum will reuse the same vacuum
generation number, which should be fine.

- However, making this work with HOT pruning is a bit stickier.  If
the last successful vacuum generation number is stored in pg_class and
heap_page_prune() looks at the relcache entry to get it, the value
there might be out of date.  If a HOT pruning operation sees that the
last successful vacuum generation was X, but in the meantime a new
vacuum has started that also uses generation number X, then the HOT
prune might mark a dead line pointer as unused while there are still
index entries pointing to it, which would be bad.  So here, there's
value to having a large number of vacuum generations rather than just
two.

- In particular, assuming we store the vacuum generation number in
pg_class, we'd like to have enough vacuum generation numbers that the
counter can't wrap around while there's still an old relcache entry
lying around.  2^31 seems like enough, because each vacuum consumes an
XID (but will that necessarily always be the case?) and if you've
eaten through 2^31 XIDs then any still-running transaction would be
suffering from wraparound problems anyway (but what if it's read-only
and keeps taking new snapshots without ever rebuilding the relcache
entry?  can that happen?).  However, if we store the vacuum generation
in the line pointer, we only have 30 bits available.

- There's also a problem with having just two vacuum generation
numbers if someone does a transactional update to pg_class, even if
they don't touch the hypothetical field that stores the generation
number:

rhaas=# begin;
BEGIN
rhaas=# update pg_class set relname=relname where oid='test'::regclass;
UPDATE 1

Then, in another session:
rhaas=# vacuum test;
VACUUM

VACUUM is perfectly happy to do a non-transactional update on the
then-current version of the pg_class tuple even while an open
transaction has a pending update to that tuple that might get
committed just afterward.  We can't risk getting confused about the
current vacuum generation number.  Well, OK, technically we can: if
there are an infinite number of vacuum generation numbers available,
then the worst thing that happens is we forget that a bunch of dead
line pointers are reclaimable, and do a bunch of extra work that isn't
really necessary.  But if there are just two, we're now going to get
confused about which line pointers can be safely reclaimed.

...

So, what do we do?  Possible solutions appear to include:

- Find some more bit space, so that we can make the vacuum generation
number wider.
- Store the vacuum generation number someplace other than a system
catalog, where the effects that can make us see a stale value or lose
an update don't exist.
- Don't let HOT pruning reclaim dead-vacuumed line pointers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_upgrade if 'postgres' database is dropped
Next
From: Scott Mead
Date:
Subject: Re: IDLE in transaction introspection