Re: automatic REINDEX-ing - Mailing list pgsql-general

From Tom Lane
Subject Re: automatic REINDEX-ing
Msg-id 20358.1218676675@sss.pgh.pa.us
In response to Re: automatic REINDEX-ing  (Kevin Hunter <hunteke@earlham.edu>)
Responses Re: automatic REINDEX-ing  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-general
Kevin Hunter <hunteke@earlham.edu> writes:
> Hmm.  I get the reorganization bit, but so what?  Since VACUUM FULL
> already has an exclusive lock, what prevents it from updating the
> indexes in-place to point to the new physical disk location?  Why does
> it need to create extra bloat?

Crash safety.  The basic process in vacuum full is:

* detect that there is room in a lower-numbered page to move the
physically last tuple to.

* put a copy of the last tuple there.  Mark the last tuple itself
as deleted by the VACUUM.

* make new index entries for that copy.

* repeat until there's no longer any free space that can hold the
last undeleted tuple.

At this point there are two copies of every moved-down tuple, and
two sets of index entries for it.  The inserted copies are marked
as inserted by the VACUUM FULL.  Therefore, a crash anywhere along
this part of the process will leave them having the same status
as tuples inserted by any other failed transaction, ie, not good.
The original copies are marked as deleted by the vacuum, but again,
since it crashed, the deletion is a no-op.  Result: you still have one
and only one good copy of each tuple.

Next: commit the VACUUM FULL transaction and start a new one
to continue working in.  The commit is atomic and instantaneous
via the same WAL magic that makes every other commit atomic
and instantaneous.

If we crash at some point after the commit, the state is that
all the moved-down tuple copies are good, all the original ones
are dead.  Therefore we still have one and only one good copy
of each row, and there's no data corruption.

Next, vacuum full runs around and deletes the index entries for
the original copies of the moved tuples.  Crash partway through
here doesn't matter, because it doesn't actually matter whether
a dead tuple has all, some, or none of its expected index entries.
It's not going to be returned by any indexscan anyway.  The next
vacuum will finish up the job of removing index entries for any
dead tuples it finds.

Once we know there are no index entries left for the dead tuples,
we can physically remove the tuples.  Once that's done, we
can truncate away any wholly-free pages at the end of the table.

So in short, at no point in a vacuum sequence can you lose data
due to a crash.  That wouldn't be true if we overwrote valid
data or index entries.
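The two-phase sequence above can be sketched as a toy visibility model
(a hypothetical simulation, not PostgreSQL code): a tuple version counts
as live only if the transaction that inserted it committed, and no
committed transaction has deleted it.  A crash simply means the vacuum's
transaction id never enters the committed set.

```python
# Toy model of VACUUM FULL's two-phase tuple move (hypothetical, not
# PostgreSQL internals).  Visibility rule: a tuple version is live iff
# its inserter committed and no committed transaction deleted it.

def visible(tuples, committed):
    """Return the set of row ids that a scan would consider live."""
    return {
        t["row"]
        for t in tuples
        if t["xmin"] in committed                                  # inserter committed
        and (t["xmax"] is None or t["xmax"] not in committed)      # no committed delete
    }

# Initial state: rows 1 and 2 inserted by an old, committed transaction 100.
committed = {100}
tuples = [
    {"row": 1, "xmin": 100, "xmax": None},
    {"row": 2, "xmin": 100, "xmax": None},  # the "physically last" tuple
]

VACUUM_XID = 200

# Phase 1: copy the last tuple down, and mark the original as deleted
# by the VACUUM FULL transaction.
tuples.append({"row": 2, "xmin": VACUUM_XID, "xmax": None})  # moved-down copy
tuples[1]["xmax"] = VACUUM_XID                               # original marked dead

# Crash before commit: xid 200 never enters `committed`, so the copy is
# invisible and the deletion is a no-op -- exactly one good copy of row 2.
assert visible(tuples, committed) == {1, 2}

# Phase 2: the commit flips both facts atomically.
committed.add(VACUUM_XID)

# After commit: the copy is good, the original is dead -- still exactly
# one good copy of each row.  No intermediate state loses data.
assert visible(tuples, committed) == {1, 2}
```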

Plain vacuum just removes index entries for dead tuples and then the
tuples themselves.  It doesn't try to move anything across page
boundaries.  So while it frees space that can be re-used for subsequent
insertions, it will very seldom be able to shrink the table length ---
usually there will still be some live tuples near the end of the table.
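That limitation can be shown with a toy page model (hypothetical, not
PostgreSQL internals): plain vacuum frees dead slots where they sit and
can only truncate wholly-empty pages at the very end of the table, so a
single live tuple on the last page pins the file length.

```python
# Toy model of plain VACUUM (hypothetical): dead slots are freed in
# place, tuples are never moved across pages, and only a trailing run
# of completely empty pages can be truncated away.

def plain_vacuum(pages):
    """pages: list of pages; each page is a list of tuples, None = dead slot."""
    # Remove dead tuples in place; never move a tuple to another page.
    cleaned = [[t for t in page if t is not None] for page in pages]
    # Truncate only wholly-empty pages at the end of the table.
    while cleaned and not cleaned[-1]:
        cleaned.pop()
    return cleaned

# Mostly-dead table, but one live tuple sits on the last page:
# the freed space is reusable, yet the table stays three pages long.
assert len(plain_vacuum([[None, None], [None, None], ["row 5", None]])) == 3

# If the only live tuple happens to sit on the first page instead,
# the trailing pages really can be truncated.
assert len(plain_vacuum([["row 1"], [None], [None]])) == 1
```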

Vacuum full's problem with indexes is comparable: the structure of the
index is largely dictated by the index access method, and there's no
very good way to force pages at the physical end of the index to
become empty.  So the excess index entries that it temporarily adds
tend to lead to long-term index bloat.  The longer you wait between
vacuum fulls, the more tuples have to be moved to compact the table,
and the worse the index bloat gets.

            regards, tom lane