Re: Thoughts on maintaining 7.3 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Thoughts on maintaining 7.3
Date
Msg-id 27612.1065326029@sss.pgh.pa.us
Whole thread Raw
In response to Re: Thoughts on maintaining 7.3  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
Responses Re: Thoughts on maintaining 7.3
List pgsql-hackers
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Sat, Oct 04, 2003 at 11:41:17AM -0400, Tom Lane wrote:
>> No.  You'd be better off using REINDEX for that, I think.  IIRC we have
>> speculated about making VAC FULL fix the indexes via REINDEX rather than
>> indexbulkdelete.

> I can't agree with that idea.

Why not?  There is plenty of anecdotal evidence in the archives saying
that it's faster to drop indexes, VACUUM FULL, recreate indexes than
to VACUUM FULL with indexes in place.  Most of those reports date from
before we had the lazy-vacuum alternative, but I don't think that
renders them less relevant.

> Imagine having to VACUUM FULL a huge
> table.  Not only it will take the lot required to do the VACUUM in the
> heap itself, it will also have to rebuild all indexes from scratch.

A very large chunk of VACUUM FULL's runtime is spent fooling with the
indexes.  Have you looked at the code in any detail?  It goes like this:

1. Scan heap looking for dead tuples and free space.

2. Make a pass over the indexes to delete index entries for dead tuples.

3. Copy remaining live tuples to lower-numbered pages to compact heap.
3a.  Every time we copy a tuple, make new index entries pointing to its    new location.  (The old index entries still
remain,though.)
 

4. Commit transaction so that new copies of moved tuples are good and  old ones are not.

5. Make a pass over the indexes to delete index entries for old copies  of moved tuples.

When there are only a few tuples being moved, this isn't too bad of a
strategy.  But when there are lots, steps 2, 3a, and 5 represent a huge
amount of work.  What's worse, step 3a swells the index well beyond its
final size.  This used to mean permanent index bloat.  Nowadays step 5
will be able to recover some of that space --- but not at zero cost.

I think it's entirely plausible that dropping steps 2, 3a, and 5 in
favor of an index rebuild at the end could be a winner.

> I think there are scenarios where the REINDEX will be much worse, say when
> there are not too many deleted tuples (but in that case, why is the user
> doing VACUUM FULL in the first place?).

Yeah, I think that's exactly the important point.  These days there's
not a lot of reason to do VACUUM FULL unless you have a major amount of
restructuring to do.  I would once have favored maintaining two code
paths with two strategies, but now I doubt it's worth the trouble.
(Or I should say, we have two code paths, the other being lazy VACUUM
--- do we need three?)
        regards, tom lane


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Thoughts on maintaining 7.3
Next
From: Tom Lane
Date:
Subject: Re: Thoughts on maintaining 7.3