Re: automatic REINDEX-ing - Mailing list pgsql-general
From: Tom Lane
Subject: Re: automatic REINDEX-ing
Msg-id: 20358.1218676675@sss.pgh.pa.us
In response to: Re: automatic REINDEX-ing (Kevin Hunter <hunteke@earlham.edu>)
List: pgsql-general
Kevin Hunter <hunteke@earlham.edu> writes:
> Hmm.  I get the reorganization bit, but so what?  Since VACUUM FULL
> already has an exclusive lock, what prevents it from updating the
> indexes in-place to point to the new physical disk location?  Why does
> it need to create extra bloat?

Crash safety.

The basic process in vacuum full is:

* detect that there is room in a lower-numbered page to move the
  physically last tuple to.

* put a copy of the last tuple there.  Mark the last tuple itself as
  deleted by the VACUUM.

* make new index entries for that copy.

* repeat until there's no longer any free space that can hold the last
  undeleted tuple.

At this point there are two copies of every moved-down tuple, and two
sets of index entries for it.  The inserted copies are marked as
inserted by the VACUUM FULL.  Therefore, a crash anywhere along this
part of the process will leave them having the same status as tuples
inserted by any other failed transaction, ie, not good.  The original
copies are deleted by the vacuum, but again, since it crashed, the
deletion is a no-op.  Result: you still have one and only one good copy
of each tuple.

Next: commit the VACUUM FULL transaction and start a new one to continue
working in.  The commit is atomic and instantaneous via the same WAL
magic that makes every other commit atomic and instantaneous.  If we
crash at some point after the commit, the state is that all the
moved-down tuple copies are good, all the original ones are dead.
Therefore we still have one and only one good copy of each row, and
there's no data corruption.

Next, vacuum full runs around and deletes the index entries for the
original copies of the moved tuples.  Crash partway through here doesn't
matter, because it doesn't actually matter whether a dead tuple has all,
some, or none of its expected index entries.  It's not going to be
returned by any indexscan anyway.  The next vacuum will finish up the
job of removing index entries for any dead tuples it finds.
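[Editor's note: the crash-safety argument above can be sketched as a toy
model.  This is illustrative Python, not PostgreSQL's actual C code; the
names Tuple, visible, move_tuples and the constant VACUUM_XID are all
hypothetical.  The key idea it demonstrates is that moved copies are
stamped with the vacuum's own transaction ID, so visibility flips
atomically when that transaction commits.]

```python
# Toy model of VACUUM FULL's move phase: copies are inserted under the
# vacuum's XID, originals are marked deleted by that same XID, so a crash
# on either side of the commit leaves exactly one good copy per row.

VACUUM_XID = 100

class Tuple:
    def __init__(self, key, xmin=1, xmax=None):
        self.key = key
        self.xmin = xmin   # transaction that inserted this version
        self.xmax = xmax   # transaction that deleted it (None = live)

def visible(tup, committed_xids):
    """A tuple is good if its inserter committed and its deleter did not."""
    if tup.xmin not in committed_xids:
        return False
    return tup.xmax is None or tup.xmax not in committed_xids

def move_tuples(table):
    # Make a moved-down copy of each tuple, stamped as inserted by the
    # vacuum, and mark the original as deleted by the vacuum.
    copies = [Tuple(t.key, xmin=VACUUM_XID) for t in table]
    for t in table:
        t.xmax = VACUUM_XID
    return table + copies

committed = {1}                     # the original inserting xact committed
table = move_tuples([Tuple("a"), Tuple("b")])

# Crash BEFORE the vacuum commits: the copies look like inserts of a
# failed transaction, and the originals' deletion is a no-op.
good = sorted(t.key for t in table if visible(t, committed))
assert good == ["a", "b"]

# Crash AFTER the commit: the copies are good, the originals are dead.
committed.add(VACUUM_XID)
good = sorted(t.key for t in table if visible(t, committed))
assert good == ["a", "b"]
```

Either way the table contains one and only one good copy of each row,
which is the invariant the message describes.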
Once we know there are no index entries left for the dead tuples, we can
physically remove the tuples.  Once that's done, we can truncate away
any wholly-free pages at the end of the table.

So in short, at no point in a vacuum sequence can you lose data due to a
crash.  That wouldn't be true if we overwrote valid data or index
entries.

Plain vacuum just removes index entries for dead tuples and then the
tuples themselves.  It doesn't try to move anything across page
boundaries.  So while it frees space that can be re-used for subsequent
insertions, it will very seldom be able to shrink the table length ---
usually there will still be some live tuples near the end of the table.

Vacuum full's problem with indexes is comparable: the structure of the
index is largely dictated by the index access method, and there's no
very good way to force pages at the physical end of the index to become
empty.  So the excess index entries that it temporarily adds tend to
lead to long-term index bloat.  The longer you wait between vacuum
fulls, the more tuples have to be moved to compact the table, and the
worse the index bloat gets.

			regards, tom lane
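[Editor's note: the point about plain vacuum rarely shrinking the table
can be illustrated with a small sketch.  This is a hypothetical page
layout in Python, not real PostgreSQL storage; truncation can only drop
wholly-empty pages at the physical end of the file, so a single live
tuple near the end pins everything before it.]

```python
# Sketch: after plain VACUUM removes dead tuples, the file can only be
# truncated back to the last page that still holds a live tuple.

def truncate_point(pages):
    """pages: live-tuple count per page, in physical order.
    Return the page count after dropping empty trailing pages."""
    n = len(pages)
    while n > 0 and pages[n - 1] == 0:
        n -= 1
    return n

# Pages 1, 2, and 4 were emptied by VACUUM, but page 3 still has live
# tuples, so only the final page can be given back to the OS.
print(truncate_point([5, 0, 0, 3, 0]))  # -> 4
```

The freed space inside pages 1 and 2 is still reusable for future
insertions; it just cannot shrink the file, which is why VACUUM FULL's
tuple-moving step exists at all.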