On Tue, Apr 28, 2015 at 2:44 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> > I think what we need here is something that does heap_update to tuples
>> > at the end of the table, moving them to earlier pages; then wait for old
>> > snapshots to die (the infrastructure for which we have now, thanks to
>> > CREATE INDEX CONCURRENTLY); then truncate the empty pages. Of course,
>> > there are lots of details to resolve. It doesn't really matter that
>> > this runs for long: a process doing this for hours might be better than
>> > AccessExclusiveLock on the table for a much shorter period.
>>
>> Why do you need to do anything other than update the tuples and let
>> autovacuum clean up the mess?
>
> Sure, that's one option. I think autovac's current approach is too
> heavyweight: it always has to scan the whole relation and all the
> indexes. It might be more convenient to do something more
> fine-grained; for instance, maybe instead of scanning the whole
> relation, start from the end of the relation walking backwards and stop
> once the first page containing a live or recently-dead tuple is found.
> Perhaps, while scanning the indexes, you know that all CTIDs with block
> numbers higher than some threshold are gone, so you could remove them
> without scanning the heap at all.
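
For concreteness, the manual version of what you describe looks
roughly like the sketch below. It's untested, and the table "t", its
"id" column, and the block numbers are all made up; the idea is just
to rewrite the surviving rows on the tail blocks so their new
versions land on earlier pages, after which a plain VACUUM can
truncate the empty tail.

-- No-op update of every possible line pointer on the tail blocks.
-- HOT can keep a rewritten row on the same page if there's room, so
-- in practice this needs a few passes interleaved with VACUUM.
UPDATE t SET id = id
WHERE ctid = ANY (
    SELECT format('(%s,%s)', blk, off)::tid
    FROM generate_series(9000, 9999) AS blk,  -- made-up tail blocks
         generate_series(1, 291) AS off       -- max line pointers per
                                              -- 8kB heap page
);

-- Once no old snapshot can still see the dead row versions, VACUUM
-- can truncate the now-empty pages off the end of the relation.
VACUUM t;
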
I agree that scanning all of the indexes is awfully heavyweight, but
I don't see how we're going to get around that. The problem with
index vacuum is not that it's expensive to decide which CTIDs need to
be killed, but that we have to scan every page of every index looking
for them. The only alternative I can see is to regenerate the index
tuples we expect to find from the heap tuples we're killing and
search the indexes for them one at a time. Tom's been opposed to that
in the past, but maybe it's worth reconsidering.
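
Just to illustrate that cost asymmetry at the SQL level (a loose
analogy only, assuming a made-up table "t" with a btree index on
"id"): a single-key lookup descends the tree and touches a handful of
pages no matter how big the index is, while anything that has to
visit every entry reads the whole index, which is essentially what
bulk index vacuum has to do to find the TIDs it's killing.

SET enable_seqscan = off;

-- One btree descent: a few page reads, independent of index size.
EXPLAIN (COSTS OFF) SELECT 1 FROM t WHERE id = 12345;

-- Visiting every entry: reads every page of the index.
EXPLAIN (COSTS OFF) SELECT count(id) FROM t;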
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company