"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Basically what I want here is to build the new index relation as
>> a new file (set of files, if large) and then atomically commit it
>> as the new version of the index.
> Hmm,your plan seems to need WAL.
> We must postpone to build indexes until the end of tuple moving
> in vacuum. Once tuple moving started,the consistency between
> heap and indexes would be broken. Currently(without WAL) this
> inconsistency could never be recovered in case of rollback.
Why? The same commit that makes the new index valid would make the
tuple movements valid. Actually, the way VACUUM currently works,
the tuple movements have been committed before we start freeing index
entries anyway. (One reason VACUUM is so inefficient with indexes
is that there is a peak index usage where there are index entries for
*both* old and new tuple positions. I don't feel a need to change
that, as long as the duplicate entries are in the old index that
we're hoping to get rid of.)
>> this transaction would atomically replace the index. (Vacuuming
>> pg_index's own indexes this way might be a tad tricky though...)
> ??? Don't pg_class and pg_attribute needs tricky handling either ?
> Seems pg_class alone needs to be tricky when we use rel versioning.
Could be. I think working through how we handle system tables and
indexes is the key stumbling block we've got to get past to have
versioning. I don't know quite how to do it, yet.
regards, tom lane