> > I do not however see how the current solution fixes the original problem,
> > that we don't have a rollback for index modifications.
> > The index would potentially point to an empty heap tuple slot.
>
> How? There will be an XLOG entry inserting the heap tuple before the
> XLOG entry that updates the index. Rollforward will redo both. The
> heap tuple might not get committed, but it'll be there.
Before commit or rollback the xlog is not flushed to disk, so you can lose
those xlog entries. But the index page might already be on disk because of
LRU buffer reuse, no?
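A toy model of the scenario just described may make the ordering concrete. It is a sketch, not PostgreSQL code, and it rests on the assumption being questioned: that a dirty index page can be written out by LRU reuse before the WAL records describing the transaction have been flushed. All class and method names are hypothetical.

```python
# Toy model of the crash scenario described above; NOT PostgreSQL code.
# Assumption (the point being questioned): the buffer manager may write a
# dirty index page to disk before the WAL records describing it are flushed.

class ToyDB:
    def __init__(self):
        self.wal_buffer = []  # WAL records still only in memory
        self.wal_disk = []    # WAL records that have been flushed
        self.heap_disk = {}   # ctid -> tuple, as stored on disk
        self.index_disk = {}  # key -> ctid, as stored on disk
        self.index_mem = {}   # dirty index page contents in shared buffers

    def insert(self, key, ctid, tup):
        # The heap insert is logged before the index insert.
        self.wal_buffer.append(("heap_insert", ctid, tup))
        self.wal_buffer.append(("index_insert", key, ctid))
        self.index_mem[key] = ctid  # the dirty heap page is not modeled

    def flush_wal(self):
        # Happens at commit (or whenever the log is otherwise forced).
        self.wal_disk.extend(self.wal_buffer)
        self.wal_buffer.clear()

    def evict_index_page(self):
        # LRU buffer reuse writes the dirty index page out.
        self.index_disk.update(self.index_mem)

    def crash_and_redo(self):
        # Unflushed WAL and all in-memory pages are lost in the crash.
        self.wal_buffer.clear()
        self.index_mem.clear()
        # Rollforward replays only the WAL that reached disk.
        for kind, a, b in self.wal_disk:
            if kind == "heap_insert":
                self.heap_disk[a] = b
            else:
                self.index_disk[a] = b

    def dangling_index_entries(self):
        # Index entries whose ctid points at a heap slot with no tuple.
        return [k for k, ctid in self.index_disk.items() if ctid not in self.heap_disk]


db = ToyDB()
db.insert("k1", (0, 1), "row-1")    # ctid as (block, offset)
db.evict_index_page()               # index page reaches disk first
db.crash_and_redo()                 # crash before commit: WAL never flushed
print(db.dangling_index_entries())  # ['k1']
```

In this model, calling `db.flush_wal()` before `db.evict_index_page()` lets redo recreate the heap tuple, and the list comes back empty; the outcome hinges entirely on whether the log is forced before the page write.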
Another example would be a btree reorganization, such as adding a level, that
is only partway through when the crash happens.
> > Additionally I do not see how this all works for userland index types.
>
> None of it works for index types that don't do XLOG entries (which I
> think may currently be true for everything except btree :-( ...). I
> don't see how that changes if we alter the way this bit is done.
I really think that xlog entries should be written by a layer below the userland
functions. I would not like to risk WAL integrity by allowing userland code to
write a messed-up log record. Such a record would be something like:
called userland index insert for "key" and "ctid". With that information you can
easily redo, but undo would probably be hard. Hence the physical log.
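The contrast between the two kinds of record can be sketched roughly as follows. The record layouts and names are hypothetical illustrations, not PostgreSQL's WAL format. A physical record that stores the before and after bytes of a page range can be both redone and undone mechanically; a logical "userland insert of key/ctid" record can be redone by calling the insert again, but this sketch assumes the userland interface offers no generic way to reverse it, so it has no undo.

```python
# Hypothetical record types for illustration only; the names and layout are
# assumptions, not PostgreSQL's actual WAL format.
from dataclasses import dataclass


@dataclass
class PhysicalRecord:
    page_no: int
    offset: int
    before: bytes  # bytes at [offset, offset + len) before the change
    after: bytes   # the same byte range after the change

    def redo(self, pages):
        page = pages[self.page_no]
        page[self.offset:self.offset + len(self.after)] = self.after

    def undo(self, pages):
        page = pages[self.page_no]
        page[self.offset:self.offset + len(self.before)] = self.before


@dataclass
class LogicalIndexInsert:
    # "called userland index insert for key and ctid"
    key: str
    ctid: tuple

    def redo(self, am_insert):
        # Redo simply calls the userland insert function again.
        am_insert(self.key, self.ctid)

    # No undo(): assumed here that the userland interface exposes no
    # generic "delete this entry", so the record cannot be reversed.


pages = {7: bytearray(b"....")}
rec = PhysicalRecord(page_no=7, offset=1, before=b"..", after=b"AB")
rec.redo(pages)  # pages[7] is now bytearray(b".AB.")
rec.undo(pages)  # pages[7] is back to bytearray(b"....")

index = {}
LogicalIndexInsert("k1", (0, 1)).redo(lambda k, c: index.__setitem__(k, c))
```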
Actually, I am not sure index changes need to be logged at all (or whether they
currently are). You can deduce all the necessary information from the heap xlog
record (plus perhaps the original record from disk).
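A minimal sketch of that idea for the insert case, assuming the heap xlog record carries the new tuple and its ctid, and that each index can be described by a function computing its key from a heap tuple. All names are hypothetical. On replay, each index entry is recomputed from the heap record instead of being read from a separate index log record.

```python
# Minimal sketch: redo index inserts from the heap xlog record alone.
# Assumptions: the heap record carries the full new tuple and its ctid, and
# each index is described by a key-extraction function. Names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Ctid = Tuple[int, int]  # (block, offset)


@dataclass
class HeapInsertRecord:
    ctid: Ctid
    tuple_: Dict[str, object]


@dataclass
class IndexDef:
    name: str
    key_of: Callable[[Dict[str, object]], object]  # index key from a heap tuple


def redo_heap_insert(rec: HeapInsertRecord,
                     heap: Dict[Ctid, Dict[str, object]],
                     indexes: List[IndexDef],
                     index_data: Dict[str, Set[Tuple[object, Ctid]]]) -> None:
    # Restore the heap tuple from the log record.
    heap[rec.ctid] = rec.tuple_
    # Derive every index entry from the same record; no index log is needed.
    for idx in indexes:
        entries = index_data.setdefault(idx.name, set())
        # Set semantics keep replay idempotent: redoing twice adds nothing.
        entries.add((idx.key_of(rec.tuple_), rec.ctid))


heap: Dict[Ctid, Dict[str, object]] = {}
index_data: Dict[str, Set[Tuple[object, Ctid]]] = {}
indexes = [IndexDef("by_id", lambda t: t["id"]),
           IndexDef("by_name", lambda t: t["name"])]
rec = HeapInsertRecord((0, 1), {"id": 42, "name": "abc"})
redo_heap_insert(rec, heap, indexes, index_data)
redo_heap_insert(rec, heap, indexes, index_data)  # replayed twice, same result
```

For an update or delete, the old tuple (which might have to be read from disk, as noted above) would also be needed to remove the old index entries; that case is not shown.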
Andreas