>>> I don't understand why WAL needs to log internal operations of any of
>>> the index types. Seems to me that you could treat indexes as black
>>> boxes that are updated as side effects of WAL log items for heap tuples:
>>> when adding a heap tuple as a result of a WAL item, you just call the
>>> usual index insert routines, and when deleting a heap tuple as a result
>>
>> On recovery backend *can't* use any usual routines:
>> system catalogs are not available.
>
>OK, good point, but that just means you can't use the catalogs to
>discover what indexes exist for a given table. You could still create
>log entries that look like "insert indextuple X into index Y" without
>any further detail.
And how could I use such records on recovery without
knowing which data columns represent keys and which
functions should be used for ordering?
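To make the problem concrete, here is a toy sketch in Python
(not backend code; every name in it is hypothetical and the
"index" is a single sorted page). A record that says where the
tuple went can be replayed blindly, while an "insert indextuple X
into index Y" record needs an ordering function that, by
assumption here, recovery cannot look up:

```python
from dataclasses import dataclass


@dataclass
class PhysicalInsert:
    """Hypothetical record: the tuple went to this page at this offset."""
    page_no: int
    offset: int
    tuple_bytes: bytes


@dataclass
class LogicalInsert:
    """Hypothetical record: 'insert indextuple X into index Y'."""
    index_name: str
    tuple_bytes: bytes


def redo_physical(pages, rec):
    # No knowledge of keys or ordering is needed: just put the bytes back.
    pages.setdefault(rec.page_no, []).insert(rec.offset, rec.tuple_bytes)


def redo_logical(pages, rec, compare=None):
    # Finding the right position requires comparing keys. In this sketch
    # compare=None stands for "catalogs unavailable during recovery".
    if compare is None:
        raise RuntimeError("cannot place tuple: ordering function unknown")
    page = pages.setdefault(0, [])  # toy index: a single page
    pos = 0
    while pos < len(page) and compare(page[pos], rec.tuple_bytes) < 0:
        pos += 1
    page.insert(pos, rec.tuple_bytes)
```

Calling redo_logical() without a compare function fails, which is
exactly the situation I am asking about.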
>>> the index is corrupt and rebuild it from scratch, using Hiroshi's
>>> index-rebuild code.
>>
>> How fast is rebuilding of index for table with 10^7 records?
>
>It's not fast, of course. But the point is that you should seldom
>have to do it.
With WAL the system writes lazily, and as a result the
probability of seeing a "begin update" record without a
matching "done update" will be high, very high
(only log records go to disk on commit; data blocks
are forced to disk only at checkpoints, every 3-5
minutes).
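A toy model of this, in Python (a sketch only, all names are
hypothetical). It assumes one reading of the begin/done scheme:
"done update" is meaningful only once the changed index pages
have reached disk, so it is logged at checkpoint time:

```python
class ToyWAL:
    """Toy model of lazy writes; not PostgreSQL code."""

    def __init__(self):
        self.log_buffer = []     # log records still in memory
        self.disk_log = []       # log records that survive a crash
        self.dirty_indexes = set()  # indexes with pages changed in memory only

    def update_index(self, index):
        # The index change itself happens only in memory here.
        self.log_buffer.append(("begin update", index))
        self.dirty_indexes.add(index)

    def commit(self):
        # Commit forces log records to disk, but no data blocks.
        self.disk_log.extend(self.log_buffer)
        self.log_buffer.clear()

    def checkpoint(self):
        # Data blocks reach disk only here (assumption: this is also
        # the earliest point at which "done update" can be logged).
        for index in sorted(self.dirty_indexes):
            self.log_buffer.append(("done update", index))
        self.dirty_indexes.clear()
        self.commit()

    def suspect_indexes_after_crash(self):
        # Replay the on-disk log: an index with "begin update" and no
        # later "done update" would have to be rebuilt from scratch.
        suspect = set()
        for kind, index in self.disk_log:
            if kind == "begin update":
                suspect.add(index)
            else:
                suspect.discard(index)
        return sorted(suspect)
```

Under these assumptions, any index touched by a committed
transaction since the last checkpoint shows up as suspect after
a crash.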
>> I agree to consider rtree/hash/gist as experimental
>> index access methods BUT we have to have at least
>> *one* reliable index AM with short down time/
>> fast recovery.
>
>With all due respect, I wonder just how "reliable" btree WAL undo/redo
>will prove to be ... let alone the other index types. I worry that
>this approach is putting too much emphasis on making it fast, and not
>enough on making it right.
This approach (logging all index changes) is the *standard*
WAL approach and it is reliable (though an implementation
may not be, of course -:)). This is what I've seen in books;
I didn't invent anything new or special here.
Tom, can you implement (or spend some time on the design of)
hash redo/undo with the "black box approach" so we can
see how good it is? I still don't understand whether you
are going to use begin/done update records or
"insert tuple X into index Y" records.
Vadim