Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Here's a patch implementing this scheme.
I've read through the papers on the btree technique, read the
thread, and have been reviewing the patch. It will take me another
day or two to complete a close review and testing, but I wanted to
give preliminary results, and just let people know that I am
working on it.
After reading through the papers and the README for btree, I have
to say that the approach to deletion seemed like the weak link.
Searches, scans, and insertions all were pretty cohesive. The
logic was all based on the fact that missing *upper* levels was OK
because the sibling pointers would be automatically used to search
until the correct page was found at each level. The old deletion
approach turned this on its head by having missing pages at the
bottom, which introduces whole new classes of problems which
otherwise don't exist. This patch makes interim states in the
deletion process look almost exactly the same as interim states in
the insertion process, which I think is A Very Good Thing.
The approach to limiting the number of locks to a finite (and
small) number is clever, but seems quite sound. The locks are
always taken in an order which cannot cause a deadlock.
The write-up lacks any explicit mention of what happens if the
branch being considered for deletion reaches all the way to the
root. Although an astute reader will notice that since root page
will always be the only page at its level, it must always be the
rightmost page at its level, and therefore is correctly handled by
the logic dealing with that, I think it merits an explicit mention
in the README.
I agree with others that this patch is not suitable for
back-patching. I wonder whether some other solution might be worth
back-patching, since it is clearly a bug. The "never delete
internal pages" might be safe enough to consider. It would lead to
some degree of bloat, but perhaps bloat is better than errors. Of
course, the argument can certainly be made that people hit the bug
so rarely that we're better off just leaving it alone in stable
branches. Anyway, that would be a different patch.
More on *this* patch in a day or two.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company