Thread: RE: btree split logic is fragile in the presence of lar ge index items
> > While implementing multi-key btree-s for 6.1 I found problems with > > duplicates handling and this is why extra logic was added. > > But I never was happy with this logic -:) > > Do you not like the proposal I was suggesting? I thought it > was pretty much what you said yourself a few months ago... Do not add TID to key but store key anywhere in duplicate chain and just read lefter child page while positioning index scan, as we do right now for partial keys? This will result in additional reads but I like it much more than current "logic"... especially keeping in mind that we'll have to implement redo/undo for btree very soon and this would be nice to simplify things -:) But if you're going to change btree then please do it asap - I hope to begin btree redo/undo implementation in 2-3 days, just after heap... Vadim
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: >> Do you not like the proposal I was suggesting? I thought it >> was pretty much what you said yourself a few months ago... > Do not add TID to key but store key anywhere in duplicate chain and > just read lefter child page while positioning index scan, as we do > right now for partial keys? > This will result in additional reads but I like it much more than > current "logic"... Offhand it seems good to me too. For the normal case where there are many keys per page and not so many duplicates, an unneeded read should be rare anyway. I will need to study Lehman & Yao a little more to ensure they don't have a problem with it, but if not I'll do it that way. (I was surprised to realize that Lehman is the same Phil Lehman I knew in grad school ... in fact he was probably working on this paper when I knew him. Small world ain't it.) > But if you're going to change btree then > please do it asap - I hope to begin btree redo/undo implementation > in 2-3 days, just after heap... Slavedriver ;-) ... I'll see what I can do ... regards, tom lane
> -----Original Message----- > From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On > Behalf Of Tom Lane > > "Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: > >> Do you not like the proposal I was suggesting? I thought it > >> was pretty much what you said yourself a few months ago... > > > Do not add TID to key but store key anywhere in duplicate chain and > > just read lefter child page while positioning index scan, as we do > > right now for partial keys? > > > This will result in additional reads but I like it much more than > > current "logic"... > What about unique key insertions ? Regards. Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >>>> Do not add TID to key but store key anywhere in duplicate chain and >>>> just read lefter child page while positioning index scan, as we do >>>> right now for partial keys? >> >>>> This will result in additional reads but I like it much more than >>>> current "logic"... > What about unique key insertions ? What about 'em? Doesn't seem to make any difference as far as I can see. You still need to be prepared to scan right to see all of the potential matches. I have been digging in the index code and am thinking of reinstating another aspect of the older implementation. Once upon a time, the code was set up to treat the leftmost key on any non-leaf tree level as minus-infinity. That seems to me to agree with the data structure Lehman and Yao have in mind: in their drawings, each down-link pointer on a non-leaf level is "between" two keys, except that the leftmost downlink has no key to its left. Their drawings all show n+1 downlinks in an n-key non-leaf node. We didn't match that representation, so we need to fake it with a dummy key associated with the first downlink. The original code did that, but it got taken out at some point and replaced with pretty ugly code to propagate minimum-key changes back up the tree when the leftmost child node has to be split. But I think the only thing wrong with it was that not all the comparison routines knew they needed to do that. btree seems to be suffering from an abundance of comparison routines ... I'm going to try to eliminate some of the redundancy. Actually, we could shave some storage by not storing a key in the first data item of any non-leaf page, whether leftmost on its level or not. That would correspond exactly to L&Y's drawings. regards, tom lane