Tom, I assume you have dealt with this, right?
> I have been puzzling out the coding in _bt_binsrch() in
> backend/access/nbtree/nbtsearch.c, with an eye to speeding it up for
> the many-equal-keys case. I have finally worked out exactly what it's
> doing, to wit:
>
> * On a leaf page, we always return the first key >= scan key
> * (which could be the last slot + 1).
> *
> * On a non-leaf page, there are special cases:
> *
> * For an insertion (srchtype != BT_DESCENT and natts == keysz)
> * always return first key >= scan key (which could be off the end).
> *
> * For a standard search (srchtype == BT_DESCENT and natts == keysz)
> * return the first equal key if one exists, else the last lesser key
> * if one exists, else the first slot on the page.
> *
> * For a partial-match search (srchtype == BT_DESCENT and natts < keysz)
> * return the last lesser key if one exists, else the first slot.
>
> This strikes me as a tad bizarre --- in particular, the discrepancy
> between treatment of equal keys in the normal and partial search cases.
>
> I think I understand why the partial-match code works that way: there
> could be matching keys in the sub-page belonging to the last lesser key.
> For example, if our scan target is (x = 2) and we have internal keys
> (x = 1, y = 2)
> (x = 2, y = 1)
> then we need to look at the first key's subpages, where we might find
> matching keys like (x = 2, y = 0).
>
> The full-width case appears to assume that that can't happen: if we
> have a given key in an upper page, there can be *no* equal keys in
> subpages to its left. That's a rather strong assumption about how
> page splitting is done; is it correct?
>
> Even more to the point, *should* it be correct? If we always returned
> the last lesser key, then the code would work for any correctly
> sequenced b-tree, but this code looks like it works only if splits occur
> only at the leftmost of a group of equal keys. If there are a lot of
> equal keys, that could result in a badly unbalanced tree, no? Maybe
> that's the real reason why performance seems to be so bad for many
> equal keys... maybe the split algorithm needs to be relaxed?
>
> regards, tom lane
>
>
-- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610)
853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill,
Pennsylvania19026