Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Tom, I assume this is done, right?
Only partly. I changed _bt_binsrch to use a pure binary search,
but didn't yet get around to fixing _bt_findsplitloc (it's more
complicated :-().
Vadim seemed to think that a better solution would be to fix the
comparison rules so that no two entries are ever considered to have
equal keys anyway. So I put my change on the back burner ... but if
his change doesn't happen, mine ought to get done.
>> I wrote:
>>>> then a possible explanation for the problem would be that our
>>>> btree index code isn't very fast when there are large numbers of
>>>> identical keys.
>>
>> Ah-hah, a lucky guess! I went back and looked at the profile stats
>> I'd extracted from Edmund's "update" example. This Linux box has
>> the same semi-functional gprof as someone else was using a while
>> ago --- the timings are bogus, but the call counts seem right.
>> And what I find are entries like this:
>>
>> 0.00 0.00 284977/284977 _bt_binsrch [3174]
>> [3177] 0.0 0.00 0.00 284977 _bt_firsteq [3177]
>> 0.00 0.00 21784948/24713758 _bt_compare [3169]
>>
>> 0.00 0.00 426/35632 _bt_split [53]
>> 0.00 0.00 35206/35632 _bt_insertonpg [45]
>> [3185] 0.0 0.00 0.00 35632 _bt_findsplitloc [3185]
>> 0.00 0.00 5093972/8907411 _bt_itemcmp [3171]
>>
>> In other words, _bt_firsteq is averaging almost 100 comparisons per
>> call, _bt_findsplitloc well over that. Both of these routines are
>> evidently designed on the assumption that there will be relatively
>> few duplicate keys --- they reduce to linear scans when there are
>> many duplicates.
>>
>> _bt_firsteq shouldn't exist at all; the only routine that calls it
>> is _bt_binsrch, which does a fast binary search of the index page.
>> _bt_binsrch should be fixed so that the binary search logic does the
>> right thing for equal-key cases, rather than degrading to a linear
>> search. I am less confident that I understand _bt_findsplitloc's place
>> in the great scheme of things, but it could certainly be revised to use
>> a binary search rather than linear scan.
>>
>> This benchmark is clearly overemphasizing the equal-key case, but
>> I think it ought to be fixed independently of whether we want to
>> look good on a dubious benchmark ... equal keys are not uncommon in
>> real-world scenarios after all.
>>
>> Next question is do we want to risk twiddling this code so soon before
>> 6.5, or wait till after?