Re: B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: B-Tree support function number 3 (strxfrm() optimization)
Date
Msg-id CA+Tgmoa9Fqc72JwGnKv8smtYEOy2VTU2odNjWMdC48b_UoYPyg@mail.gmail.com
In response to Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
Responses Re: B-Tree support function number 3 (strxfrm() optimization)
List pgsql-hackers
On Thu, Sep 4, 2014 at 2:12 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Sep 4, 2014 at 9:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Sep 2, 2014 at 10:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> * Still doesn't address the open question of whether or not we should
>>> optimistically always try "memcmp() == 0" on tiebreak. I still lean
>>> towards "yes".
>>
>> Let m be the cost of a memcmp() that fails near the end of the
>> strings; and let s be the cost of a strcoll that does likewise.
>> Clearly s > m.  But approximately what is s/m on platforms where you
>> can test?  Say, with 100 byte string, in a few different locales.
>
> Just to be clear: I imagine you're more or less sold on the idea of
> testing equality in the event of a tie-break, where the leading 8
> primary weight bytes are already known to be equal (and the full text
> string lengths also match); the theory of operation behind testing how
> good a proxy for full key cardinality abbreviated key cardinality is
> is very much predicated on that. We can still win big with very low
> cardinality sets this way, which are an important case. What I
> consider an open question is whether or not we should do that on the
> first call when there is no abbreviated comparison, such as on the
> second or subsequent attribute in a multi-column sort, in the hope
> that equality will just happen to be indicated.

Eh, maybe?  I'm not sure why the case where we're using abbreviated
keys should be different than the case we're not.  In either case this
is a straightforward trade-off: if we do a memcmp() before strcoll(),
we win if it returns 0 and lose if it returns non-zero and strcoll() also
returns non-zero.  (If memcmp() returns non-zero but strcoll() returns
0, it's a tie.)  I'm not immediately sure why it should affect the
calculus one way or the other whether abbreviated keys are in use; the
question of how much faster memcmp() is than strcoll() seems like the
relevant consideration either way.
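
To make the trade-off concrete, here is a minimal sketch of the
"memcmp() first" comparison. It is not the code from the patch:
cmp_memcmp_first() and its parameters are hypothetical names, the
strings are assumed to be NUL-terminated with their lengths already
known, and the strcmp() tie-break after strcoll() is an assumption about
how the existing comparison behaves.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only, not PostgreSQL source).
 * Compares two NUL-terminated strings whose lengths are already known,
 * trying a cheap memcmp() before the locale-aware strcoll().
 */
static int
cmp_memcmp_first(const char *a, size_t alen, const char *b, size_t blen)
{
    int         result;

    /* Byte-identical strings compare equal in every locale: cost is m. */
    if (alen == blen && memcmp(a, b, alen) == 0)
        return 0;

    /* Not identical, so pay for the full comparison: cost is m + s. */
    result = strcoll(a, b);

    /*
     * strcoll() may report equality for strings that are not identical.
     * Assumed tie-break: fall back to a bytewise comparison.
     */
    if (result == 0)
        result = strcmp(a, b);

    return result;
}
```

Under those assumptions, let p be the fraction of comparisons where
memcmp() returns 0 and q the fraction where both memcmp() and strcoll()
return non-zero. The pre-check saves roughly s each time it hits and
costs roughly m each time it misses, so it pays off when p * s > q * m,
that is, when s/m > q/p. This is a rough model, not a measurement.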

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
