Re: B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: B-Tree support function number 3 (strxfrm() optimization)
Date
Msg-id CA+TgmobUgp7J0m=dRj92Y+eZKfzvvzNaFtBQcd9nP3Atzh0+iw@mail.gmail.com
Whole thread Raw
In response to Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
Responses Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Sun, Nov 9, 2014 at 10:02 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Sat, Oct 11, 2014 at 6:34 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Attached patch, when applied, accelerates all tuplesort cases using
>> abbreviated keys, building on previous work here, as well as the patch
>> posted to that other thread.
>
> I attach an updated patch set, rebased on top of the master branch's
> tip. All relevant tuplesort cases (B-Tree, MinimalTuple, CLUSTER) are
> now directly covered by the patch set, since there is now general
> sortsupport support for those cases in the master branch -- no need to
> apply some other patch from some other thread.
>
> For the convenience of reviewers, this new revision includes a new
> approach to making my improvements cumulative: A second commit adds
> tuple count estimation. This hint, passed along to the text opclass's
> convert routine, is taken from the optimizer's own estimate, or the
> relcache's reltuples, depending on the tuplesort case being
> accelerated. As in previous revisions, the idea is to give the opclass
> a sense of proportion about how far along it is, to be weighed in
> deciding whether or not to abort abbreviation. One potentially
> controversial aspect of that is how the text opclass abbreviation cost
> model/abort early stuff weighs simply having many tuples - past a
> certain point, it *always* proceeds with abbreviation, not matter what
> the cardinality of abbreviated keys so far is. For that reason it
> particular, it seemed to make sense to split these parts out into a
> second commit.
>
> I hope that we can finish up all 9.5 work on accelerated sorting soon.

There's a lot of stuff in this patch I'm still trying to digest, but
here are a few thoughts on patch 0001:

- This appears to needlessly reindent the comments for PG_CACHE_LINE_SIZE.

- I really don't think we need a #define in pg_config_manual.h for
this.  Please omit that.

- I'm much happier with the way the changes to sortsupport.h look in
this version.  However, I think that auth_comparator is a confusing
name, because "auth" is often used as an abbreviation for
"authentication".   We can spell it out (authoritative_comparator) or
come up with a different name (backup_comparator?
abbrev_full_comparator?).  Whatever we do, ApplySortComparatorAuth()
should be renamed to match.

- Also, I don't think making abbrev_state an enumerated value with two
values is really doing anything for us; we could just use a Boolean.
I'm wondering if we should actually go a bit further and remove this
from the SortSupport object and instead add an additional Boolean flag
to PrepareSortSupportFrom(OrderingOp|IndexRel) that gets passed all
the way down to the opclass's sortsupport function.  It seems like
that might be more clear.  Once the opclass function has done its
thing, the other three new nembers are enough to know whether we're
using the optimization or not (and can be fiddled if we want to make a
later decision to call the whole thing off).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alex Shulgin
Date:
Subject: Re: Missing OOM checks in libpq (was Re: Replication connection URI?)
Next
From: Fujii Masao
Date:
Subject: Re: The problems of PQhost()