Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
Date
Msg-id CAM3SWZTWVC2Md8KFi461dju=NNjVxm8QB3YfkDCR5t84HHbqNg@mail.gmail.com
Whole thread Raw
Responses Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
List pgsql-hackers
We lack SortSupport for many character-like-type cases. In full, the
cases within the core system are:

* char(n) opfamily (bpchar_ops).

* text_pattern_ops opfamily (includes text and varchar "pattern"
opclasses, which are generally recommended for accelerating LIKE
operator queries).

* bpchar_pattern_ops -- the operator family/class for char(n), used
where "pattern" style indexing is required for the char(n) type.

* bytea default opclass. This is a type that, like the others, shares
its representation with text (a varlena header and some data bytes --
a string, essentially). Its comparator already behaves identically to
that of the text comparator when the "C" collation is used. I've
actually seen a specific request for this [1].

These cases do matter. For one thing, even if they're less important
than the default text/varchar opclasses, having such large
inconsistencies in character type sort performance is a fairly major
POLA violation; opclasses like text_pattern_ops are *supposed* to be
faster though less capable alternatives to the defaults. For another,
char(n) is in the SQL standard, which may be why all TPC benchmarks
use char(n) for columns that are sorted on or used for grouping.
char(n) sorting can be made about 8x faster with
SortSupport/abbreviation, making it the best candidate for
abbreviation optimization that I've ever seen. It would be regrettable
if we accidentally lost a benchmark due to not having char(n)
SortSupport.

Attached patch adds SortSupport for all of the cases listed above.

I did not bother doing anything with contrib/citext, because I think
it's not worth it. I think that we should definitely invest in case
insensitive collations, and retire contrib/citext. The fact that the
*default* collation (and not the input collation) is passed by
citextcmp()'s call to str_tolower() suggests to me that the only way
to make citext do the right thing is to basically introduce case
insensitive collations to Postgres. Of course, doing so would make
contrib/citext obsolete.

I also didn't bother extending the name type's SortSupport (something
that has existed since 9.2) to perform abbreviation, although that
wouldn't be very hard. I saw no point.

I think I might also get around to adding abbreviated key support for
the network address types during the 9.6 cycle. That seems like
something a less experienced contributor could easily pick up, though
-- if anyone wants to take it off my hands, please do so.

[1] https://lwn.net/Articles/653721/
--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Support for N synchronous standby servers - take 2
Next
From: Kouhei Kaigai
Date:
Subject: Re: Foreign join pushdown vs EvalPlanQual