Home > mailing lists

Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
Date	November 13, 2015 00:38:20
Msg-id	CAM3SWZTWVC2Md8KFi461dju=NNjVxm8QB3YfkDCR5t84HHbqNg@mail.gmail.com Whole thread Raw
Responses	Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses Re: Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
List	pgsql-hackers

Tree view

We lack SortSupport for many character-like-type cases. In full, the
cases within the core system are:

* char(n) opfamily (bpchar_ops).

* text_pattern_ops opfamily (includes text and varchar "pattern"
opclasses, which are generally recommended for accelerating LIKE
operator queries).

* bpchar_pattern_ops -- the operator family/class for char(n), used
where "pattern" style indexing is required for the char(n) type.

* bytea default opclass. This is a type that, like the others, shares
its representation with text (a varlena header and some data bytes --
a string, essentially). Its comparator already behaves identically to
that of the text comparator when the "C" collation is used. I've
actually seen a specific request for this [1].

These cases do matter. For one thing, even if they're less important
than the default text/varchar opclasses, having such large
inconsistencies in character type sort performance is a fairly major
POLA violation; opclasses like text_pattern_ops are *supposed* to be
faster though less capable alternatives to the defaults. For another,
char(n) is in the SQL standard, which may be why all TPC benchmarks
use char(n) for columns that are sorted on or used for grouping.
char(n) sorting can be made about 8x faster with
SortSupport/abbreviation, making it the best candidate for
abbreviation optimization that I've ever seen. It would be regrettable
if we accidentally lost a benchmark due to not having char(n)
SortSupport.

Attached patch adds SortSupport for all of the cases listed above.

I did not bother doing anything with contrib/citext, because I think
it's not worth it. I think that we should definitely invest in case
insensitive collations, and retire contrib/citext. The fact that the
*default* collation (and not the input collation) is passed by
citextcmp()'s call to str_tolower() suggests to me that the only way
to make citext do the right thing is to basically introduce case
insensitive collations to Postgres. Of course, doing so would make
contrib/citext obsolete.

I also didn't bother extending the name type's SortSupport (something
that has existed since 9.2) to perform abbreviation, although that
wouldn't be very hard. I saw no point.

I think I might also get around to adding abbreviated key support for
the network address types during the 9.6 cycle. That seems like
something a less experienced contributor could easily pick up, though
-- if anyone wants to take it off my hands, please do so.

[1] https://lwn.net/Articles/653721/
--
Peter Geoghegan

Attachment

0001-Generalize-SortSupport-for-text.patch

pgsql-hackers by date:

From: Masahiko Sawada
Date: 13 November 2015, 00:07:50
Subject: Re: Support for N synchronous standby servers - take 2

From: Kouhei Kaigai
Date: 13 November 2015, 02:36:46
Subject: Re: Foreign join pushdown vs EvalPlanQual

Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses - Mailing list pgsql-hackers

Attachment

Previous

Next