Re: Question regarding UTF-8 data and "C" collation on definition of field of table - Mailing list pgsql-general

From	Peter Geoghegan
Subject	Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Date	February 6, 2023 01:14:44
Msg-id	CAH2-WzkbWXvDGq2=ytGckKgkj22a7kuweaphY6iOE55_aVjwsw@mail.gmail.com
In response to	Re: Question regarding UTF-8 data and "C" collation on definition of field of table (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Question regarding UTF-8 data and "C" collation on definition of field of table
List	pgsql-general

On Sun, Feb 5, 2023 at 4:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If there's a predominant language in the data, selecting a collation
> matching that seems like your best bet.  Otherwise, maybe you should
> just shrug your shoulders and stick with C collation.  It's likely
> to be faster than any alternative.

FWIW there are certain "compromise locales" supported by ICU/CLDR.
These include "English (Europe)", and, most notably, EOR (European
Ordering Rules):

https://en.wikipedia.org/wiki/European_ordering_rules

EOR seems to have been standardized by the EU or by an adjacent
institution, though I'm not sure how widely used either of these
really is.

It's also possible to use a custom collation with ICU, which is almost
infinitely flexible:

http://www.unicode.org/reports/tr10/#Customization

As an example, the rules about the relative ordering of each script
can be changed this way. There is also something called merged
tailorings.

The OP should see the Postgres ICU docs for hints on how to use these
facilities to make a custom collation that matches whatever their
requirements are:

https://www.postgresql.org/docs/current/collation.html#COLLATION-MANAGING
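
As a rough sketch of what that might look like (assuming a Postgres
build with ICU support and a reasonably recent ICU that accepts BCP 47
language tags; the collation, table, and column names below are made up
for illustration):

```sql
-- Assumption: PostgreSQL built with ICU; all object names are hypothetical.

-- Change the relative ordering of scripts: sort Greek before Latin.
-- (By default Latin sorts before Greek.)
CREATE COLLATION greek_first (provider = icu, locale = 'en-u-kr-grek-latn');

-- European Ordering Rules, selected via the CLDR "eor" collation type
-- on the root locale.
CREATE COLLATION eor_sketch (provider = icu, locale = 'und-u-co-eor');

-- A collation can be attached to a column ...
CREATE TABLE people (name text COLLATE eor_sketch);

-- ... or applied to a single expression at query time.
SELECT name FROM people ORDER BY name COLLATE greek_first;
```

Whether either of these orderings is appropriate depends entirely on the
data; a plain "C" column with an explicit COLLATE clause in the few
queries that need linguistic ordering may well be enough.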

-- 
Peter Geoghegan
