Re: Add standard collation UNICODE - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Add standard collation UNICODE
Date	March 8, 2023 18:25:42
Msg-id	77f0df84a8e146bd1afa55bcaf26dcd6cc3faebd.camel@j-davis.com
In response to	Re: Add standard collation UNICODE (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses	Re: Add standard collation UNICODE
List	pgsql-hackers

On Wed, 2023-03-08 at 07:21 +0100, Peter Eisentraut wrote:
> On 04.03.23 19:29, Jeff Davis wrote:
> > It looks like the way you've handled this is by inserting the collation
> > with collprovider=icu even if built without ICU support. I think that's
> > a new case, so we need to make sure it throws reasonable user-facing
> > errors.
>
> It would look like this:
>
> => select * from t1 order by b collate unicode;
> ERROR:  0A000: ICU is not supported in this build

Right, the error looks good. I'm just pointing out that before this
patch, having provider='i' in a build without ICU was a configuration
mistake, whereas afterward every database will have a collation with
provider='i' whether or not it has ICU support. I think that's fine;
I'm just double-checking.

Why is "unicode" only provided for the UTF-8 encoding? For "ucs_basic"
that makes some sense, because the implementation only works in UTF-8.
But here we are using ICU, and the "und" locale should work for any
ICU-supported encoding. I suggest that we use collencoding=-1 for
"unicode", and the docs can just add a note next to "ucs_basic" that it
only works for UTF-8, because that's the weird case.
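
To make the collencoding suggestion concrete, here is a rough sketch of
the matching rule as I understand it: a collation is usable in a database
when its collencoding is -1 or equals the database encoding. This is a toy
model, not PostgreSQL code. The encoding names stand in for the numeric
encoding IDs, and the two dictionaries and both helper functions are
hypothetical, made up for illustration.

```python
# Toy model of how pg_collation.collencoding restricts a collation.
# Assumption: encoding names replace PostgreSQL's numeric encoding IDs,
# and the catalog rows below are illustrative, not a real catalog dump.
ANY_ENCODING = -1  # collencoding = -1 means "works for any encoding"


def usable_in(collencoding, db_encoding):
    """Return True if a collation with this collencoding applies to the database."""
    return collencoding == ANY_ENCODING or collencoding == db_encoding


def visible(catalog, db_encoding):
    """List the collation names usable in a database with the given encoding."""
    return sorted(
        name for name, enc in catalog.items() if usable_in(enc, db_encoding)
    )


# As described in the message: both collations tied to UTF-8.
as_proposed = {"ucs_basic": "UTF8", "unicode": "UTF8"}

# As suggested: "unicode" usable everywhere, "ucs_basic" still UTF-8 only.
as_suggested = {"ucs_basic": "UTF8", "unicode": ANY_ENCODING}

print(visible(as_proposed, "LATIN1"))
print(visible(as_suggested, "LATIN1"))
print(visible(as_suggested, "UTF8"))
```

Under that model, a LATIN1 database sees neither collation with the
UTF-8-only entries, and sees "unicode" once its collencoding is -1.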

For the docs, I suggest that you clarify that "ucs_basic" has the same
behavior as the C locale does *in the UTF-8 encoding*. Not all users
might pick up on the subtlety that the C locale has different behaviors
in different encodings.
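
A small illustration of that subtlety, assuming the C locale amounts to
comparing the encoded bytes. In UTF-8, byte order matches code point
order, which is why "ucs_basic" and C agree there. In another encoding
such as KOI8-R the bytes for the same letters fall in a different order.
The sample strings are arbitrary and the variable names are mine.

```python
# Sketch: byte-wise ordering (what the C locale is assumed to do here)
# depends on how the strings are encoded.
words = ["в", "г", "а", "б"]

# Python compares str values by code point, like "ucs_basic" is meant to.
by_code_point = sorted(words)

# Byte-wise comparison of the UTF-8 encoding gives the same order.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Byte-wise comparison of the KOI8-R encoding does not: in KOI8-R the
# byte for "г" (0xC7) sorts before the byte for "в" (0xD7).
by_koi8r_bytes = sorted(words, key=lambda s: s.encode("koi8_r"))

print(by_code_point)
print(by_utf8_bytes)
print(by_koi8r_bytes)
```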

Other than that, it looks good.

--
Jeff Davis
PostgreSQL Contributor Team - AWS
