Re: [HACKERS] ICU integration - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: [HACKERS] ICU integration
Date
Msg-id 5291804b-169e-3ba9-fdaf-fae8e7d2d959@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] ICU integration  (Andreas Karlsson <andreas@proxel.se>)
Responses Re: [HACKERS] ICU integration  (Andreas Karlsson <andreas@proxel.se>)
List pgsql-hackers
Updated patch attached.


On 3/14/17 22:15, Andreas Karlsson wrote:
> I do not like the schema based solution since search_path already gives 
> us enough headaches. As for the naming I am fine with the current scheme.

Yeah, it seems we're going to settle on the suffix idea.  See below for
an idea that builds on that.

> - I get a test failures in the default test suite due to not having the 
> tr_TR locale installed. I would assume that this would be pretty common 
> for hackers.

I have removed that test.  It seems it's not possible to test that
portably without major contortions.

> - The code no longer compiles without HAVE_LOCALE_T.

Fixed that.

> - I do not like how it is not obvious which is the default version of 
> every locale. E.g. I believe that "sv%icu" is "sv_reformed%icu" and not 
> "sv_standard%icu" as one might expect. Is this inherent to ICU or 
> something we can work around?

We get these keywords from ucol_getKeywordValuesForLocale(), which says
"Given a key and a locale, returns an array of string values in a
preferred order that would make a difference."  So all those are
supposedly different from each other.

> - ICU talks about a new syntax for locale extensions (old: 
> de_DE@collation=phonebook, new: de_DE_u_co_phonebk) at this page 
> http://userguide.icu-project.org/collation/api. Is this something we 
> need to car about? It looks like we currently use the old format, and 
> while I personally prefer it I am not sure we should rely on an old syntax.

Interesting.  I hadn't see this before, and the documentation is sparse.
 But it seems that this refers to BCP 47 language tags, which seem like
a good idea.

So what I have done is change all the predefined ICU collations to be
named after the BCP 47 scheme, with a "private use" -x-icu suffix
(instead of %icu).  The preserves the original idea but uses a standard
naming scheme.

I'm not terribly worried that they are going to remove the "old" locale
naming, but just to be forward looking, I have changed it so that the
collcollate entries are made using the "new" naming for ICU >=54.

> - I get an error when creating a new locale.
> 
> #CREATE COLLATION sv2 (LOCALE = 'sv');
> ERROR:  could not create locale "sv": Success
> 
> # CREATE COLLATION sv2 (LOCALE = 'sv');
> ERROR:  could not create locale "sv": Resource temporarily unavailable
> Time: 1.109 ms

Hmm, that's pretty straightforward code.  What is your operating system?
 What are the build options?  Does it work without this patch?

> - For the collprovider is it really correct to say that 'd' is the 
> default value as it does in catalogs.sgml?

It doesn't say it's the default value, it says it uses the database
default.  This is all a bit confusing.  We have a collation object named
"default", which uses the locale set for the database.  That's been that
way for a while.  Now when introducing the collation providers, that
"default" collation gets its own collprovider category 'd'.  That is not
really used anywhere, but we have to put something there.

> - I do not know what the policy for formatting the documentation is, but 
> some of the paragraphs are in need of re-wrapping.

Hmm, I don't see anything terribly bad.

> - Add a hint to "ERROR:  conflicting or redundant options". The error 
> message is pretty unclear.

I don't see that in my patch.  Example?

> - I am not a fan of this patch comparing the encoding with a -1 literal. 
> How about adding -1 as a value to the enum? See the example below for a 
> place where the patch compares with -1.

That's existing practice.  Not a great practice, probably, but widespread.

> - The patch adds "FIXME" in the below. Is that a left-over from 
> development or something which still needs doing?
> 
>      /*
>       * Also forbid matching an any-encoding entry.  This test of course 
> is not
>       * backed up by the unique index, but it's not a problem since we don't
> -    * support adding any-encoding entries after initdb.
> +    * support adding any-encoding entries after initdb. FIXME
>       */

I had mentioned that upthread.  It technically needs "doing" as you say,
but it's not clear how and it's not terribly important, arguably.

> - Should functions like normalize_locale_name() be renamed to indicate 
> they relate to libc locales? I am leaning towards doing so but have not 
> looked closely at the task.

Renamed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Emre Hasegeli
Date:
Subject: Re: [HACKERS] Parallel Bitmap scans a bit broken
Next
From: "Mengxing Liu"
Date:
Subject: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling fromrw-conflict tracking in serializable transactions