Thread: Re: pgsql: Allow tailoring of ICU locales with custom rules

Re: pgsql: Allow tailoring of ICU locales with custom rules

From: Peter Eisentraut
On 08.03.23 21:57, Jeff Davis wrote:
> On Wed, 2023-03-08 at 16:03 +0000, Peter Eisentraut wrote:
>> Allow tailoring of ICU locales with custom rules
> 
> Late review:
> 
> * Should throw error when provider != icu and rules != NULL

I have fixed that.

> * Explain what the example means. By itself, users might get confused
> wondering why someone would want to do that.
> 
> * Also consider a more practical example?

I have added a more practical example with explanation.

> * It appears rules IS NULL behaves differently from rules=''. Is that
> desired? For instance:
>    create collation c1(provider=icu,
>      locale='und-u-ka-shifted-ks-level1',
>      deterministic=false);
>    create collation c2(provider=icu,
>      locale='und-u-ka-shifted-ks-level1',
>      rules='',
>      deterministic=false);
>    select 'a b' collate c1 = 'ab' collate c1; -- true
>    select 'a b' collate c2 = 'ab' collate c2; -- false

I'm puzzled by this.  The general behavior is, extract the rules of the 
original locale, append the custom rules, use that.  If the custom rules 
are the empty string, that should match using the original rules 
untouched.  Needs further investigation.

> * Can you document the interaction between locale keywords
> ("@colStrength=primary") and a rule like '[strength 2]'?

I'll look into that.




Re: pgsql: Allow tailoring of ICU locales with custom rules

From: Amit Kapila
On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 08.03.23 21:57, Jeff Davis wrote:
>
> > * It appears rules IS NULL behaves differently from rules=''. Is that
> > desired? For instance:
> >    create collation c1(provider=icu,
> >      locale='und-u-ka-shifted-ks-level1',
> >      deterministic=false);
> >    create collation c2(provider=icu,
> >      locale='und-u-ka-shifted-ks-level1',
> >      rules='',
> >      deterministic=false);
> >    select 'a b' collate c1 = 'ab' collate c1; -- true
> >    select 'a b' collate c2 = 'ab' collate c2; -- false
>
> I'm puzzled by this.  The general behavior is, extract the rules of the
> original locale, append the custom rules, use that.  If the custom rules
> are the empty string, that should match using the original rules
> untouched.  Needs further investigation.
>
> > * Can you document the interaction between locale keywords
> > ("@colStrength=primary") and a rule like '[strength 2]'?
>
> I'll look into that.
>

This thread is listed on the PostgreSQL 16 Open Items list. This is a
gentle reminder to see if there is a plan to move forward with respect
to the open points.

--
With Regards,
Amit Kapila.



Re: pgsql: Allow tailoring of ICU locales with custom rules

From: Peter Eisentraut
On 24.07.23 04:46, Amit Kapila wrote:
> On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>>
>> On 08.03.23 21:57, Jeff Davis wrote:
>>
>>> * It appears rules IS NULL behaves differently from rules=''. Is that
>>> desired? For instance:
>>>     create collation c1(provider=icu,
>>>       locale='und-u-ka-shifted-ks-level1',
>>>       deterministic=false);
>>>     create collation c2(provider=icu,
>>>       locale='und-u-ka-shifted-ks-level1',
>>>       rules='',
>>>       deterministic=false);
>>>     select 'a b' collate c1 = 'ab' collate c1; -- true
>>>     select 'a b' collate c2 = 'ab' collate c2; -- false
>>
>> I'm puzzled by this.  The general behavior is, extract the rules of the
>> original locale, append the custom rules, use that.  If the custom rules
>> are the empty string, that should match using the original rules
>> untouched.  Needs further investigation.
>>
>>> * Can you document the interaction between locale keywords
>>> ("@colStrength=primary") and a rule like '[strength 2]'?
>>
>> I'll look into that.
> 
> This thread is listed on the PostgreSQL 16 Open Items list. This is a
> gentle reminder to see if there is a plan to move forward with respect
> to the open points.

I have investigated this.  My assessment is that how PostgreSQL 
interfaces with ICU is correct.  Whether what ICU does is correct might 
be debatable.  I have filed a bug with ICU about this: 
https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no 
response yet.

You can work around this by including the desired attributes in the 
rules string, for example

     create collation c3 (provider=icu,
       locale='und-u-ka-shifted-ks-level1',
       rules='[alternate shifted][strength 1]',
       deterministic=false);

So I don't think there is anything we need to do here for PostgreSQL 16.
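
A minimal sketch for checking the workaround, reusing c1 and c2 from the earlier example and c3 from above. It assumes a PostgreSQL 16 build with ICU support and that all three collations have already been created; results can depend on the ICU version, so no expected output is shown.

```sql
-- Assumes c1, c2 and c3 exist exactly as defined in this thread.
-- Each column compares 'a b' with 'ab' under one collation, so the
-- three results can be read side by side.
select 'a b' collate c1 = 'ab' collate c1 as c1_equal,
       'a b' collate c2 = 'ab' collate c2 as c2_equal,
       'a b' collate c3 = 'ab' collate c3 as c3_equal;
```

If the workaround behaves as described, c3_equal should agree with c1_equal rather than with c2_equal.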




Re: pgsql: Allow tailoring of ICU locales with custom rules

From: Jeff Davis
On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> I have investigated this.  My assessment is that how PostgreSQL
> interfaces with ICU is correct.  Whether what ICU does is correct might
> be debatable.  I have filed a bug with ICU about this:
> https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no
> response yet.

Is everything other than the language and region simply discarded when
a rules string is present, or are some attributes preserved, or is
there some other nuance?

> You can work around this by including the desired attributes in the
> rules string, for example
>
>      create collation c3 (provider=icu,
>        locale='und-u-ka-shifted-ks-level1',
>        rules='[alternate shifted][strength 1]',
>        deterministic=false);
>
> So I don't think there is anything we need to do here for PostgreSQL 16.

Is there some way we can warn a user that some attributes will be
discarded, or improve the documentation? Letting the user figure this
out for themselves doesn't seem right.

Are you sure we want to allow rules for the database default collation
in 16, or should we start with just allowing them in CREATE COLLATION
and then expand to the database default collation later? I'm still a
bit concerned about users getting too fancy with daticurules, and
ending up not being able to connect to their database anymore.
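
For anyone wanting to see where rules are already in use, a minimal sketch of a catalog check follows. The column names are those of the PostgreSQL 16 catalogs; treating them as stable across releases is an assumption, since later versions may rename them.

```sql
-- Collations that carry a custom ICU rules string.
-- Column names as in PostgreSQL 16 (assumption: may differ in later releases).
select collname, colliculocale, collicurules
  from pg_collation
 where collicurules is not null;

-- Databases whose default collation carries a rules string.
select datname, daticulocale, daticurules
  from pg_database
 where daticurules is not null;
```

This only lists the stored rules; it does not show which locale attributes ICU ends up applying once a rules string is present.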

Regards,
    Jeff Davis




Re: pgsql: Allow tailoring of ICU locales with custom rules

From: Amit Kapila
On Tue, Aug 22, 2023 at 10:55 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> > I have investigated this.  My assessment is that how PostgreSQL
> > interfaces with ICU is correct.  Whether what ICU does is correct might
> > be debatable.  I have filed a bug with ICU about this:
> > https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no
> > response yet.
>
> Is everything other than the language and region simply discarded when
> a rules string is present, or are some attributes preserved, or is
> there some other nuance?
>
> > You can work around this by including the desired attributes in the
> > rules string, for example
> >
> >      create collation c3 (provider=icu,
> >        locale='und-u-ka-shifted-ks-level1',
> >        rules='[alternate shifted][strength 1]',
> >        deterministic=false);
> >
> > So I don't think there is anything we need to do here for PostgreSQL 16.
>
> Is there some way we can warn a user that some attributes will be
> discarded, or improve the documentation? Letting the user figure this
> out for themselves doesn't seem right.
>
> Are you sure we want to allow rules for the database default collation
> in 16, or should we start with just allowing them in CREATE COLLATION
> and then expand to the database default collation later? I'm still a
> bit concerned about users getting too fancy with daticurules, and
> ending up not being able to connect to their database anymore.
>

There is still an Open Item corresponding to this. Does anyone else
want to weigh in?

--
With Regards,
Amit Kapila.