Thread: Re: pgsql: Allow tailoring of ICU locales with custom rules
On 08.03.23 21:57, Jeff Davis wrote:
> On Wed, 2023-03-08 at 16:03 +0000, Peter Eisentraut wrote:
>> Allow tailoring of ICU locales with custom rules
>
> Late review:
>
> * Should throw error when provider != icu and rules != NULL

I have fixed that.

> * Explain what the example means. By itself, users might get confused
>   wondering why someone would want to do that.
>
> * Also consider a more practical example?

I have added a more practical example with explanation.

> * It appears rules IS NULL behaves differently from rules=''. Is that
>   desired? For instance:
>
>     create collation c1(provider=icu,
>       locale='und-u-ka-shifted-ks-level1',
>       deterministic=false);
>     create collation c2(provider=icu,
>       locale='und-u-ka-shifted-ks-level1',
>       rules='',
>       deterministic=false);
>     select 'a b' collate c1 = 'ab' collate c1; -- true
>     select 'a b' collate c2 = 'ab' collate c2; -- false

I'm puzzled by this. The general behavior is, extract the rules of the
original locale, append the custom rules, use that. If the custom rules
are the empty string, that should match using the original rules
untouched. Needs further investigation.

> * Can you document the interaction between locale keywords
>   ("@colStrength=primary") and a rule like '[strength 2]'?

I'll look into that.
On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 08.03.23 21:57, Jeff Davis wrote:
>
> > * It appears rules IS NULL behaves differently from rules=''. Is that
> >   desired? For instance:
> >
> >     create collation c1(provider=icu,
> >       locale='und-u-ka-shifted-ks-level1',
> >       deterministic=false);
> >     create collation c2(provider=icu,
> >       locale='und-u-ka-shifted-ks-level1',
> >       rules='',
> >       deterministic=false);
> >     select 'a b' collate c1 = 'ab' collate c1; -- true
> >     select 'a b' collate c2 = 'ab' collate c2; -- false
>
> I'm puzzled by this. The general behavior is, extract the rules of the
> original locale, append the custom rules, use that. If the custom rules
> are the empty string, that should match using the original rules
> untouched. Needs further investigation.
>
> > * Can you document the interaction between locale keywords
> >   ("@colStrength=primary") and a rule like '[strength 2]'?
>
> I'll look into that.
>

This thread is listed on PostgreSQL 16 Open Items list. This is a
gentle reminder to see if there is a plan to move forward with respect
to open points.

--
With Regards,
Amit Kapila.
On 24.07.23 04:46, Amit Kapila wrote:
> On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>>
>> On 08.03.23 21:57, Jeff Davis wrote:
>>
>>> * It appears rules IS NULL behaves differently from rules=''. Is that
>>>   desired? For instance:
>>>
>>>     create collation c1(provider=icu,
>>>       locale='und-u-ka-shifted-ks-level1',
>>>       deterministic=false);
>>>     create collation c2(provider=icu,
>>>       locale='und-u-ka-shifted-ks-level1',
>>>       rules='',
>>>       deterministic=false);
>>>     select 'a b' collate c1 = 'ab' collate c1; -- true
>>>     select 'a b' collate c2 = 'ab' collate c2; -- false
>>
>> I'm puzzled by this. The general behavior is, extract the rules of the
>> original locale, append the custom rules, use that. If the custom rules
>> are the empty string, that should match using the original rules
>> untouched. Needs further investigation.
>>
>>> * Can you document the interaction between locale keywords
>>>   ("@colStrength=primary") and a rule like '[strength 2]'?
>>
>> I'll look into that.
>
> This thread is listed on PostgreSQL 16 Open Items list. This is a
> gentle reminder to see if there is a plan to move forward with respect
> to open points.

I have investigated this. My assessment is that how PostgreSQL
interfaces with ICU is correct. Whether what ICU does is correct might
be debatable. I have filed a bug with ICU about this:
https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no
response yet.

You can work around this by including the desired attributes in the
rules string, for example

    create collation c3 (provider=icu,
      locale='und-u-ka-shifted-ks-level1',
      rules='[alternate shifted][strength 1]',
      deterministic=false);

So I don't think there is anything we need to do here for PostgreSQL 16.
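For anyone reproducing this, the three collations discussed so far can be compared side by side. This is only a sketch: it assumes a PostgreSQL 16 server built with ICU support and reuses the names c1, c2 and c3 from the messages above. The results noted for c1 and c2 are the ones reported in this thread; the c3 result is what the workaround is intended to produce and may depend on the ICU version in use.

```sql
-- Locale keywords only, no rules: reported in this thread to treat
-- 'a b' and 'ab' as equal.
CREATE COLLATION c1 (provider = icu,
    locale = 'und-u-ka-shifted-ks-level1',
    deterministic = false);

-- Same locale plus an empty rules string: reported in this thread to
-- compare the two strings as unequal.
CREATE COLLATION c2 (provider = icu,
    locale = 'und-u-ka-shifted-ks-level1',
    rules = '',
    deterministic = false);

-- Workaround: restate the attributes inside the rules string so they
-- apply even if the locale keywords are not carried over.
CREATE COLLATION c3 (provider = icu,
    locale = 'und-u-ka-shifted-ks-level1',
    rules = '[alternate shifted][strength 1]',
    deterministic = false);

-- One row, one boolean per collation.
SELECT 'a b' COLLATE c1 = 'ab' COLLATE c1 AS c1_equal,
       'a b' COLLATE c2 = 'ab' COLLATE c2 AS c2_equal,
       'a b' COLLATE c3 = 'ab' COLLATE c3 AS c3_equal;
```

If c3_equal matches c1_equal rather than c2_equal, the rules string is carrying the attributes that the locale keywords alone did not.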
On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> I have investigated this. My assessment is that how PostgreSQL
> interfaces with ICU is correct. Whether what ICU does is correct
> might
> be debatable. I have filed a bug with ICU about this:
> https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no
> response yet.

Is everything other than the language and region simply discarded when
a rules string is present, or are some attributes preserved, or is
there some other nuance?

> You can work around this by including the desired attributes in the
> rules string, for example
>
>     create collation c3 (provider=icu,
>       locale='und-u-ka-shifted-ks-level1',
>       rules='[alternate shifted][strength 1]',
>       deterministic=false);
>
> So I don't think there is anything we need to do here for PostgreSQL
> 16.

Is there some way we can warn a user that some attributes will be
discarded, or improve the documentation? Letting the user figure this
out for themselves doesn't seem right.

Are you sure we want to allow rules for the database default collation
in 16, or should we start with just allowing them in CREATE COLLATION
and then expand to the database default collation later? I'm still a
bit concerned about users getting too fancy with daticurules, and
ending up not being able to connect to their database anymore.

Regards,
Jeff Davis
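To make the database-level case concrete, here is a rough sketch of how rules end up in daticurules. It assumes the PostgreSQL 16 CREATE DATABASE options LOCALE_PROVIDER, ICU_LOCALE and ICU_RULES, an ICU-enabled build, and a UTF8 cluster. The database name rules_test and the rule '&a < g' are purely illustrative and not taken from this thread.

```sql
-- Hypothetical database name. template0 is used because the locale
-- settings differ from those of the default template.
CREATE DATABASE rules_test
    TEMPLATE template0
    LOCALE_PROVIDER icu
    ICU_LOCALE 'und'
    ICU_RULES '&a < g';

-- List the databases that carry custom ICU rules.
SELECT datname, daticurules
FROM pg_database
WHERE daticurules IS NOT NULL;
```

Trying a rules string in CREATE COLLATION first, as in the c3 workaround, is a cheaper way to check it than baking it into a database default.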
On Tue, Aug 22, 2023 at 10:55 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> > I have investigated this. My assessment is that how PostgreSQL
> > interfaces with ICU is correct. Whether what ICU does is correct
> > might
> > be debatable. I have filed a bug with ICU about this:
> > https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no
> > response yet.
>
> Is everything other than the language and region simply discarded when
> a rules string is present, or are some attributes preserved, or is
> there some other nuance?
>
> > You can work around this by including the desired attributes in the
> > rules string, for example
> >
> >     create collation c3 (provider=icu,
> >       locale='und-u-ka-shifted-ks-level1',
> >       rules='[alternate shifted][strength 1]',
> >       deterministic=false);
> >
> > So I don't think there is anything we need to do here for PostgreSQL
> > 16.
>
> Is there some way we can warn a user that some attributes will be
> discarded, or improve the documentation? Letting the user figure this
> out for themselves doesn't seem right.
>
> Are you sure we want to allow rules for the database default collation
> in 16, or should we start with just allowing them in CREATE COLLATION
> and then expand to the database default collation later? I'm still a
> bit concerned about users getting too fancy with daticurules, and
> ending up not being able to connect to their database anymore.
>

There is still an Open Item corresponding to this. Does anyone else
want to weigh in?

--
With Regards,
Amit Kapila.