Re: BUG #18362: unaccent rules and Old Greek text - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #18362: unaccent rules and Old Greek text
Msg-id CA+hUKGJmgaxpNn5x1Po1kmUxDiojsYWVWKKvhX+4QnyjDCWKKQ@mail.gmail.com
In response to Re: BUG #18362: unaccent rules and Old Greek text  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: BUG #18362: unaccent rules and Old Greek text
Re: BUG #18362: unaccent rules and Old Greek text
List pgsql-bugs
On Thu, May 16, 2024 at 1:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 15, 2024 at 2:45 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> > On 14.05.24 16:51, Robert Haas wrote:
> > The rules are only loaded once on first use, right?  I tested with
> >
> > date; for x in $(seq 1 1000); do psql -X -c "select unaccent('foobar')" -o /dev/null; done; date
> >
> > and this had the same runtime (about 8 seconds here) with and without
> > the patch.
>
> Cool. Sounds like that's not a problem.

Thanks Peter for testing, and thanks Robert for kicking this thread.

> > Btw., with the patch I get
> >
> > WARNING:  duplicate source strings, first one will be used
> >
> > so it will need some adjustments in how the rules are produced.
>
> OK. Does anyone want to look into that?

I think the problem is that the new "simple redirection" rule from the
Unicode database produces some values that are also present in
Latin-ASCII.xml, and these are all tolerated as long as the "from" and
"to" strings both match, because we uniquify them as pairs.  But there
is one pair where the "to" string is different, resulting in this
clash:

ℌ      x
ℌ      H

I think the first line might actually be a bug in CLDR data.  I dunno,
but this just doesn't look right:

ℌ → x ; # 210C;BLACK-LETTER CAPITAL H (compat)

And in the tests I now see that Michael had already figured that out!
I've included a kludge to remove that rule.  Someone should file a ticket with CLDR.
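
To make the clash concrete, here is a minimal sketch in Python.  It is
not the code from the attached patch: the function names, the
KNOWN_BAD_PAIRS deny-list and the sample "é" rules are all made up for
illustration.  Only the two ℌ pairs come from the output quoted above.
It shows that identical pairs collapse harmlessly, that one "from"
string with two different "to" strings survives as a clash, and that
dropping the suspicious pair makes the clash go away.

```python
from collections import defaultdict

# Hypothetical deny-list: (from, to) pairs believed to be wrong upstream.
KNOWN_BAD_PAIRS = {("\u210c", "x")}  # the "ℌ → x" rule discussed above


def uniquify(rules):
    """Collapse identical (from, to) pairs, keeping first-seen order."""
    return list(dict.fromkeys(rules))


def find_clashes(rules):
    """Map each "from" string that has more than one "to" string to those targets."""
    targets = defaultdict(set)
    for src, dst in rules:
        targets[src].add(dst)
    return {src: sorted(dsts) for src, dsts in targets.items() if len(dsts) > 1}


def drop_known_bad(rules):
    """Remove pairs listed in the (hypothetical) deny-list."""
    return [pair for pair in rules if pair not in KNOWN_BAD_PAIRS]


# Sample input: the "é" rules are invented, the ℌ rules are the ones above.
rules = uniquify([
    ("\u00e9", "e"),   # é from one input
    ("\u00e9", "e"),   # é from the other input: same pair, tolerated
    ("\u210c", "x"),   # ℌ as it appears in the CLDR-derived data
    ("\u210c", "H"),   # ℌ as produced from the Unicode database
])

clashes_before = find_clashes(rules)
clashes_after = find_clashes(drop_known_bad(rules))
print(clashes_before)
print(clashes_after)
```

A deny-list like this is only a stopgap: it hides one known-bad rule
and would not notice a new clash of the same kind, which is why getting
the upstream data fixed seems like the better long-term answer.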

