Re: BUG #18362: unaccent rules and Old Greek text - Mailing list pgsql-bugs

From Peter Eisentraut
Subject Re: BUG #18362: unaccent rules and Old Greek text
Date
Msg-id 1bcd13b7-6e00-4de1-961e-b7669f05a2da@eisentraut.org
In response to Re: BUG #18362: unaccent rules and Old Greek text  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: BUG #18362: unaccent rules and Old Greek text
List pgsql-bugs
On 14.05.24 16:51, Robert Haas wrote:
> 2. The question of which mappings we actually ought to be adding seems
> a lot harder, because it's not altogether clear what it means to
> "remove an accent". The proposed patch adds a whole lot of rules that
> turn tiny little characters into full-sized characters, boldfaced
> and/or italicized and/or otherwise-fancily-printed characters into
> full-sized characters. Only a handful of the changes are actually
> adding rules that specifically *remove an accent*, but there are
> similar rules that already exist, like turning ⅐ into the
> four-character sequence " 1/7" and blocky-looking versions of each
> letter into standard versions and ㍱ into the three-character sequence
> "hPa". So my naive guess would be that we want all of these rules,
> even though you would not guess from the unaccent documentation that
> it's supposed to do stuff like this.

unaccent actually does both accent removal and ligature expansion. 
(This is documented.)  The cases you show above are ligature expansions.

You can also run generate_unaccent_rules.py with --no-ligatures and then 
you get a smaller list that indeed looks more like just accent removal.

It does look like whatever the script considers a ligature produces 
some unintuitive results.
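
To make the distinction between the two kinds of rule concrete, here is a rough sketch using Python's standard unicodedata module. It is only an approximation and an assumption on my part: it is not the logic of generate_unaccent_rules.py, and the helper names strip_accents and expand_compat are made up for illustration. Canonical decomposition (NFD) followed by dropping combining marks behaves like plain accent removal, while compatibility decomposition (NFKD) also rewrites presentation forms such as ㍱ or bold mathematical letters, which is roughly the class of rules being lumped under "ligature" here.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Accent removal only: canonical decomposition (NFD), then drop
    combining marks. Compatibility characters are left untouched."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def expand_compat(text: str) -> str:
    """Accent removal plus compatibility expansion: NFKD also replaces
    presentation forms (ligatures, squared units, styled letters) with
    their plain equivalents before combining marks are dropped."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    # U+00E9, U+1F04 (polytonic Greek), U+3371, U+FB01, U+1D400
    for ch in ["\u00e9", "\u1f04", "\u3371", "\ufb01", "\U0001d400"]:
        print(f"U+{ord(ch):04X}  {strip_accents(ch)!r}  {expand_compat(ch)!r}")
```

With this sketch, strip_accents leaves ㍱ alone while expand_compat turns it into "hPa", which mirrors the difference between the smaller accent-only list and the full rule set. The real script has its own notion of what counts as a ligature, so its output will not match this exactly.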


