"Daniel Verite" <daniel@manitou-mail.org> writes:
> PG Bug reporting form wrote:
>> ... For example, A
>> followed by U+0300 displays À. However, unaccent is not removing
>> these accents.
> Short of having the input normalized by the application, ISTM that the
> best solution would be to provide functions to do it in Postgres, so
> you'd just write for example:
>     unaccent(unicode_NFC(string))
That might be worthwhile, but it seems independent of this issue.
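To illustrate what such a normalization step would do, here is a minimal sketch outside Postgres. It assumes Python's standard unicodedata module as a stand-in for the unicode_NFC() function proposed above, which is a hypothetical name from the quoted mail; to_nfc is likewise just an illustrative helper.

```python
import unicodedata


def to_nfc(s: str) -> str:
    """Compose base letter + combining mark sequences where Unicode
    defines a precomposed character (NFC normalization)."""
    return unicodedata.normalize("NFC", s)


# "A" followed by U+0300 (combining grave accent): two code points.
decomposed = "A\u0300"
# After NFC this is the single precomposed code point U+00C0.
composed = to_nfc(decomposed)

# A sequence with no precomposed form is left as-is by NFC,
# so the combining mark would still reach unaccent unchanged.
no_precomposed = to_nfc("x\u0300")
```

Note that NFC only helps for sequences that have a precomposed equivalent; for anything else the combining mark survives normalization.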
> Otherwise unaccent.rules can be customized. You may add replacements
> for letter+diacritical sequences that are missing for the languages
> you have to deal with. But doing it in general for all diacriticals
> multiplied by all base characters seems unrealistic.
Hm, I thought the OP's proposal was just to make unaccent drop
combining diacriticals independently of context, which'd avoid the
combinatorial-growth problem.
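
As a rough sketch of that context-independent approach (not unaccent's actual code), the following assumes that "combining diacritical" means any code point in Unicode general category Mn (nonspacing mark); drop_combining_marks is a hypothetical helper name.

```python
import unicodedata


def drop_combining_marks(s: str) -> str:
    """Remove every nonspacing mark (category Mn), regardless of
    which base character precedes it."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Mn")


# Base letter + U+0300: the mark is dropped, the base letter stays.
plain = drop_combining_marks("A\u0300")

# Works the same for any base character; no per-letter rules needed.
mixed = drop_combining_marks("e\u0301x\u0300")

# A precomposed character is not touched here; that case would
# still be left to the existing letter-by-letter replacement rules.
precomposed = drop_combining_marks("\u00c0")
```

The point is that the rule count grows with the number of combining marks only, not with marks multiplied by base characters.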
regards, tom lane