Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From	Tom Lane
Subject	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date	December 13, 2018 15:05:42
Msg-id	10200.1544713542@sss.pgh.pa.us
In response to	Re: BUG #15548: Unaccent does not remove combining diacritical characters ("Daniel Verite" <daniel@manitou-mail.org>)
Responses	Re: BUG #15548: Unaccent does not remove combining diacritical characters
List	pgsql-bugs

"Daniel Verite" <daniel@manitou-mail.org> writes:
>     PG Bug reporting form wrote:
>> ... For example, A
>> followed by U+0300 displays À. However, unaccent is not removing
>> these accents.

> Short of having the input normalized by the application, ISTM that the
> best solution would be to provide functions to do it in Postgres, so
> you'd just write for example:
>     unaccent(unicode_NFC(string))

That might be worthwhile, but it seems independent of this issue.
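
To illustrate the normalization step quoted above from the application side, here is a minimal Python sketch. It is not PostgreSQL code: `unicode_NFC` in the quote is only a proposed function name, and the helper `to_nfc` below is hypothetical. It relies on the standard-library `unicodedata` module to compose A followed by U+0300 into the single code point U+00C0, which a rule-based unaccent step could then match.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Compose base letter + combining mark sequences where a precomposed form exists."""
    return unicodedata.normalize("NFC", text)


decomposed = "A\u0300"         # 'A' followed by COMBINING GRAVE ACCENT
composed = to_nfc(decomposed)  # one precomposed code point

# Show the code points before and after composition.
print([hex(ord(ch)) for ch in decomposed])
print([hex(ord(ch)) for ch in composed])
```

NFC only helps where Unicode defines a precomposed character; sequences without one stay decomposed.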

> Otherwise unaccent.rules can be customized. You may add replacements
> for letter+diacritical sequences that are missing for the languages
> you have to deal with. But doing it in general for all diacriticals
> multiplied by all base characters seems unrealistic.

Hm, I thought the OP's proposal was just to make unaccent drop 
combining diacriticals independently of context, which'd avoid the
combinatorial-growth problem.
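
The context-independent approach can be sketched outside PostgreSQL as follows. This is an illustration under stated assumptions, not how unaccent is implemented: the helper name `drop_combining_marks` is hypothetical, and it treats any character with a nonzero canonical combining class as a combining diacritical.

```python
import unicodedata


def drop_combining_marks(text: str) -> str:
    """Remove every combining character, whatever base letter precedes it."""
    # unicodedata.combining() returns the canonical combining class,
    # which is 0 for ordinary (non-combining) characters.
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)


# 'A' + U+0300, 'e' + U+0301, 'n' + U+0303, all in decomposed form.
print(drop_combining_marks("A\u0300 e\u0301 n\u0303"))
```

A single filter covers every base letter, so no rule is needed per letter and mark pair. Precomposed characters such as U+00C0 pass through unchanged and would still need the existing replacement rules.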

            regards, tom lane
