
Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From	Daniel Verite
Subject	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date	December 13, 2018 16:26:48
Msg-id	5d77cc08-d582-4f83-a17f-f2c992d123a9@manitou-mail.org
In response to	Re: BUG #15548: Unaccent does not remove combining diacritical characters (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: BUG #15548: Unaccent does not remove combining diacritical characters
List	pgsql-bugs


Tom Lane wrote:

> Hm, I thought the OP's proposal was just to make unaccent drop
> combining diacriticals independently of context, which'd avoid the
> combinatorial-growth problem.

In that case, this could be achieved by simply appending the
diacriticals themselves to unaccent.rules, since replacement of a
string by an empty string is already supported as a rule.
The current file doesn't seem to contain any such rules, but according to
https://www.postgresql.org/docs/11/unaccent.html :

 "Alternatively, if only one character is given on a line, instances
 of that character are deleted; this is useful in languages where
 accents are represented by separate characters"
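
To illustrate, here is a rough Python sketch of what such appended rules
could look like and what effect a deletion rule has on decomposed text.
It is only an approximation: the function names are made up for this
example, the choice of the Combining Diacritical Marks block
(U+0300..U+036F) is an assumption about which characters one would want
to drop, and apply_deletions() merely imitates the "one character on a
line means delete it" behaviour rather than calling unaccent itself.

```python
import unicodedata


def combining_mark_rules(start=0x0300, end=0x036F):
    """Build candidate rules lines: one combining mark per line.

    Hypothetical helper. The default range is the Unicode "Combining
    Diacritical Marks" block, which is an assumption for this sketch.
    """
    lines = []
    for codepoint in range(start, end + 1):
        ch = chr(codepoint)
        # Keep only nonspacing marks (general category Mn).
        if unicodedata.category(ch) == "Mn":
            lines.append(ch)
    return lines


def apply_deletions(text, rules):
    """Imitate single-character deletion rules on a string.

    Hypothetical stand-in for what unaccent would do with these rules;
    it does not read unaccent.rules or talk to PostgreSQL.
    """
    to_delete = set(rules)
    return "".join(ch for ch in text if ch not in to_delete)


rules = combining_mark_rules()

# Decomposed (NFD) input: each accented letter becomes base + combining mark.
decomposed = unicodedata.normalize("NFD", "Vérité")
print(apply_deletions(decomposed, rules))  # prints "Verite"
```

The lines returned by combining_mark_rules() could then be appended to a
copy of unaccent.rules, which has to be stored in UTF-8.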

Incidentally we may want to improve this bit of doc to mention
explicitly the Unicode decomposed forms as a use case for
removing characters. In fact I wonder if that's not what it's
already trying to express, but confusing "languages" with "forms".

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
