Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From	Hugh Ranalli
Subject	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date	December 13, 2018 18:50:37
Msg-id	CAAhbUMOHkoN3Jeti4dp1jz3VY=XZPcCqpX=sW=mgmJbdMS--ng@mail.gmail.com
In response to	Re: BUG #15548: Unaccent does not remove combining diacritical characters ("Daniel Verite" <daniel@manitou-mail.org>)
Responses	Re: BUG #15548: Unaccent does not remove combining diacritical characters
List	pgsql-bugs

On Thu, 13 Dec 2018, 11:26 Daniel Verite <daniel@manitou-mail.org> wrote:

Tom Lane wrote:

> Hm, I thought the OP's proposal was just to make unaccent drop
> combining diacriticals independently of context, which'd avoid the
> combinatorial-growth problem.

That's what I was thinking. Given that the accent is separate from the characters, simply dropping it should result in the correct unaccented character.
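
To illustrate the point outside of PostgreSQL, here is a minimal Python sketch. It is not how unaccent works internally; the function name `drop_combining_marks` is made up for this example, and the test for "combining" relies on Python's `unicodedata.combining()`, which is my assumption about which characters should count.

```python
import unicodedata


def drop_combining_marks(text: str) -> str:
    """Remove every character whose Unicode combining class is non-zero.

    Hypothetical helper for illustration only; it mirrors the idea of
    deleting a combining accent regardless of the character before it.
    """
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT, i.e. a decomposed "é"
decomposed = "e\u0301cole"
print(drop_combining_marks(decomposed))  # prints "ecole"
```

Note that a precomposed character such as U+00E9 is left untouched by this sketch, since it is a single code point rather than a base letter plus a separate accent.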

In that case, this could be achieved by simply appending the
diacriticals themselves to unaccent.rules, since replacement of a
string by an empty string is already supported as a rule.
It doesn't seem like the current file has any of these, but from
https://www.postgresql.org/docs/11/unaccent.html :

"Alternatively, if only one character is given on a line, instances
of that character are deleted; this is useful in languages where
accents are represented by separate characters"

Yes, I had read that in the docs, and that's the approach I planned to take. I'll go ahead and develop a patch, then.
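
As a rough sketch of what such rules could look like, the Python below builds one single-character line per code point, which per the documentation quoted above means "delete this character". The range U+0300 to U+036F (the Combining Diacritical Marks block) is my assumption for illustration; the eventual patch may cover a different set. Both function names are hypothetical, and `apply_deletion_rules` only imitates the deletion behaviour, it is not the unaccent implementation.

```python
def combining_mark_rules(first: int = 0x0300, last: int = 0x036F) -> list[str]:
    """Return one single-character rule per code point in [first, last].

    The default range is an assumed choice (Combining Diacritical Marks).
    """
    return [chr(cp) for cp in range(first, last + 1)]


def apply_deletion_rules(text: str, rules: list[str]) -> str:
    """Imitate single-character rules by deleting every listed character."""
    doomed = set(rules)
    return "".join(ch for ch in text if ch not in doomed)


rules = combining_mark_rules()

# Text that could be appended to a rules file: one character per line.
rules_text = "\n".join(rules) + "\n"

print(len(rules))                                       # prints 112
print(apply_deletion_rules("a\u0300 la carte", rules))  # prints "a la carte"
```

Because each rule is independent of the preceding base character, the number of rules grows with the number of accents only, not with the number of letter and accent combinations.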

Best wishes,

Hugh
