Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From	Michael Paquier
Subject	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date	December 18, 2018 07:57:08
Msg-id	20181218045708.GI1532@paquier.xyz
In response to	Re: BUG #15548: Unaccent does not remove combining diacritical characters (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: BUG #15548: Unaccent does not remove combining diacritical characters
List	pgsql-bugs

On Tue, Dec 18, 2018 at 03:05:00PM +1100, Thomas Munro wrote:
> I don't think this is quite right.  Those don't seem to be the
> combining codepoints[1], and in any case they are being replaced with
> ASCII characters, whereas I thought we wanted to replace them with
> nothing at all.  Here is my attempt to come up with a test case using
> combining characters:
>
>   select unaccent('un café crème s''il vous plaît');
>
> It's not stripping the accents.  I've attached that in a file for
> reference so you can run it with psql -f x.sql, and you can see that
> it's using combining code points (code points 0301, 0300, 0302 which
> come out as cc81, cc80, cc82 in UTF-8) like so:

Could you also add some tests in contrib/unaccent/sql/unaccent.sql at
the same time?  That would make it easy to check the extent of the
patches proposed on this thread.
--
Michael
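
For reference, the byte sequences mentioned in the quoted text can be
checked outside PostgreSQL.  The following is a minimal Python sketch, not
taken from the thread: the helper names utf8_hex and strip_combining are
hypothetical, and strip_combining only illustrates the idea of replacing
combining marks "with nothing at all".  It is not how the unaccent module
or any patch on this thread is implemented.

```python
import unicodedata

# Combining code points named in the quoted message:
# U+0301 (acute), U+0300 (grave), U+0302 (circumflex).
COMBINING = ["\u0301", "\u0300", "\u0302"]


def utf8_hex(s: str) -> str:
    """Return the UTF-8 encoding of s as a hex string."""
    return s.encode("utf-8").hex()


def strip_combining(s: str) -> str:
    """Drop every code point with a nonzero canonical combining class.

    Illustration only; this is an assumption about the desired result,
    not the unaccent rules file or its lookup logic.
    """
    return "".join(ch for ch in s if unicodedata.combining(ch) == 0)


# Start from precomposed characters (written as escapes to avoid any
# ambiguity) and decompose them so each accent becomes a separate
# combining code point, as in the test string from the thread.
precomposed = "un caf\u00e9 cr\u00e8me s'il vous pla\u00eet"
decomposed = unicodedata.normalize("NFD", precomposed)

print([utf8_hex(c) for c in COMBINING])
print(strip_combining(decomposed))
```

The decomposed string is three code points longer than the precomposed
one, which is the form of input the bug report is about.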
