Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date
Msg-id 20181218045708.GI1532@paquier.xyz
In response to Re: BUG #15548: Unaccent does not remove combining diacritical characters  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: BUG #15548: Unaccent does not remove combining diacritical characters
List pgsql-bugs
On Tue, Dec 18, 2018 at 03:05:00PM +1100, Thomas Munro wrote:
> I don't think this is quite right.  Those don't seem to be the
> combining codepoints[1], and in any case they are being replaced with
> ASCII characters, whereas I thought we wanted to replace them with
> nothing at all.  Here is my attempt to come up with a test case using
> combining characters:
>
>   select unaccent('un café crème s''il vous plaît');
>
> It's not stripping the accents.  I've attached that in a file for
> reference so you can run it with psql -f x.sql, and you can see that
> it's using combining code points (code points 0301, 0300, 0302 which
> come out as cc81, cc80, cc82 in UTF-8) like so:
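
As a rough illustration of the encoding claim in the quoted text, the sketch below checks that code points 0301, 0300 and 0302 encode to cc81, cc80 and cc82 in UTF-8, and shows what "replace them with nothing at all" would mean for the test string when it is written with combining characters. The helper names utf8_hex and strip_combining are hypothetical, and this is not how the unaccent module works internally (it applies a rules file); it only models the expected result.

```python
import unicodedata

# Combining code points named in the quoted message.
COMBINING = ["\u0301", "\u0300", "\u0302"]  # acute, grave, circumflex


def utf8_hex(s: str) -> str:
    """Return the UTF-8 encoding of s as a lowercase hex string."""
    return s.encode("utf-8").hex()


def strip_combining(s: str) -> str:
    """Drop every character with a non-zero canonical combining class."""
    return "".join(c for c in s if unicodedata.combining(c) == 0)


# The test string spelled out with explicit combining code points, so the
# example does not depend on how the surrounding text was normalized.
decomposed = "un cafe\u0301 cre\u0300me s'il vous plai\u0302t"

if __name__ == "__main__":
    for ch in COMBINING:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {utf8_hex(ch)}")
    print(strip_combining(decomposed))
```

Each base letter is followed by a separate combining mark, so removing the marks leaves plain ASCII letters rather than substituting one character for another.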

Could you also add some tests to contrib/unaccent/sql/unaccent.sql at
the same time?  That would make it easy to check the extent of the
patches proposed on this thread.
--
Michael
