Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date
Msg-id CAEepm=1vRrNyam3ietQQ6ZdJ5JktkUphCEB0=_mPAKz8mjBB-A@mail.gmail.com
In response to Re: BUG #15548: Unaccent does not remove combining diacritical characters  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-bugs
On Tue, Dec 18, 2018 at 3:05 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Tue, Dec 18, 2018 at 12:03 PM Hugh Ranalli <hugh@whtc.ca> wrote:
> +ʹ    '
> +ʺ    "
> +ʻ    '
> +ʼ    '
> +ʽ    '
> +˂    <
> +˃    >
> +˄    ^
> +ˆ    ^
> +ˈ    '
> +ˋ    `
> +ː    :
> +˖    +
> +˗    -
> +˜    ~
>
> I don't think this is quite right.  Those don't seem to be the
> combining codepoints[1], and in any case they are being replaced with
> ASCII characters, whereas I thought we wanted to replace them with
> nothing at all.  Here is my attempt to come up with a test case using
> combining characters:
>
>   select unaccent('un café crème s''il vous plaît');

Oh, I see now that that was just the v34 ASCII transliteration update,
and perhaps the diacritic stripping will be posted separately.
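
To illustrate the distinction drawn in the quoted text, here is a minimal Python sketch (not part of the patch or of unaccent itself). It assumes the characters in the quoted hunk are the Spacing Modifier Letters code points listed below, and it uses NFD decomposition to build a test string that really does contain combining code points. The helper names `is_combining` and `strip_combining` are hypothetical.

```python
import unicodedata

# Assumption: these are the code points shown in the quoted hunk
# (Spacing Modifier Letters block, U+02B0..U+02FF).
MODIFIER_LETTERS = (
    "\u02b9\u02ba\u02bb\u02bc\u02bd\u02c2\u02c3\u02c4"
    "\u02c6\u02c8\u02cb\u02d0\u02d6\u02d7\u02dc"
)


def is_combining(ch: str) -> bool:
    """Return True if ch is a combining mark (general category Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")


def strip_combining(text: str) -> str:
    """Decompose text to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not is_combining(ch))


# The spacing modifier letters are standalone characters, not combining marks.
print(any(is_combining(ch) for ch in MODIFIER_LETTERS))  # False

# NFD turns each precomposed accented letter into base letter + combining mark,
# which is the kind of input the bug report is about.
nfd_text = unicodedata.normalize("NFD", "un café crème s'il vous plaît")
print(strip_combining(nfd_text))  # un cafe creme s'il vous plait
```

Under these assumptions, a string prepared like `nfd_text` could serve as input for the `unaccent()` test case, since it contains U+0300, U+0301 and U+0302 rather than precomposed letters.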

--
Thomas Munro
http://www.enterprisedb.com

