
Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From	Hugh Ranalli
Subject	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date	December 18, 2018 16:01:00
Msg-id	CAAhbUMMzPERSe3KfKKQfR4COJCZSrss1G7KRyUraYJyvrVyOUg@mail.gmail.com
In response to	Re: BUG #15548: Unaccent does not remove combining diacritical characters (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: BUG #15548: Unaccent does not remove combining diacritical characters
List	pgsql-bugs


On Mon, 17 Dec 2018 at 23:05, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

+ʹ '
+ʺ "
+ʻ '
+ʼ '
+ʽ '
+˂ <
+˃ >
+˄ ^
+ˆ ^
+ˈ '
+ˋ `
+ː :
+˖ +
+˗ -
+˜ ~

These aren't the combining codepoints. They're new substitutions defined in r34 of the Latin-ASCII transliteration file. I had wondered about those, too, and did some testing.
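One quick way to confirm that the quoted characters are not combining marks is to check their Unicode general category with Python's standard `unicodedata` module. This is only an illustrative sketch, not part of generate_unaccent_rules.py, and the names `NEW_SUBSTITUTIONS` and `is_combining_mark` are hypothetical. It shows that every quoted character sits in the Spacing Modifier Letters block (U+02B0..U+02FF) with category Lm or Sk and canonical combining class 0, whereas a true combining diacritic such as U+0301 has category Mn.

```python
import unicodedata

# The quoted r34 additions, written as escapes so the codepoints are unambiguous.
NEW_SUBSTITUTIONS = [
    "\u02B9", "\u02BA", "\u02BB", "\u02BC", "\u02BD",
    "\u02C2", "\u02C3", "\u02C4", "\u02C6", "\u02C8",
    "\u02CB", "\u02D0", "\u02D6", "\u02D7", "\u02DC",
]


def is_combining_mark(ch: str) -> bool:
    """True if ch is a mark (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


# Report category and canonical combining class for each quoted character.
for ch in NEW_SUBSTITUTIONS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category={unicodedata.category(ch)}, "
          f"combining={unicodedata.combining(ch)}")

# For contrast: a real combining diacritic, U+0301 COMBINING ACUTE ACCENT.
acute = "\u0301"
print(f"U+0301 {unicodedata.name(acute)}: "
      f"category={unicodedata.category(acute)}, "
      f"combining={unicodedata.combining(acute)}")
```

The check uses the general category rather than `unicodedata.combining()` alone, because a few marks (for example U+034F COMBINING GRAPHEME JOINER) have combining class 0 but are still category Mn.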

I don't think this is quite right.

However, you are correct that something isn't right. While investigating why I was getting different output, I had reverted generate_unaccent_rules.py to its state BEFORE my changes, and then applied my update for the new transliteration file format to that reverted version. The patch for generate_unaccent_rules should still be good, but the generated rules file didn't include the combining diacriticals. Before regenerating it and resubmitting, I want to double-check some of the additions.
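As a rough way to see which characters a regenerated rules file would need to cover, the sketch below lists the nonspacing marks in the Combining Diacritical Marks block (U+0300..U+036F) and strips them from NFD-decomposed text. It is a hypothetical illustration using only Python's `unicodedata` module; the helpers `combining_diacriticals` and `strip_combining` are invented here and do not reproduce how generate_unaccent_rules.py or unaccent itself work.

```python
import unicodedata

# Combining Diacritical Marks block: U+0300..U+036F inclusive.
COMBINING_BLOCK = range(0x0300, 0x0370)


def combining_diacriticals() -> list[str]:
    """Every nonspacing mark (category Mn) in the Combining Diacritical Marks block."""
    return [chr(cp) for cp in COMBINING_BLOCK
            if unicodedata.category(chr(cp)) == "Mn"]


def strip_combining(text: str) -> str:
    """Decompose text to NFD, then drop any character from that block."""
    marks = set(combining_diacriticals())
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if ch not in marks)


print(len(combining_diacriticals()))
print(strip_combining("cafe\u0301"))  # decomposed input: "e" followed by U+0301
print(strip_combining("caf\u00e9"))   # precomposed input: U+00E9
print(strip_combining("\u02BC"))      # modifier apostrophe is left untouched
```

Both spellings of "café" reduce to "cafe", while U+02BC passes through unchanged, which mirrors the distinction between the combining codepoints and the new r34 substitutions discussed above.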

On Mon, 17 Dec 2018 at 23:57, Michael Paquier <michael@paquier.xyz> wrote:

Could you also add some tests in contrib/unaccent/sql/unaccent.sql at
the same time? That would be nice to check easily the extent of the
patches proposed on this thread.

That makes sense, and I'm happy to do that. Let me look at that file and see how extensive the other changes (encoding, and removal of special characters) would be.

Hugh
