
Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From	Hugh Ranalli
Subject	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date	January 10, 2019 02:52:05
Msg-id	CAAhbUMNZ0ooK6SzLNdkxzdBsQHOJf_rg_EjwoNL8QHTwQuriRw@mail.gmail.com
In response to	Re: BUG #15548: Unaccent does not remove combining diacritical characters (Michael Paquier <michael@paquier.xyz>)
Responses	Re: BUG #15548: Unaccent does not remove combining diacritical characters
List	pgsql-bugs


On Tue, 8 Jan 2019 at 22:53, Michael Paquier <michael@paquier.xyz> wrote:

I have been doing a bit more than a review by studying by myself the
new format and the old format, and the way we could do things in the
XML parsing part, and hacked the code by myself. On top of the
incorrect URL for Latin-ASCII.xml, I have noticed as well that there
should be only one block transforms/transform/tRule in the source, so
I think that we should add an assertion on that as a sanity check. I
have also changed the code to use splitlines(), which is more portable
across platforms, and added an extra regression test for the new
characters added to unaccent.rules. This does not close this thread
but we can support the new format this way. I have also documented
the way to browse the full set of releases for Latin-ASCII.xml, and
precisely which version has been used for this patch.

This does not close yet the part for diacritical characters, but
supporting the new format is a step into this direction. What do
you think?
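
For readers following along, the sanity check described above (exactly one transforms/transform/tRule block) could be sketched roughly as follows. This is not the code from the patch: the function name `rule_lines`, the `SAMPLE` document, its `supplementalData` root, and the rule lines inside it are hypothetical placeholders chosen only to make the sketch runnable.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for Latin-ASCII.xml.
# The rule lines are placeholders, not copied from the real file.
SAMPLE = """<supplementalData>
  <transforms>
    <transform>
      <tRule>
à → a ;
é → e ;
      </tRule>
    </transform>
  </transforms>
</supplementalData>"""


def rule_lines(xml_text):
    """Return the non-empty rule lines of the single tRule block."""
    root = ET.fromstring(xml_text)
    blocks = root.findall("./transforms/transform/tRule")
    # Sanity check: the source is expected to hold exactly one tRule block.
    assert len(blocks) == 1, "expected exactly one transforms/transform/tRule"
    # splitlines() copes with \n, \r\n and \r line endings alike.
    return [line.strip() for line in blocks[0].text.splitlines() if line.strip()]


print(rule_lines(SAMPLE))
```

If a second tRule block ever appeared in the input, the assertion would fail loudly instead of silently ignoring part of the rules.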

Hi Michael,

Thank you for putting so much effort into this. I think that looks great. When I was doing this, I discovered that I could parse both pre- and post-r29 versions, so I went with that, but I agree that there's probably no good reason to do so.

And thank you for the information on splitlines(); that's a method I've overlooked. .split('\n') should be identical if Python is, as usual, compiled with universal newlines support, but it's nice to have a method guaranteed to work in all instances.
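
A small illustration of where the two methods agree and where they differ, using a made-up string rather than anything from the patch. Universal newlines translation applies when a file is read in text mode; a string that already contains "\r\n" in memory is not translated, and .split('\n') also keeps an empty final element when the text ends with a newline.

```python
# Made-up sample text with mixed line endings and a trailing newline.
raw = "a ; b\r\nc ; d\ne ; f\n"

by_split = raw.split("\n")       # keeps the stray "\r" and a trailing ""
by_splitlines = raw.splitlines() # strips every line terminator

print(by_split)       # ['a ; b\r', 'c ; d', 'e ; f', '']
print(by_splitlines)  # ['a ; b', 'c ; d', 'e ; f']

# For text that uses only "\n" and has no trailing newline, both agree.
clean = "a ; b\nc ; d"
print(clean.split("\n") == clean.splitlines())  # True
```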

Best wishes,

Hugh
