Re: BUG #15548: Unaccent does not remove combining diacritical characters - Mailing list pgsql-bugs

From	Tom Lane
Subject	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date	December 18, 2018 05:36:02
Msg-id	8506.1545111362@sss.pgh.pa.us
In response to	Re: BUG #15548: Unaccent does not remove combining diacritical characters (Michael Paquier <michael@paquier.xyz>)
Responses	Re: BUG #15548: Unaccent does not remove combining diacritical characters
List	pgsql-bugs

Michael Paquier <michael@paquier.xyz> writes:
> Could you also add some tests in contrib/unaccent/sql/unaccent.sql at
> the same time?  That would be nice to check easily the extent of the
> patches proposed on this thread.

I wonder why unaccent.sql is set up to run its tests in KOI8 client
encoding rather than UTF8.  It doesn't seem like it's the business
of this test script to be verifying transcoding from KOI8 to UTF8
(and if it were meant to do that, it's a pretty incomplete test...).
But having it set up like that means that we can't directly add
such tests to unaccent.sql, because there are no combining diacritics
in the KOI8 character set.  We have two unattractive options:

* Change client encodings partway through unaccent.sql.  I think this
would be disastrous for editability of that file; no common tools
will understand the encoding change.

* Put the new test cases into a separate file with a different client
encoding.  This is workable, I suppose, but it seems pretty silly
when the tests are only a few queries apiece.
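
[Editor's illustration, not part of the original message.] The point that KOI8 contains no combining diacritics can be checked with a short script. This is only a sketch: it assumes that "KOI8" here means KOI8-R and that Python's `koi8_r` codec is a fair stand-in for the server's conversion tables. The helper name `encodable_in_koi8r` is made up for this example.

```python
def encodable_in_koi8r(ch: str) -> bool:
    """Return True if the character has a representation in KOI8-R.

    Assumption: Python's "koi8_r" codec stands in for PostgreSQL's KOI8.
    """
    try:
        ch.encode("koi8_r")
        return True
    except UnicodeEncodeError:
        return False


# The Unicode "Combining Diacritical Marks" block spans U+0300..U+036F.
combining_marks = [chr(cp) for cp in range(0x0300, 0x0370)]

# Collect the marks that KOI8-R can represent; the list comes out empty.
representable = [c for c in combining_marks if encodable_in_koi8r(c)]
print(len(representable))

# A precomposed Cyrillic letter, by contrast, is representable.
print(encodable_in_koi8r("\u0451"))  # CYRILLIC SMALL LETTER IO
```

Since none of those code points can be expressed in the client encoding, a test query containing one cannot be written in a KOI8-encoded script at all, which is what forces the choice between the two options listed above.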

Another problem I've got with the current setup is that it seems
unlikely that many people's editors default to an assumption of
KOI8 encoding.  Mine guesses that these files are UTF8, and so
the test cases look perfectly insane.  They do make sense if
I transcode the files to UTF8, but I wonder why we're not shipping
them as UTF8 in the first place.
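
[Editor's illustration, not part of the original message.] The transcoding step mentioned above amounts to decoding the bytes as KOI8 and re-encoding them as UTF8. A minimal sketch, again assuming KOI8-R and Python's `koi8_r` codec; `koi8_to_utf8` is a hypothetical helper, not a tool from the PostgreSQL tree.

```python
def koi8_to_utf8(data: bytes) -> bytes:
    """Re-encode KOI8-R bytes as UTF-8 (assumes the input really is KOI8-R)."""
    return data.decode("koi8_r").encode("utf-8")


# One byte in KOI8-R: 0xA3 is CYRILLIC SMALL LETTER IO (U+0451).
koi8_bytes = b"\xa3"
utf8_bytes = koi8_to_utf8(koi8_bytes)
print(utf8_bytes)

# The same byte is not valid UTF-8 on its own, which is why an editor
# that guesses UTF-8 cannot display the file sensibly.
try:
    koi8_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False
print(valid_as_utf8)
```

Applied to a whole file, the same function would be given the file's bytes read in binary mode, with the result written back in binary mode.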

tl;dr: I think we should convert unaccent.sql and unaccent.out
to UTF8 encoding.  Then, adding more test cases for this patch
will be easy.

            regards, tom lane
