Michael Paquier <michael@paquier.xyz> writes:
> Could you also add some tests in contrib/unaccent/sql/unaccent.sql at
> the same time? That would be nice to check easily the extent of the
> patches proposed on this thread.
I wonder why unaccent.sql is set up to run its tests in KOI8 client
encoding rather than UTF8. It doesn't seem like it's the business
of this test script to be verifying transcoding from KOI8 to UTF8
(and if it were meant to do that, it's a pretty incomplete test...).
But having it set up like that means that we can't directly add
such tests to unaccent.sql, because there are no combining diacritics
in the KOI8 character set. We have two unattractive options:
* Change client encodings partway through unaccent.sql. I think this
  would be disastrous for editability of that file; no common tools
  will understand the encoding change.
* Put the new test cases into a separate file with a different client
  encoding. This is workable, I suppose, but it seems pretty silly
  when the tests are only a few queries apiece.
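For anyone who wants to check the claim about combining diacritics, here is a rough Python sketch. It assumes that Python's "koi8_r" codec matches what PostgreSQL calls KOI8; the helper names are made up for illustration.

```python
import unicodedata


def combining_chars_in(codec: str) -> list[str]:
    """Return every combining character reachable in a single-byte codec."""
    found = []
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            # Byte is unassigned in this codec; nothing to inspect.
            continue
        # A nonzero canonical combining class marks a combining character.
        if unicodedata.combining(ch) != 0:
            found.append(ch)
    return found


def can_encode(text: str, codec: str) -> bool:
    """Report whether text is representable in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(combining_chars_in("koi8_r"))
    # U+0301 COMBINING ACUTE ACCENT is the sort of input the new tests need.
    print(can_encode("e\u0301", "koi8_r"), can_encode("e\u0301", "utf-8"))
```

If the first result is empty and the combining accent cannot be encoded, then no test input containing such a character can be written in a KOI8 file.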
Another problem I've got with the current setup is that it seems
unlikely that many people's editors default to an assumption of
KOI8 encoding. Mine guesses that these files are UTF8, and so
the test cases look perfectly insane. They do make sense if
I transcode the files to UTF8, but I wonder why we're not shipping
them as UTF8 in the first place.
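A minimal sketch of that transcoding step, again assuming Python's "koi8_r" codec corresponds to the files' KOI8 encoding (the function names are illustrative, and any client_encoding setting inside the script would still have to be changed by hand):

```python
def transcode_koi8_to_utf8(data: bytes) -> bytes:
    """Reinterpret KOI8-R bytes as text and re-encode them as UTF-8."""
    return data.decode("koi8_r").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    sample = "ёж".encode("koi8_r")
    # The raw KOI8 bytes are not valid UTF-8, which is why an editor
    # guessing UTF-8 shows garbage for them.
    print(is_valid_utf8(sample))
    print(is_valid_utf8(transcode_koi8_to_utf8(sample)))
```

The same two calls could be applied to the raw bytes of each test file, read and written in binary mode.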
tl;dr: I think we should convert unaccent.sql and unaccent.out
to UTF8 encoding. Then, adding more test cases for this patch
will be easy.
regards, tom lane