On Tue, Dec 18, 2018 at 12:36:02AM -0500, Tom Lane wrote:
> tl;dr: I think we should convert unaccent.sql and unaccent.out
> to UTF8 encoding. Then, adding more test cases for this patch
> will be easy.
Do you think that we could also remove the non-ASCII characters from the
tests? It would be easy enough to write the input with E'\xNN' escapes
(the UTF-8 bytes in hex) or similar, and to show the output as bytea.
That's harder to read, but the last time this code was touched (5e8d670)
we agreed not to use UTF-8 in the Python script, so that folks with
simple terminals could still work on it. The characters used could be
documented as comments in the tests.
--
Michael