Committed.
https://commitfest.postgresql.org/9/468/
Interesting issue, mainly because the "ť" character it complains about (UTF-8 0xc5 0xa5) is accepted in the SELECT that generates the record. If it's valid input, it should be valid output, right? We didn't change client_encoding in the meantime. It makes sense, though:
initdb on that buildfarm animal says:
The database cluster will be initialized with locale "English_United States.1252".
The default database encoding has accordingly been set to "WIN1252".
The regress script in question sets:
SET client_encoding = 'utf8';
but we're apparently round-tripping the data through the database encoding at some point, then converting back to client_encoding for output.
Presumably that's when we're forming the text 'data' column in the tuplestore produced by the get changes function, which will be in the database encoding.
So setting client_encoding is not sufficient to make this work, and the non-7-bit-ASCII part should be removed from the test, since it can't be represented in every database encoding the test may run under.
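The suspected round trip can be sketched in Python. This is only an illustration under stated assumptions: Python's "utf-8" codec stands in for client_encoding = 'utf8', its "cp1252" codec stands in for WIN1252, and the helper names are hypothetical. It is not PostgreSQL's conversion code. It shows that "ť" is 0xc5 0xa5 in UTF-8, has no WIN1252 equivalent, and would survive in an encoding that does contain it (cp1250), while plain 7-bit ASCII survives either way.

```python
# Illustrative sketch only: Python codecs stand in for PostgreSQL's
# client_encoding (UTF-8) and database encoding (cp1252 ~ WIN1252).
# Function names are hypothetical, not PostgreSQL APIs.

CHAR = "\u0165"  # "ť", LATIN SMALL LETTER T WITH CARON


def utf8_bytes(text: str) -> bytes:
    """Bytes the client sends with client_encoding = 'utf8'."""
    return text.encode("utf-8")


def round_trip(text: str, db_encoding: str = "cp1252") -> str:
    """Client UTF-8 -> database encoding -> back to a client string.

    Raises UnicodeEncodeError when a character has no equivalent in the
    database encoding, analogous to the conversion failure in the test.
    """
    stored = utf8_bytes(text).decode("utf-8").encode(db_encoding)
    return stored.decode(db_encoding)


if __name__ == "__main__":
    print(utf8_bytes(CHAR).hex(" "))  # c5 a5
    try:
        round_trip(CHAR)
    except UnicodeEncodeError as exc:
        print("not representable in WIN1252:", exc.reason)
    # 7-bit ASCII survives the same round trip unchanged.
    print(round_trip("plain ascii"))
```

The same character round-trips fine with `round_trip(CHAR, "cp1250")`, which is why the failure depends on how the particular machine was initialized.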
In some ways it seems like the argument to pg_logical_emit_message(...) should be 'bytea'. That'd be much more convenient for application use. But then it's a pain when using it via the text-format SQL interface calls, where we've got no sensible way to output it.
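To illustrate the trade-off, here is a hypothetical Python sketch (helper names invented for illustration). A bytes payload, standing in for a bytea argument, never passes through the database encoding, but a text-format interface still has to render it. PostgreSQL's default bytea_output = 'hex' displays bytea as `\x` followed by hex digits; the sketch mimics only that display format. The result is ASCII-safe and reversible, but no longer readable as the original text.

```python
# Hypothetical sketch: mimics the "\x..." rendering used by PostgreSQL's
# bytea hex output format. Not a PostgreSQL API.

def bytea_hex(payload: bytes) -> str:
    """Render bytes the way bytea hex output format displays them."""
    return "\\x" + payload.hex()


def from_bytea_hex(rendered: str) -> bytes:
    """Parse the hex rendering back into the original bytes."""
    if not rendered.startswith("\\x"):
        raise ValueError("expected hex-format bytea starting with \\x")
    return bytes.fromhex(rendered[2:])


if __name__ == "__main__":
    payload = "\u0165".encode("utf-8")  # "ť" as UTF-8 bytes
    shown = bytea_hex(payload)
    print(shown)  # \xc5a5: ASCII-safe, but not readable as text
    assert from_bytea_hex(shown) == payload
```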