Craig Ringer <craig@2ndquadrant.com> writes:
> Interesting issue. Mainly because the "ť" char it complains about
> (utf-8 0xc5 0xa5) is accepted in the SELECT that generates the record.
Uh, no, actually it's the SELECT that's failing.
> The regress script in question sets:
> SET client_encoding = 'utf8';
> but we're apparently round-tripping the data through the database encoding
> at some point, then converting back to client_encoding for output.
The conversion to DB encoding will happen the instant the query string
reaches the database. You can set client_encoding to whatever you want,
but the only characters that can appear in queries are those that exist
in both the client encoding and the database encoding.
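
To make that concrete, here is a minimal Python sketch of the constraint. It is only an illustration using Python's codecs, not PostgreSQL's conversion code. The helper name to_db_encoding and the choice of LATIN1 as the database encoding are assumptions; the thread does not say which encoding the failing database uses.

```python
# Stand-in for the conversion applied when a query string reaches the server.
# Hypothetical helper: Python codecs are used here purely for illustration.
def to_db_encoding(query_bytes: bytes, client_encoding: str, db_encoding: str) -> bytes:
    text = query_bytes.decode(client_encoding)
    # Raises UnicodeEncodeError if a character has no equivalent in db_encoding.
    return text.encode(db_encoding)

# The character from the report: U+0165, which is 0xc5 0xa5 in UTF-8.
assert "ť".encode("utf-8") == b"\xc5\xa5"

query = "SELECT 'ť'".encode("utf-8")

# A UTF-8 client talking to a UTF-8 database: the bytes pass through unchanged.
same = to_db_encoding(query, "utf-8", "utf-8")

# A UTF-8 client talking to a LATIN1 database (assumed): the character does not
# exist in the target encoding, so the query fails before it is ever executed.
try:
    to_db_encoding(query, "utf-8", "latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False
```

In this sketch the failure happens on the way in, which matches the observation that it is the SELECT itself that fails rather than the later output step.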
> In some ways it seems like the argument to pg_logical_emit_message(...) should
> be 'bytea'. That'd be much more convenient for application use. But then
> it's a pain when using it via the text-format SQL interface calls, where
> we've got no sensible way to output it.
Well, that's something worth thinking about. I assume that
pg_logical_slot_get_changes could be executed in a database different from
the one where the change originated? What's going to happen if a string
in WAL contains characters unrepresentable in that database? Do we even
have logic in there that will attempt to perform the necessary conversion?
And it is *necessary*, not optional, if you are going to claim that the
output of pg_logical_slot_get_changes is of type text.
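
A rough Python sketch of why the conversion matters, again using Python codecs as a stand-in rather than anything from the PostgreSQL source. It assumes the message was written in a UTF-8 database and is read back in a LATIN1 one; both encodings and the helper name convert_payload are hypothetical choices for illustration.

```python
# Bytes as they might have been stored by a UTF-8 origin database (assumed).
payload = "ť".encode("utf-8")  # b"\xc5\xa5"

# Without conversion: the reading side treats the raw bytes as its own encoding.
# Every byte is valid LATIN1, so nothing fails, but the text is silently wrong.
unconverted = payload.decode("latin-1")

# With conversion: decode using the origin encoding, then re-encode for the reader.
# Hypothetical helper; it raises UnicodeEncodeError for unrepresentable characters.
def convert_payload(data: bytes, origin_encoding: str, reader_encoding: str) -> bytes:
    return data.decode(origin_encoding).encode(reader_encoding)

try:
    convert_payload(payload, "utf-8", "latin-1")
    conversion_ok = True
except UnicodeEncodeError:
    conversion_ok = False
```

Either outcome is a problem for a result declared as text: skipping the conversion yields a different string than the one emitted, and performing it can fail outright.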
regards, tom lane