Thread: a couple questions about convert()
Hello, In a postgresql-9.3.1 database with UTF8 encoding I can do: select convert_from (E'\\x68656c6c6f', 'LATIN1'); convert_from -------------- hello But when I explicitly give the "to" encoding: select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8'); convert -------------- \x68656c6c6f Why does that second one give different results from the first? Second question: why is that none of the convert* functions are marked as immutable (thus preventing me from creating a functional index using them). Surely if I convert \x68 to utf-8 the result will *always* be "h", won't it?
smcg2297@frii.com writes: > In a postgresql-9.3.1 database with UTF8 encoding I can do: > select convert_from (E'\\x68656c6c6f', 'LATIN1'); > convert_from > -------------- > hello > But when I explicitly give the "to" encoding: > select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8'); > convert > -------------- > \x68656c6c6f > Why does that second one give different results from the first? convert_from() produces a result of type text. But convert() returns bytea, because its output is not necessarily valid in the current database encoding. It's the same bytes, but it prints differently. (You could ameliorate the unreadability by changing the bytea_output setting.) > Second question: why is that none of the convert* functions are > marked as immutable (thus preventing me from creating a functional > index using them). Surely if I convert \x68 to utf-8 the result > will *always* be "h", won't it? Well, no, it'll be whatever the conversion function says it is; and encoding conversion functions are replaceable through DDL. I recall some discussion to the effect that that was silly and we should rip out CREATE CONVERSION and friends in favor of hard-wired conversion rules. Nothing's been done about it though. regards, tom lane
On 09/19/2014 08:50 AM, Tom Lane wrote: > smcg2297@frii.com writes: >> In a postgresql-9.3.1 database with UTF8 encoding I can do: > >> select convert_from (E'\\x68656c6c6f', 'LATIN1'); >> convert_from >> -------------- >> hello > >> But when I explicitly give the "to" encoding: > >> select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8'); >> convert >> -------------- >> \x68656c6c6f > >> Why does that second one give different results from the first? > > convert_from() produces a result of type text. But convert() returns > bytea, because its output is not necessarily valid in the current > database encoding. It's the same bytes, but it prints differently. > (You could ameliorate the unreadability by changing the bytea_output > setting.) I my case I routinely deal with multi-byte utf-8 characters so the "escape" format doesn't help much. >> Second question: why is that none of the convert* functions are >> marked as immutable (thus preventing me from creating a functional >> index using them). Surely if I convert \x68 to utf-8 the result >> will *always* be "h", won't it? > > Well, no, it'll be whatever the conversion function says it is; > and encoding conversion functions are replaceable through DDL. > > I recall some discussion to the effect that that was silly and > we should rip out CREATE CONVERSION and friends in favor of > hard-wired conversion rules. Nothing's been done about it though. Well, if it matters, count me as +1. Thanks very much for the explanation, things are clearer now. :-)