Thread: a couple questions about convert()

a couple questions about convert()

From
smcg2297@frii.com
Date:
Hello,

In a postgresql-9.3.1 database with UTF8 encoding I can do:

  select convert_from (E'\\x68656c6c6f', 'LATIN1');
   convert_from
  --------------
   hello

But when I explicitly give the "to" encoding:

  select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8');
     convert
  --------------
   \x68656c6c6f

Why does that second one give different results from the first?

Second question: why is that none of the convert* functions are
marked as immutable (thus preventing me from creating a functional
index using them).  Surely if I convert \x68 to utf-8 the result
will *always* be "h", won't it?



Re: a couple questions about convert()

From
Tom Lane
Date:
smcg2297@frii.com writes:
> In a postgresql-9.3.1 database with UTF8 encoding I can do:

>   select convert_from (E'\\x68656c6c6f', 'LATIN1');
>    convert_from
>   --------------
>    hello

> But when I explicitly give the "to" encoding:

>   select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8');
>      convert
>   --------------
>    \x68656c6c6f

> Why does that second one give different results from the first?

convert_from() produces a result of type text.  But convert() returns
bytea, because its output is not necessarily valid in the current
database encoding.  It's the same bytes, but it prints differently.
(You could ameliorate the unreadability by changing the bytea_output
setting.)

> Second question: why is that none of the convert* functions are
> marked as immutable (thus preventing me from creating a functional
> index using them).  Surely if I convert \x68 to utf-8 the result
> will *always* be "h", won't it?

Well, no, it'll be whatever the conversion function says it is;
and encoding conversion functions are replaceable through DDL.

I recall some discussion to the effect that that was silly and
we should rip out CREATE CONVERSION and friends in favor of
hard-wired conversion rules.  Nothing's been done about it though.

            regards, tom lane


Re: a couple questions about convert()

From
smcg2297@frii.com
Date:
On 09/19/2014 08:50 AM, Tom Lane wrote:
> smcg2297@frii.com writes:
>> In a postgresql-9.3.1 database with UTF8 encoding I can do:
>
>>   select convert_from (E'\\x68656c6c6f', 'LATIN1');
>>    convert_from
>>   --------------
>>    hello
>
>> But when I explicitly give the "to" encoding:
>
>>   select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8');
>>      convert
>>   --------------
>>    \x68656c6c6f
>
>> Why does that second one give different results from the first?
>
> convert_from() produces a result of type text.  But convert() returns
> bytea, because its output is not necessarily valid in the current
> database encoding.  It's the same bytes, but it prints differently.
> (You could ameliorate the unreadability by changing the bytea_output
> setting.)

I my case I routinely deal with multi-byte utf-8 characters so the
"escape" format doesn't help much.

>> Second question: why is that none of the convert* functions are
>> marked as immutable (thus preventing me from creating a functional
>> index using them).  Surely if I convert \x68 to utf-8 the result
>> will *always* be "h", won't it?
>
> Well, no, it'll be whatever the conversion function says it is;
> and encoding conversion functions are replaceable through DDL.
>
> I recall some discussion to the effect that that was silly and
> we should rip out CREATE CONVERSION and friends in favor of
> hard-wired conversion rules.  Nothing's been done about it though.

Well, if it matters, count me as +1.

Thanks very much for the explanation, things are clearer now. :-)