Re: convert text field to utf8 in sql_ascii database - Mailing list pgsql-general

From Albe Laurenz
Subject Re: convert text field to utf8 in sql_ascii database
Date
Msg-id D960CB61B694CF459DCFB4B0128514C207201101@exadv11.host.magwien.gv.at
Whole thread Raw
In response to convert text field to utf8 in sql_ascii database  (Andy Colson <andy@squeakycode.net>)
List pgsql-general
Andy Colson wrote:
> I am in the middle of a process to get all my data into utf8.  As its
> not all converted yet, my database encoding is SQL_ASCII.
>
> I am getting external apps fixed up to write utf8 to the database, and
> so far so good.  But, I ran across some stuff that needs a one time
> convert, and wanted to just write sql to update it.
>
> I cant seem to figure it out.  I'm on Slackware 64, PG 9.0.4.
>
> I tried:
> update general set legal = convert(legal, 'windows_1250', 'utf8');
>
> ERROR:  function convert(text, unknown, unknown) does not exist
> HINT:  No function matches the given name and argument types. You might
> need to add explicit type casts.
>
>
> It looks like convert() only does bytea, and legal is defined as text.
> Can't seem to cast it either (as legal::bytea).  The result is bytea, so
> again would have to cast/convert?

Right.

> Also, I'd like to add, I dont understand the online help:
> http://www.postgresql.org/docs/9.0/static/functions-string.html
>
> shows convert(string, src, dest), but then says see Table 9-7, which has
> conversion names, like windows_1250_to_utf8, where do I use that?

You can only convert from encoding A to encoding B if there is a
corresponding A_to_B conversion available.

> The help also shows functions convert_to() and convert_from().  But I
> dont understand how to use them.
>
> update general set legal = convert_to(legal, 'utf8');
>
> How does it know what source encoding to use?  And it converts to bytea
> so how do I convert it back to text?

convert_to() and convert_from() can't do any conversion in an SQL_ASCII
database because there is no well-defined database encoding in that case.
They will "cast" between text and bytea but leave the contents unmodified.

You'll have to employ convert() to do the actual conversion.

If, for example, you know that a column is in WIN1252 encoding, you could
  UPDATE tab SET col =
    convert_from(
      convert(
        convert_to('schön', 'WIN1252'), 'WIN1252', 'UTF8'
      ), 'UTF8'
    );

Yours,
Laurenz Albe

pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: pg_dump error
Next
From: Greg Smith
Date:
Subject: Re: Performance degradation 8.4 -> 9.1