Thread: UTF-8 to ASCII

UTF-8 to ASCII

From
Martin Marques
Date:
I have a question about the to_ascii() function and what the documentation
says.

Basically, I passed my DB from latin1 to UTF-8, and I started getting an
error when using the to_ascii() function on a field in one of my DBs [1]:

ERROR:  la conversión de codificación de UTF8 a ASCII no está soportada

OK, it's in Spanish, but basically it says that the conversion from UTF8 to
ASCII is not supported. However, in the documentation [2] I see this in
"Table 9-7. Built-in Conversions":

utf8_to_ascii    UTF8    SQL_ASCII

Is the documentation wrong or something?

I'm on postgresql-8.1.8, and as you can see, I'm checking the
corresponding documentation.

[1]: I already worked around this by using convert() to go from UTF-8 to
LATIN1 and then calling to_ascii().
[2]:
http://www.postgresql.org/docs/8.1/interactive/functions-string.html#FTN.AEN7625

--
  21:50:04 up 2 days,  9:07,  0 users,  load average: 0.92, 0.37, 0.18
---------------------------------------------------------
Lic. Martín Marqués         |   SELECT 'mmarques' ||
Centro de Telemática        |       '@' || 'unl.edu.ar';
Universidad Nacional        |   DBA, Programador,
     del Litoral             |   Administrador
---------------------------------------------------------

Re: UTF-8 to ASCII

From
LEGEAY Jérôme
Date:
To convert my DB, I use this process:


createdb -T "old_DB" "copy_old_DB"
dropdb "old_DB"
createdb -E LATIN1 -T "copy_old_DB" "new_DB_name"

Maybe this process will help you.

regards

Jérôme LEGEAY

At 14:13 on 11/05/2007, you wrote:
>I have a doubt about the function to_ascii() and what the documentation says.

Re: UTF-8 to ASCII

From
Martin Marques
Date:
LEGEAY Jérôme wrote:
> for convert my DB, i use this process:
>
>
> createdb -T "old_DB" "copy_old_DB"
> dropdb "old_DB"
> createdb -E LATIN1 -T "copy_old_DB" "new_DB_name"
>
> maybe this process will help you.

As I said in my original mail, the DB conversion went OK, but I see some
discrepancies in the documentation.

My question is whether the documentation is correct, and if so, why I don't
get the right behavior.


Re: UTF-8 to ASCII

From
Arnaud Lesauvage
Date:
Martin Marques wrote:
> Is the documentation wrong or something?

Hi Martin,
I think the 8.1 documentation is wrong.
It reads differently in the 8.2 documentation:
to_ascii: Convert string to ASCII from another encoding *(only supports conversion from LATIN1, LATIN2, LATIN9, and
WIN1250 encodings)*

I ran into this problem too, and I wrote a function that converts from the DB encoding to LATIN9 before doing the to_ascii
conversion: to_ascii(convert(mystring, 'LATIN9'), 'LATIN9')

Regards
--
Arnaud
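
The two-step detour described above (convert to LATIN9 first, then reduce to ASCII) can be approximated outside the database. What follows is a minimal Python sketch under stated assumptions, not PostgreSQL's to_ascii(): the name to_ascii_via_latin9 is hypothetical, and Unicode NFKD decomposition is swapped in for whatever character mapping to_ascii() applies, so results may differ for some characters.

```python
import unicodedata


def to_ascii_via_latin9(text: str) -> str:
    """Rough stand-in for to_ascii(convert(s, 'LATIN9'), 'LATIN9').

    Hypothetical helper for illustration only.
    """
    # Step 1: mirror the convert(..., 'LATIN9') step. This raises
    # UnicodeEncodeError when a character does not exist in ISO 8859-15.
    latin9_bytes = text.encode("iso8859_15")
    decoded = latin9_bytes.decode("iso8859_15")

    # Step 2: stand-in for to_ascii(). NFKD splits an accented letter into
    # its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", decoded)

    # Dropping everything outside ASCII removes the combining marks. It also
    # silently drops characters that have no decomposition at all.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii_via_latin9("Martín Marqués"))  # prints "Martin Marques"
```

Doing the LATIN9 step first keeps the sketch close to the workaround: input that cannot be represented in LATIN9 fails early instead of being quietly mangled.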

Re: UTF-8 to ASCII

From
"Albe Laurenz"
Date:
> I have a doubt about the function to_ascii() and what the
> documentation says.
>
> Basically, I passed my DB from latin1 to UTF-8, and I started

What do you mean by 'passed the DB from Latin1 to UTF8'?

> Is the documentation wrong or something?

Well, the documentation for to_ascii states clearly:
  "The to_ascii function supports conversion from LATIN1, LATIN2,
   LATIN9, and WIN1250 encodings only."

The table of conversions you quote belongs to the function convert().

So that should answer your question.

I am not sure what you are trying to achieve.
If you tell us, I might be able to tell you HOW to achieve it.

Yours,
Laurenz Albe

Re: UTF-8 to ASCII

From
Martin Marques
Date:
Albe Laurenz wrote:

>> [2]:
>> http://www.postgresql.org/docs/8.1/interactive/functions-string.html#FTN.AEN7625
>
> Well, the documentation for to_ascii states clearly:
>   "The to_ascii function supports conversion from LATIN1, LATIN2,
>    LATIN9, and WIN1250 encodings only."

Sorry, didn't see the footnote on the table.

Re: UTF-8 to ASCII

From
Alvaro Herrera
Date:
Martin Marques wrote:
> I have a doubt about the function to_ascii() and what the documentation
> says.
>
> Basically, I passed my DB from latin1 to UTF-8, and I started getting an
> error when using the to_ascii() function on a field of one of my DB [1]:
>
> ERROR:  la conversión de codificación de UTF8 a ASCII no está soportada

Well, the to_ascii() documentation says that it only supports LATIN1,
LATIN2, LATIN9, and WIN1250.  This is in a footnote.

I do think that there's something strange going on here anyway,
because calling convert() with an explicit conversion function gives a
mismatched error for me (local environment is UTF8, as is
client_encoding):

alvherre=# select convert('Martín' using utf8_to_ascii);
ERROR:  character 0xc3 of encoding "MULE_INTERNAL" has no equivalent in "SQL_ASCII"


Why on earth is it talking about MULE_INTERNAL?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
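
The byte 0xc3 named in the error above can be traced with a short sketch. It is written in Python purely as an illustration and says nothing about PostgreSQL internals: it only shows that 0xc3 is the first byte of the UTF-8 encoding of "í", and that this character has no ASCII equivalent. It does not explain the MULE_INTERNAL label.

```python
name = "Martín"

# UTF-8 encodes "í" (U+00ED) as the two bytes 0xc3 0xad.
utf8_bytes = name.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))  # prints "4d 61 72 74 c3 ad 6e"

# The first byte outside the 7-bit ASCII range is the one the error reports.
first_non_ascii = next(b for b in utf8_bytes if b >= 0x80)
print(hex(first_non_ascii))  # prints "0xc3"

# A strict conversion to ASCII fails on the same character.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # prints "False"
```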

Re: UTF-8 to ASCII

From
"Martin Gainty"
Date:
Apparently you will need to implement a UNICODE aware JDBC driver
http://archives.postgresql.org/pgsql-general/2004-01/msg01649.php
Martín


----- Original Message -----
From: "Alvaro Herrera" <alvherre@commandprompt.com>
To: "Martin Marques" <martin@bugs.unl.edu.ar>
Cc: <pgsql-general@postgresql.org>
Sent: Friday, May 11, 2007 9:33 AM
Subject: Re: [GENERAL] UTF-8 to ASCII


> Why on earth is it talking about MULE_INTERNAL?

Re: UTF-8 to ASCII

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Why on earth is it talking about MULE_INTERNAL?

IIRC, a lot of the conversions translate through some common
intermediate charset to save on code/table space.  In such cases
the problem will usually be detected on the backend conversion...

            regards, tom lane
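
The effect described above, where a conversion routed through an intermediate charset reports the intermediate name rather than the original source, can be sketched as follows. This is a hypothetical Python illustration, not PostgreSQL's conversion code: PIVOT, ConversionError, to_pivot, from_pivot and convert_via_pivot are all invented names, and a Python str stands in for the intermediate representation.

```python
class ConversionError(Exception):
    """Raised when a character cannot be represented in the target."""


# Hypothetical label for the shared intermediate charset.
PIVOT = "PIVOT"


def to_pivot(data: bytes, source: str) -> str:
    # Stage 1: source encoding -> intermediate form.
    return data.decode(source)


def from_pivot(text: str, target: str) -> bytes:
    # Stage 2: intermediate form -> target encoding. This stage no longer
    # knows which source encoding the data came from, so any failure is
    # reported in terms of the intermediate form.
    for ch in text:
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            raise ConversionError(
                f'character {ch!r} of encoding "{PIVOT}" '
                f'has no equivalent in "{target}"'
            ) from None
    return text.encode(target)


def convert_via_pivot(data: bytes, source: str, target: str) -> bytes:
    # One pair of routines per encoding, instead of one per pair of encodings.
    return from_pivot(to_pivot(data, source), target)


try:
    convert_via_pivot("Martín".encode("utf-8"), "utf-8", "ascii")
except ConversionError as exc:
    print(exc)  # prints: character 'í' of encoding "PIVOT" has no equivalent in "ascii"
```

The message never mentions utf-8 because the failure happens in the second stage, which only sees the intermediate form.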

Re: UTF-8 to ASCII

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Why on earth is it talking about MULE_INTERNAL?
>
> IIRC, a lot of the conversions translate through some common
> intermediate charset to save on code/table space.  In such cases
> the problem will usually be detected on the backend conversion...

Interesting, but it doesn't explain why the conversion doesn't work.
AFAICS the operation I am requesting is valid.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.