Thread: MULE_INTERNAL translation to win1250

MULE_INTERNAL translation to win1250

From

"NTPT"

Date:

28 January 2007, 18:49:30

Hi.

I have a strange problem in postgres 8.1.4 (gentoo 64bit on AMD64
platform).

My database is created with LATIN2 encoding for correct viewing of
national specific characters (Czech language).

The code of my php application sets the client encoding to win1250
because I need the query output in this encoding.

On some parts of the data I get an error:

Query failed: ERROR: character 0x829a of encoding "MULE_INTERNAL" has no
equivalent in "WIN1250"

Without "set client_encoding to win1250" the query works. I am curious why
MULE_INTERNAL is mentioned even though \l+ says that the corresponding
database (and indeed the whole cluster) is created with LATIN2 encoding.

Strangely enough, ALL INSERTS are done with WIN1250 client encoding too.
Maybe it is a bug in the charset translation routines of postgres?

And how can I repair it, preferably in the whole database?

Thanks for help.

Re: MULE_INTERNAL translation to win1250

From

Tom Lane

Date:

28 January 2007, 19:33:24

"NTPT" <ntpt@centrum.cz> writes:
> Without "set client_encoding to win1250" query works. I am curious why there
> is a MULE_INTERNAL  mentioned even when \l+  say that corresponding database
> is created with  (and even all  the cluster)  LATIN2 encoding.

The conversions between LATIN2 and WIN1250 go by way of MULE_INTERNAL to
reduce duplication of code.  It shouldn't make any difference to the end
result though.  Are you sure that the characters you're using are
supposed to have representations in both character sets?

> May be a bug in charset translation routines of postgres ?

If you think that, you need to provide us with the exact codes that are
being mistranslated and what you think they should translate to.

            regards, tom lane

Re: MULE_INTERNAL translation to win1250

From

Tatsuo Ishii

Date:

28 January 2007, 21:32:50

MULE_INTERNAL is used for an intermediate encoding between LATIN2 and
WIN1250. The error message indicates that 0x9a of LATIN2 cannot be
mapped to WIN1250.
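
To make the error code concrete, here is a minimal Python sketch of how 0x829a can be read. It assumes, based on LC_ISO8859_2 in PostgreSQL's pg_wchar.h rather than on anything stated in this thread, that MULE_INTERNAL marks a LATIN2 byte with the leading byte 0x82. The function name split_mule_code is made up for illustration.

```python
# Assumption: 0x82 is the MULE_INTERNAL leading byte for ISO 8859-2 (LATIN2),
# as defined by LC_ISO8859_2 in PostgreSQL's pg_wchar.h.
LC_ISO8859_2 = 0x82


def split_mule_code(code):
    """Split a two-byte MULE_INTERNAL code into (leading byte, payload byte)."""
    return (code >> 8) & 0xFF, code & 0xFF


lead, payload = split_mule_code(0x829A)
print(hex(lead), hex(payload))  # 0x82 0x9a
print(lead == LC_ISO8859_2)     # True
```

Read this way, the 0x82 only says "this is a LATIN2 byte", and the character that failed to convert is LATIN2 0x9a.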

You can see 0x00 in the position for 0x9a (between 0x99 and 0x9b) in
the encoding map in
src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c,
which indicates that nothing corresponds to LATIN2 0x9a.  If you know
what LATIN2 0x9a should be mapped to, please let us know.

    static const unsigned char iso88592_2_win1250[] = {
        0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
        0x88, 0x89, 0x00, 0x8B, 0x00, 0x00, 0x00, 0x00,
        0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
        0x98, 0x99, 0x00, 0x9B, 0x00, 0x00, 0x00, 0x00,
        0xA0, 0xA5, 0xA2, 0xA3, 0xA4, 0xBC, 0x8C, 0xA7,
        0xA8, 0x8A, 0xAA, 0x8D, 0x8F, 0xAD, 0x8E, 0xAF,
        0xB0, 0xB9, 0xB2, 0xB3, 0xB4, 0xBE, 0x9C, 0xA1,
        0xB8, 0x9A, 0xBA, 0x9D, 0x9F, 0xBD, 0x9E, 0xBF,
        0xC0, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7,
        0xC8, 0xC9, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF,
        0xD0, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6, 0xD7,
        0xD8, 0xD9, 0xDA, 0xDB, 0xDC, 0xDD, 0xDE, 0xDF,
        0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7,
        0xE8, 0xE9, 0xEA, 0xEB, 0xEC, 0xED, 0xEE, 0xEF,
        0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
        0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
    };
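
The lookup described above can be sketched in Python. This is only an illustration, not the backend code: it copies the first eight rows of the table quoted above, and it assumes the table is indexed by (byte - 0x80) with 0x00 meaning "no mapping", which matches the description of the 0x9a position. The names TABLE_HEAD and latin2_to_win1250 are hypothetical.

```python
# Rows for LATIN2 bytes 0x80..0xBF, copied from the quoted table.
# Assumption: the table starts at byte 0x80, so the index is (byte - 0x80).
TABLE_HEAD = [
    0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
    0x88, 0x89, 0x00, 0x8B, 0x00, 0x00, 0x00, 0x00,
    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
    0x98, 0x99, 0x00, 0x9B, 0x00, 0x00, 0x00, 0x00,
    0xA0, 0xA5, 0xA2, 0xA3, 0xA4, 0xBC, 0x8C, 0xA7,
    0xA8, 0x8A, 0xAA, 0x8D, 0x8F, 0xAD, 0x8E, 0xAF,
    0xB0, 0xB9, 0xB2, 0xB3, 0xB4, 0xBE, 0x9C, 0xA1,
    0xB8, 0x9A, 0xBA, 0x9D, 0x9F, 0xBD, 0x9E, 0xBF,
]


def latin2_to_win1250(byte):
    """Return the WIN1250 byte for a LATIN2 byte, or None if unmapped."""
    if byte < 0x80 or byte >= 0xC0:
        # ASCII is shared; rows 0xC0..0xFF of the quoted table map each
        # byte to itself, so they are not repeated here.
        return byte
    mapped = TABLE_HEAD[byte - 0x80]
    return mapped if mapped != 0x00 else None


print(latin2_to_win1250(0x9A))       # None
print(hex(latin2_to_win1250(0xB9)))  # 0x9a
```

The second call shows where a real LATIN2 "s with caron" (0xB9) goes: it maps to WIN1250 0x9A, while the raw LATIN2 byte 0x9A has no target at all.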
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: MULE_INTERNAL translation to win1250

From

Michael Fuhr

Date:

28 January 2007, 22:27:27

On Sun, Jan 28, 2007 at 06:33:16PM -0500, Tom Lane wrote:
> "NTPT" <ntpt@centrum.cz> writes:
> > May be a bug in charset translation routines of postgres ?
>
> If you think that, you need to provide us with the exact codes that are
> being mistranslated and what you think they should translate to.

I wonder if the OP is doing something like this:

test=> SELECT getdatabaseencoding();
 getdatabaseencoding
---------------------
 LATIN2
(1 row)

test=> SHOW client_encoding;
 client_encoding
-----------------
 win1250
(1 row)

test=> CREATE TABLE test (t text);
CREATE TABLE
test=> INSERT INTO test VALUES (E'\202\232'); -- \202=0x82, \232=0x9a
INSERT 0 1
test=> SELECT * FROM test;
ERROR:  character 0x829a of encoding "MULE_INTERNAL" has no equivalent in "WIN1250"

The intent might be that E'\202\232' is a string in the client's
encoding, where it would represent the same characters as Unicode
<U+201A SINGLE LOW-9 QUOTATION MARK, U+0161 LATIN SMALL LETTER S
WITH CARON> (I'm using Unicode as the pivot for convenience).  But
the backend is handling the string in the database's encoding, where
it represents <U+0082,U+009A>, which are control characters that
don't have mappings in win1250; hence the conversion error when the
client tries to read the data.

Just a guess.
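
The two readings of the same bytes can be checked with a short Python sketch. It uses Python's cp1250 and iso8859_2 codecs as stand-ins for WIN1250 and LATIN2, which is an assumption: PostgreSQL's own table, quoted earlier in this thread, passes some of the 0x80..0x9F bytes (0x82 among them) through unchanged, so the exact set of failing bytes differs from Python's. The helper can_encode is hypothetical.

```python
raw = b"\x82\x9a"  # the two bytes written by the escape string in the INSERT

as_win1250 = raw.decode("cp1250")    # what the client meant
as_latin2 = raw.decode("iso8859_2")  # how the database encoding reads them

print([f"U+{ord(c):04X}" for c in as_win1250])  # ['U+201A', 'U+0161']
print([f"U+{ord(c):04X}" for c in as_latin2])   # ['U+0082', 'U+009A']


def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(as_win1250, "cp1250"))  # True
print(can_encode(as_latin2, "cp1250"))   # False
```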

--
Michael Fuhr

Re: MULE_INTERNAL translation to win1250

From

"NTPT"

Date:

28 January 2007, 23:06:10

I did some further investigation. I found and identified the exact row in
the database and the exact column that causes the problem. I am able to
select that column into "testtable", and in "testtable" it retains its bad
behavior. Fortunately, this row does not contain vital data, so I can drop
it without much trouble, but I would like to know why....

I am able to identify a single character that causes the problem, both in
the real data and in "testtable". (Or rather a character combination: using
the substring function, it seems that at a certain point it takes two
characters as a single 16-bit one.) But I am not able to reproduce this
behavior on a fresh table using "insert" and "select" statements. Please
give me a tip on where to search and what other information to provide.

Thank you.

Re: MULE_INTERNAL translation to win1250

From

Michael Fuhr

Date:

29 January 2007, 01:22:13

On Sun, Jan 28, 2007 at 07:27:12PM -0700, Michael Fuhr wrote:
> I wonder if the OP is doing something like this:
[...]
> test=> INSERT INTO test VALUES (E'\202\232'); -- \202=0x82, \232=0x9a

Another possibility, perhaps more likely, is that some connection
didn't set client_encoding to win1250 before it inserted win1250-encoded
data; in that case the data was probably treated as LATIN2 and
stored without conversion.  When a connection with client_encoding
set to win1250 tries to fetch the data, conversion is attempted and
fails because some LATIN2 values don't have win1250 mappings.
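
That scenario, and one possible way to repair such a value outside the database, can be sketched in Python. Everything here is an assumption for illustration: Python's cp1250 and iso8859_2 codecs stand in for WIN1250 and LATIN2, the function names are made up, and the repair is only valid if the whole value was inserted by the connection that skipped the conversion.

```python
def looks_unconverted(stored):
    """Heuristic: bytes 0x80..0x9F are C1 control codes in LATIN2 and rarely
    belong in text, so their presence hints at unconverted WIN1250 data."""
    return any(0x80 <= b <= 0x9F for b in stored)


def reinterpret_as_win1250(stored):
    """Read bytes stored as LATIN2 as the WIN1250 they really were and
    re-encode them as proper LATIN2. Raises UnicodeError for characters
    that LATIN2 cannot represent."""
    return stored.decode("cp1250").encode("iso8859_2")


# A Czech word sent as WIN1250 bytes and stored without conversion.
stored = "košík".encode("cp1250")
print(stored)                     # b'ko\x9a\xedk'
print(looks_unconverted(stored))  # True

fixed = reinterpret_as_win1250(stored)
print(fixed)                      # b'ko\xb9\xedk'
print(looks_unconverted(fixed))   # False
```

The heuristic cannot catch WIN1250 bytes above 0x9F that merely mean a different letter in LATIN2, so affected rows would still need to be reviewed by hand.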

--
Michael Fuhr