Thread: MULE_INTERNAL translation to win1250
Hi.

I have a strange problem in postgres 8.1.4 (gentoo 64bit on AMD64 platform).

My database is created with LATIN2 encoding for correct viewing of national specific characters (Czech language).

The code of my php application sets the client encoding to win1250 because I need the output of the query in this encoding.

On some parts of the data I get an error:

Query failed: ERROR: character 0x829a of encoding "MULE_INTERNAL" has no equivalent in "WIN1250"

Without "set client_encoding to win1250" the query works. I am curious why MULE_INTERNAL is mentioned even when \l+ says that the corresponding database (and even the whole cluster) is created with LATIN2 encoding.

Strangely enough, ALL INSERTS are done with WIN1250 client encoding too. Might it be a bug in the charset translation routines of postgres?

And how can I repair it, preferably in the whole database?

Thanks for help.
"NTPT" <ntpt@centrum.cz> writes:
> Without "set client_encoding to win1250" query works. I am curious why there
> is a MULE_INTERNAL mentioned even when \l+ say that corresponding database
> is created with (and even all the cluster) LATIN2 encoding.

The conversions between LATIN2 and WIN1250 go by way of MULE_INTERNAL to reduce duplication of code. It shouldn't make any difference to the end result though. Are you sure that the characters you're using are supposed to have representations in both character sets?

> May be a bug in charset translation routines of postgres ?

If you think that, you need to provide us with the exact codes that are being mistranslated and what you think they should translate to.

regards, tom lane
MULE_INTERNAL is used as an intermediate encoding between LATIN2 and WIN1250. The error message indicates that 0x9a of LATIN2 cannot be mapped to WIN1250.

You can see 0x00 in the position for 0x9a (between 0x99 and 0x9b) in the encoding map in src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c, which indicates that nothing corresponds to LATIN2 0x9a. If you know what LATIN2 0x9a should be mapped to, please let us know.

static const unsigned char iso88592_2_win1250[] = {
	0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
	0x88, 0x89, 0x00, 0x8B, 0x00, 0x00, 0x00, 0x00,
	0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
	0x98, 0x99, 0x00, 0x9B, 0x00, 0x00, 0x00, 0x00,
	0xA0, 0xA5, 0xA2, 0xA3, 0xA4, 0xBC, 0x8C, 0xA7,
	0xA8, 0x8A, 0xAA, 0x8D, 0x8F, 0xAD, 0x8E, 0xAF,
	0xB0, 0xB9, 0xB2, 0xB3, 0xB4, 0xBE, 0x9C, 0xA1,
	0xB8, 0x9A, 0xBA, 0x9D, 0x9F, 0xBD, 0x9E, 0xBF,
	0xC0, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7,
	0xC8, 0xC9, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF,
	0xD0, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6, 0xD7,
	0xD8, 0xD9, 0xDA, 0xDB, 0xDC, 0xDD, 0xDE, 0xDF,
	0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7,
	0xE8, 0xE9, 0xEA, 0xEB, 0xEC, 0xED, 0xEE, 0xEF,
	0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
	0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
};

--
Tatsuo Ishii
SRA OSS, Inc. Japan
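To make the two-step conversion concrete, here is a minimal C sketch of the LATIN2 -> MULE_INTERNAL -> WIN1250 path, using the map quoted in the message above. It is not the PostgreSQL source: the function names are hypothetical, and the leading byte 0x82 for ISO 8859-2 is an assumption inferred from the 0x82 prefix of the code 0x829a in the error message.

```c
#include <assert.h>
#include <stdio.h>

#define HIGHBIT   0x80
/* Assumption: 0x82 is the MULE_INTERNAL leading byte for ISO 8859-2,
 * inferred from the 0x82 prefix of the code 0x829a in the error message. */
#define LC_LATIN2 0x82

/* Map copied from the message above: index = LATIN2 byte - 0x80,
 * 0x00 means "no WIN1250 equivalent". */
static const unsigned char iso88592_2_win1250[128] = {
    0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
    0x88, 0x89, 0x00, 0x8B, 0x00, 0x00, 0x00, 0x00,
    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
    0x98, 0x99, 0x00, 0x9B, 0x00, 0x00, 0x00, 0x00,
    0xA0, 0xA5, 0xA2, 0xA3, 0xA4, 0xBC, 0x8C, 0xA7,
    0xA8, 0x8A, 0xAA, 0x8D, 0x8F, 0xAD, 0x8E, 0xAF,
    0xB0, 0xB9, 0xB2, 0xB3, 0xB4, 0xBE, 0x9C, 0xA1,
    0xB8, 0x9A, 0xBA, 0x9D, 0x9F, 0xBD, 0x9E, 0xBF,
    0xC0, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7,
    0xC8, 0xC9, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF,
    0xD0, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6, 0xD7,
    0xD8, 0xD9, 0xDA, 0xDB, 0xDC, 0xDD, 0xDE, 0xDF,
    0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7,
    0xE8, 0xE9, 0xEA, 0xEB, 0xEC, 0xED, 0xEE, 0xEF,
    0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
    0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
};

/* Step 1: LATIN2 -> MULE_INTERNAL. ASCII passes through; a high-bit byte
 * is prefixed with the leading byte. Returns the number of bytes written. */
int latin2_to_mule(unsigned char c, unsigned char mic[2])
{
    if (c < HIGHBIT) { mic[0] = c; return 1; }
    mic[0] = LC_LATIN2;
    mic[1] = c;
    return 2;
}

/* Step 2: MULE_INTERNAL -> WIN1250 through the table.
 * Returns 0 on success, -1 when the character has no equivalent. */
int mule_to_win1250(const unsigned char *mic, int len, unsigned char *out)
{
    unsigned char w;

    if (len == 1) { *out = mic[0]; return 0; }
    if (len != 2 || mic[0] != LC_LATIN2 || mic[1] < HIGHBIT) return -1;
    w = iso88592_2_win1250[mic[1] - HIGHBIT];
    if (w == 0x00) return -1;
    *out = w;
    return 0;
}

/* Hypothetical helper: convert one LATIN2 byte; on failure, *bad_code
 * receives the MULE_INTERNAL code that would appear in the error. */
int latin2_byte_to_win1250(unsigned char c, unsigned char *out, unsigned *bad_code)
{
    unsigned char mic[2] = {0, 0};
    int len = latin2_to_mule(c, mic);

    if (mule_to_win1250(mic, len, out) == 0) return 0;
    *bad_code = ((unsigned) mic[0] << 8) | mic[1];
    return -1;
}

#ifdef DEMO_MAIN
int main(void)
{
    unsigned char out = 0;
    unsigned bad = 0;

    assert(latin2_byte_to_win1250(0xB9, &out, &bad) == 0);
    printf("LATIN2 0xb9 -> WIN1250 0x%02x\n", out);
    if (latin2_byte_to_win1250(0x9A, &out, &bad) != 0)
        printf("character 0x%04x has no equivalent\n", bad);
    return 0;
}
#endif
```

Built with -DDEMO_MAIN, the demo converts LATIN2 0xb9 (s with caron) to WIN1250 0x9a, and reports the code 0x829a for the stored byte 0x9a. Under this sketch a single stored byte 0x9a is enough to produce the code seen in the error; the 0x82 part comes from the intermediate encoding, not necessarily from the data.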
On Sun, Jan 28, 2007 at 06:33:16PM -0500, Tom Lane wrote:
> "NTPT" <ntpt@centrum.cz> writes:
> > May be a bug in charset translation routines of postgres ?
>
> If you think that, you need to provide us with the exact codes that are
> being mistranslated and what you think they should translate to.

I wonder if the OP is doing something like this:

test=> SELECT getdatabaseencoding();
 getdatabaseencoding
---------------------
 LATIN2
(1 row)

test=> SHOW client_encoding;
 client_encoding
-----------------
 win1250
(1 row)

test=> CREATE TABLE test (t text);
CREATE TABLE
test=> INSERT INTO test VALUES (E'\202\232');  -- \202=0x82, \232=0x9a
INSERT 0 1
test=> SELECT * FROM test;
ERROR:  character 0x829a of encoding "MULE_INTERNAL" has no equivalent in "WIN1250"

The intent might be that E'\202\232' is a string in the client's encoding, where it would represent the same characters as Unicode <U+201A SINGLE LOW-9 QUOTATION MARK, U+0161 LATIN SMALL LETTER S WITH CARON> (I'm using Unicode as the pivot for convenience). But the backend is handling the string in the database's encoding, where it represents <U+0082,U+009A>, which are control characters that don't have mappings in win1250; hence the conversion error when the client tries to read the data.

Just a guess.

--
Michael Fuhr
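The two readings of the same bytes can be spelled out in a few lines of C. This is only an illustration of the guess above: the function names are made up for the example, and the WIN1250 lookup is filled in solely for the two bytes named in this message, not for the whole code page.

```c
#include <assert.h>
#include <stdio.h>

/* The two bytes from the INSERT above, written with the same octal
 * escapes: \202 = 0x82, \232 = 0x9a. */
const unsigned char thread_bytes[] = "\202\232";

/* Code point of b when read as WIN1250. Only the two bytes discussed in
 * this message are filled in; 0 means "not covered by this sketch". */
unsigned win1250_codepoint(unsigned char b)
{
    switch (b) {
        case 0x82: return 0x201A;   /* SINGLE LOW-9 QUOTATION MARK */
        case 0x9A: return 0x0161;   /* LATIN SMALL LETTER S WITH CARON */
        default:   return 0;
    }
}

/* Code point of b when read as LATIN2, for the range 0x80..0x9F only,
 * where ISO 8859-2 has the C1 control characters U+0080..U+009F.
 * Returns 0 outside that range (not covered by this sketch). */
unsigned latin2_c1_codepoint(unsigned char b)
{
    return (b >= 0x80 && b <= 0x9F) ? b : 0;
}

#ifdef DEMO_MAIN
int main(void)
{
    int i;

    for (i = 0; i < 2; i++)
        printf("byte 0x%02x: WIN1250 U+%04X, LATIN2 U+%04X\n",
               thread_bytes[i],
               win1250_codepoint(thread_bytes[i]),
               latin2_c1_codepoint(thread_bytes[i]));
    return 0;
}
#endif
```

The point of the comparison is that the bytes are identical in both cases; only the encoding the server assumes for them differs, which decides whether a printable letter or a control character ends up in the table.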
I did some further investigation. I found and identified the exact row in the database and the exact column that causes the problem. I am able to select the column into a test table, and in "testtable" it retains its bad behavior.

Fortunately, this row does not contain vital data, so I can drop it without a bigger problem, but I would like to know why...

I am able to identify the single character that causes the problem in the real data and in "testtable" too (or rather a character combination, found using the substring function; it seems that at a certain point it takes two characters as a single 16-bit one), but I am not able to reproduce this behavior on a fresh table using "insert" and "select" statements.

Please give me a tip on where to search and what other information to provide.

Thank you.
On Sun, Jan 28, 2007 at 07:27:12PM -0700, Michael Fuhr wrote:
> I wonder if the OP is doing something like this:
[...]
> test=> INSERT INTO test VALUES (E'\202\232');  -- \202=0x82, \232=0x9a

Another possibility, perhaps more likely, is that some connection didn't set client_encoding to win1250 before it inserted win1250-encoded data; in that case the data was probably treated as LATIN2 and stored without conversion. When a connection with client_encoding set to win1250 tries to fetch the data, conversion is attempted and fails because some LATIN2 values don't have win1250 mappings.

--
Michael Fuhr
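If that is what happened, the offending bytes can be found mechanically. The sketch below is one hedged way to do it in C over the raw stored bytes of a value. The ten "suspect" byte values are the positions that hold 0x00 in the iso88592_2_win1250 map quoted earlier in the thread, and each is paired with the LATIN2 byte which that same map sends to it, that is, the byte a correctly converted insert would presumably have stored. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* stored: a byte value with no WIN1250 equivalent when read as LATIN2.
 * latin2: the LATIN2 byte that the thread's map converts to that value. */
struct suspect { unsigned char stored; unsigned char latin2; };

static const struct suspect suspects[] = {
    {0x8A, 0xA9}, {0x8C, 0xA6}, {0x8D, 0xAB}, {0x8E, 0xAE}, {0x8F, 0xAC},
    {0x9A, 0xB9}, {0x9C, 0xB6}, {0x9D, 0xBB}, {0x9E, 0xBE}, {0x9F, 0xBC},
};

/* Returns the LATIN2 byte that was probably intended for a suspect byte,
 * or 0 if b is not one of the ten suspect values. */
unsigned char latin2_replacement(unsigned char b)
{
    size_t i;

    for (i = 0; i < sizeof suspects / sizeof suspects[0]; i++)
        if (suspects[i].stored == b)
            return suspects[i].latin2;
    return 0;
}

/* Returns the offset of the first byte that cannot be converted to
 * WIN1250, or -1 if every byte in buf is convertible. */
long first_unmappable(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (latin2_replacement(buf[i]) != 0)
            return (long) i;
    return -1;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* Hypothetical stored value: 0x9a is a WIN1250 s-with-caron that was
     * stored without conversion; 0xed is the same letter in both sets. */
    const unsigned char row[] = { 'k', 'o', 0x9A, 0xED, 'k' };
    long at = first_unmappable(row, sizeof row);

    if (at >= 0)
        printf("offset %ld: 0x%02x, probably meant LATIN2 0x%02x\n",
               at, row[at], latin2_replacement(row[at]));
    return 0;
}
#endif
```

This only catches bytes that fail conversion outright. WIN1250 bytes that are also valid LATIN2 bytes with a different meaning would pass unnoticed, so any repair based on it is best tried on a copy of the data first.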