Re: BUG #4890: Allow insert character has no equivalent in "LATIN2" - Mailing list pgsql-bugs

From	Craig Ringer
Subject	Re: BUG #4890: Allow insert character has no equivalent in "LATIN2"
Date	July 13, 2009 14:59:11
Msg-id	1247507930.17862.111.camel@ayaki
In response to	BUG #4890: Allow insert character has no equivalent in "LATIN2" ("saint" <saint@akpa.pl>)
Responses	Re: BUG #4890: Allow insert character has no equivalent in "LATIN2" Re: BUG #4890: Allow insert character has no equivalent in "LATIN2"
List	pgsql-bugs

(Please reply to the list, not just to me)

I'm not sure about this so far. On the specific issue you mention,
conversion between cp1250 and latin-2 (ISO-8859-2), the Unicode tables
at:

  http://unicode.org/Public/MAPPINGS/ISO8859/8859-2.TXT

appear to agree - there's no PER MILLE in ISO-8859-2.

With a UTF-8 database, Pg correctly doesn't accept PER MILLE as a valid
ISO-8859-2 char:

-- Connecting with unicode (utf-8) client
CREATE TABLE test (x text);
INSERT INTO test(x) VALUES ('‰');

SET client_encoding='iso-8859-2';
SELECT * from test;
ERROR:  character 0xe280b0 of encoding "UTF8" has no equivalent in
"LATIN2"

If the client encoding is set to WIN1250 instead, Pg outputs the
appropriate byte. So it's doing the right thing in each individual case
where a UTF-8 DB is concerned.
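
The mapping behaviour described above can be checked roughly outside the
server. The following is a minimal Python sketch, assuming Python's
built-in "cp1250" and "iso8859_2" codecs follow the same Unicode mapping
tables; it does not exercise Pg's own conversion routines, and the
helper name try_encode is made up for this illustration.

```python
# Sketch only: uses Python's standard codecs, not PostgreSQL's converters.
PER_MILLE = "\u2030"  # PER MILLE SIGN


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding has no equivalent."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


utf8_bytes = try_encode(PER_MILLE, "utf-8")         # three bytes: e2 80 b0
win1250_bytes = try_encode(PER_MILLE, "cp1250")     # single byte: 89
latin2_bytes = try_encode(PER_MILLE, "iso8859_2")   # None: no equivalent

print(utf8_bytes.hex(), win1250_bytes.hex(), latin2_bytes)
```

The None result for ISO-8859-2 corresponds to the "has no equivalent in
LATIN2" error shown in the session above.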

Your problem, though, is that if you connect to a LATIN2 database with a
WIN1250 client and INSERT a string containing the per-mille glyph, Pg
accepts it and it should not. If it does, indeed, accept it, then I
agree that's a bug.

I haven't tested with a LATIN2 database as I'd have to re-initdb and the
machine I'm working on has semi-useful databases on it. What you're
saying makes sense, though, presuming your client really is sending
win1250 per-mille (byte 0x89).


I'd still like to know how you're setting your client encoding. You
can't just run "SET client_encoding='win1250'" - you must tell the
client program, or the terminal it runs in, to use the appropriate
encoding as well. Otherwise when you paste the per-mille character
you'll see the right glyph, but the bytes your terminal actually sends
will be interpreted as characters in the encoding you specified.

So, if you're using a utf-8 terminal, that means that the terminal will
send 0xe2 0x80 0xb0 for per-mille, which when interpreted as win1250
becomes â€° , so that's what the server thinks you sent it.

In that case, though, you'd find that the euro symbol, which isn't
defined in latin-2, will cause an error:

ERROR:  character 0xe282ac of encoding "UTF8" has no equivalent in
"LATIN2"

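The mismatch described above can be simulated without a database. This
is a rough Python sketch, assuming Python's "cp1250" and "iso8859_2"
codecs match the tables Pg uses; the names terminal_bytes, misread and
has_latin2_equivalent are invented for the example.

```python
# Sketch only: simulates a UTF-8 terminal talking to a session that has
# been told client_encoding is WIN1250. Not PostgreSQL's actual code path.
terminal_bytes = "\u2030".encode("utf-8")  # what a UTF-8 terminal sends

# The same bytes, read as if they were WIN1250 text.
misread = terminal_bytes.decode("cp1250")


def has_latin2_equivalent(ch):
    """True if the character can be represented in ISO-8859-2."""
    try:
        ch.encode("iso8859_2")
        return True
    except UnicodeEncodeError:
        return False


# Characters of the misread string that LATIN2 cannot represent.
unmappable = [ch for ch in misread if not has_latin2_equivalent(ch)]

for ch in unmappable:
    print(f"U+{ord(ch):04X} utf-8 bytes: {ch.encode('utf-8').hex()}")
```

Under these assumptions only the euro sign is left without a LATIN2
equivalent, and its UTF-8 bytes are the 0xe282ac reported in the error.
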



--
Craig Ringer
