Re: BUG #4890: Allow insert character has no equivalent in "LATIN2" - Mailing list pgsql-bugs

From Craig Ringer
Subject Re: BUG #4890: Allow insert character has no equivalent in "LATIN2"
Date
Msg-id 1247507930.17862.111.camel@ayaki
In response to BUG #4890: Allow insert character has no equivalent in "LATIN2"  ("saint" <saint@akpa.pl>)
Responses Re: BUG #4890: Allow insert character has no equivalent in "LATIN2"
Re: BUG #4890: Allow insert character has no equivalent in "LATIN2"
List pgsql-bugs
(Please reply to the list, not just to me)

I'm not sure about this so far. Regarding the specific issue you mention,
conversion between cp1250 and latin-2 (ISO-8859-2), the Unicode tables
at:

  http://unicode.org/Public/MAPPINGS/ISO8859/8859-2.TXT

appear to agree - there's no PER MILLE in ISO-8859-2.
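
As a rough cross-check of the tables, here is a minimal Python sketch
that scans every byte value of both single-byte encodings for the
per-mille sign. It assumes Python's "iso8859_2" and "cp1250" codecs are
reasonable stand-ins for the Unicode mapping files; they are not
PostgreSQL's own conversion tables, and the helper name is made up for
illustration.

```python
# Sketch: look for U+2030 PER MILLE SIGN in two single-byte encodings.
# Python's codec tables are used as stand-ins for the Unicode mapping files.
PER_MILLE = "\u2030"


def bytes_mapping_to(char, codec):
    """Return every byte value that `codec` decodes to `char`."""
    hits = []
    for value in range(256):
        try:
            if bytes([value]).decode(codec) == char:
                hits.append(value)
        except UnicodeDecodeError:
            pass  # this byte is undefined in the codec
    return hits


latin2_hits = bytes_mapping_to(PER_MILLE, "iso8859_2")
win1250_hits = bytes_mapping_to(PER_MILLE, "cp1250")
print("ISO-8859-2:", latin2_hits)
print("cp1250:", [hex(v) for v in win1250_hits])
```

An empty list for ISO-8859-2 and a single hit for cp1250 would match
what the mapping file says.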

With a UTF-8 database, Pg correctly doesn't accept PER MILLE as a valid
ISO-8859-2 char:

-- Connecting with unicode (utf-8) client
CREATE TABLE test (x text);
INSERT INTO test(x) VALUES ('‰');

SET client_encoding='iso-8859-2';
SELECT * from test;
ERROR:  character 0xe280b0 of encoding "UTF8" has no equivalent in
"LATIN2"

If client_encoding is set to WIN1250 instead, Pg outputs the appropriate
byte. So it's doing the right thing in each individual case where a
UTF-8 DB is concerned.
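
The two outcomes above can be imitated outside the database. This is
only a sketch of the idea (decode the stored UTF-8, re-encode for the
client), using Python's codecs rather than PostgreSQL's conversion
routines; the function name is hypothetical.

```python
# Sketch: imitate converting a stored UTF-8 value to a client encoding.
# This mirrors the idea only; it is not how PostgreSQL implements it.
def convert_from_utf8(stored: bytes, client_codec: str) -> bytes:
    """Decode UTF-8 bytes and re-encode them in the client's codec."""
    return stored.decode("utf-8").encode(client_codec)


stored = "\u2030".encode("utf-8")  # per-mille sign as stored in a UTF-8 DB
stored_hex = stored.hex()          # the bytes quoted in the error message

win1250_out = convert_from_utf8(stored, "cp1250")

try:
    convert_from_utf8(stored, "iso8859_2")
    latin2_failed = False
except UnicodeEncodeError:
    # Comparable to: has no equivalent in "LATIN2"
    latin2_failed = True

print(stored_hex, win1250_out.hex(), latin2_failed)
```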

Your problem, though, is that if you connect to a LATIN2 database with a
WIN1250 client and INSERT a string containing the per-mille glyph, Pg
accepts it when it should not. If it does indeed accept it, then I
agree that's a bug.

I haven't tested with a LATIN2 database as I'd have to re-initdb and the
machine I'm working on has semi-useful databases on it. What you're
saying makes sense, though, presuming your client really is sending
win1250 per-mille (byte 0x89).


I'd still like to know how you're setting your client encoding. You
can't just run "SET client_encoding='win1250'"; you must also tell the
client program, or the terminal it runs in, to use the same encoding.
Otherwise, when you paste the per-mille character you'll see the right
glyph, but the bytes the client sends will be interpreted as characters
in the encoding you specified.

So, if you're using a utf-8 terminal, the terminal will send
0xe2 0x80 0xb0 for per-mille, which when interpreted as win1250
becomes â€° (three characters), so that's what the server thinks you
sent it.
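
To make the mismatch concrete, here is a small Python sketch of a UTF-8
terminal talking to a session that claims win1250. It assumes Python's
"cp1250" codec matches the server's WIN1250 table for these three bytes.

```python
# Sketch: bytes from a UTF-8 terminal, misread as win1250 (cp1250).
sent = "\u2030".encode("utf-8")   # what the terminal actually sends
misread = sent.decode("cp1250")   # what a win1250 reading makes of it

print(sent.hex(" "))              # the three bytes on the wire
print([f"U+{ord(ch):04X}" for ch in misread])
```

One character goes in and three different characters come out, none of
which is the per-mille sign.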

In that case, though, you'd find that the euro symbol, which isn't
defined in latin-2, will cause an error:

ERROR:  character 0xe282ac of encoding "UTF8" has no equivalent in
"LATIN2"

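A quick way to see which of the misread characters would trip the
LATIN2 conversion, again as a sketch that uses Python's "iso8859_2"
codec in place of the server's own table:

```python
# Sketch: which characters of the misread string have no ISO-8859-2 byte?
misread = "\u00e2\u20ac\u00b0"  # a-circumflex, euro sign, degree sign

unmappable = []
for ch in misread:
    try:
        ch.encode("iso8859_2")
    except UnicodeEncodeError:
        unmappable.append(ch)

# UTF-8 bytes of the euro sign, for comparison with the error message
euro_utf8_hex = "\u20ac".encode("utf-8").hex()

print([f"U+{ord(ch):04X}" for ch in unmappable], euro_utf8_hex)
```

Only the euro sign should be rejected; the other two characters exist
in ISO-8859-2.
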



--
Craig Ringer
