Re: Why don't I get a LATIN1 encoding here with SET ENCODING? - Mailing list pgsql-sql

From Craig Ringer
Subject Re: Why don't I get a LATIN1 encoding here with SET ENCODING?
Msg-id 4AF0F949.3060807@postnewspapers.com.au
In response to Re: Why don't I get a LATIN1 encoding here with SET ENCODING?  (Bryce Nesbitt <bryce2@obviously.com>)
Responses Re: Why don't I get a LATIN1 encoding here with SET ENCODING?  (Bryce Nesbitt <bryce2@obviously.com>)
List pgsql-sql
Bryce Nesbitt wrote:
> 
> 
> Craig Ringer wrote:
>> In truth, that's how I'd expect it to happen. If I ask for the byte 0xfd
>> in a string, I don't want the server to decide that I must've meant
>> something else because I have a different client encoding. If I wanted
>> encoding conversion, I wouldn't have written it in an escape form, I'd
>> have written 'ý' not '\375'.

> I've got a client encoding of LATIN1... so I'd expect to be able to
> present any valid LATIN1 character, not care how the backend stored it,
> then get the same character back from the database.

Yes - but you are *not* presenting a Latin-1 character. You're
presenting four Latin-1 characters:
 '\', '3', '7', '5'
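
As a rough illustration of the difference (a sketch only, assuming a LATIN1 client encoding and a UTF8 server encoding; client_to_server is a hypothetical stand-in, not a PostgreSQL function), compare what travels over the wire in each case:

```python
# Assumption: client_encoding is LATIN1 and the server encoding is UTF8.
escaped = "\\375"   # the four characters \ 3 7 5, as typed in the SQL text
literal = "ý"       # the single character U+00FD

# What the client actually sends for each spelling.
escaped_wire = escaped.encode("latin-1")
literal_wire = literal.encode("latin-1")


def client_to_server(wire: bytes) -> bytes:
    """Hypothetical stand-in for the server's LATIN1 -> UTF8 conversion."""
    return wire.decode("latin-1").encode("utf-8")


escaped_server = client_to_server(escaped_wire)
literal_server = client_to_server(literal_wire)

print(len(escaped_wire), escaped_wire)   # four ASCII bytes
print(len(literal_wire), literal_wire)   # one byte, 0xFD
print(escaped_server)                    # untouched: all four bytes are ASCII
print(literal_server)                    # converted to the two-byte UTF-8 form
```

Only the literal character is changed by the conversion. The escape passes through as plain ASCII, and the byte it names appears later, after the conversion has already happened.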

The server *cannot* process those as an escape sequence before first
converting the SQL string from client to server encoding. It doesn't
know what the bytes you sent mean until that conversion is done. Not
all encodings keep the ASCII meanings of the lower 128 byte values - in
Shift-JIS, for example, the bytes usually used for the '\' and '~'
characters mean '¥' and '‾' respectively. If the server did escape
processing before the client-to-server conversion, a user with a
Shift-JIS client encoding who sent:
  test¥041

would be very surprised when the server saw that as:
  test!

instead of the literal text test¥041, as it should.
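
A minimal sketch of that failure mode, working on raw bytes so it does not depend on any particular codec. The function naive_unescape_octal is hypothetical and only mimics escape processing done *before* encoding conversion; it is not how the server is implemented:

```python
import re


def naive_unescape_octal(raw: bytes) -> bytes:
    """Expand \\ooo octal escapes in raw bytes, ignoring the client encoding."""
    return re.sub(
        rb"\\([0-7]{3})",
        lambda m: bytes([int(m.group(1), 8)]),
        raw,
    )


# In the Shift-JIS example above, the yen sign is sent as byte 0x5C,
# the same byte value that ASCII uses for the backslash.
sjis_wire = b"test\x5c041"

# Treating 0x5C as a backslash turns the user's "yen 041" into one byte.
print(naive_unescape_octal(sjis_wire))
```

Octal 041 is the ASCII code for '!', so the user's yen sign and three digits collapse into a single unrelated character.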


Perhaps, when processing escapes after the encoding conversion, the
server could also apply the client->server encoding transformation to
the bytes produced by escape sequences. That would achieve the result
you wanted here, but it would leave you very, very confused and
frustrated the first time you tried to insert an image into a `bytea'
field or manipulate a BLOB, because the server would 'helpfully'
translate the byte escapes for you.
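
To make the hazard concrete, here is a small sketch of what such 'helpful' translation would do to binary data, again assuming LATIN1 -> UTF8. The function transcode_latin1_to_utf8 is a hypothetical stand-in for the imagined server behaviour; the input is the standard 8-byte PNG signature:

```python
# The first eight bytes of every PNG file.
png_magic = b"\x89PNG\r\n\x1a\n"


def transcode_latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical: treat escape-produced bytes as LATIN1 text and convert."""
    return data.decode("latin-1").encode("utf-8")


mangled = transcode_latin1_to_utf8(png_magic)

print(png_magic, len(png_magic))
print(mangled, len(mangled))
```

The 0x89 byte is rewritten as a two-byte UTF-8 sequence, so the stored data is one byte longer and no longer begins with a valid PNG signature.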

To come closer to what you want, the server would have to detect whether
the escape was in a string that was going to land up in a
character-typed field instead of a byte-typed field. But what about
casts, functions, etc? And how would you specify it if you really did
want exactly those bytes in a text field? It'd be a nightmare.

The server does the only sensible, consistent thing - when you give it a
byte in escape form, it assumes you mean literally that byte.

--
Craig Ringer

