On 23.02.2011 17:16, Andrew Dunstan wrote:
> On 02/23/2011 10:09 AM, Peter Geoghegan wrote:
>> On 23 February 2011 04:36, Greg Stark<gsstark@mit.edu> wrote:
>>> This is only true for server encodings. In a client library I think
>>> you lose on this and do have to deal with it. I'm not sure what client
>>> encodings we do support that aren't ascii-supersets though, it's
>>> possible none of them generate quote characters this way.
>> I'm pretty sure all of the client encodings Tatsuo mentions are ASCII
>> supersets. The absence of by far the most popular non-ASCII superset
>> encoding, UTF-16, as a client encoding indicated that to me. It isn't
>> byte oriented, and Postgres is.
>
> They are not. It's precisely because they are not that they are not
> allowed as server encodings.
To be precise, they are all ASCII supersets in the sense that a valid
7-bit ASCII string is valid and means the same thing in all of the
client-only encodings as well. The difference between the supported
server encodings and those that are only supported as client_encoding is
whether *all* bytes in a multi-byte character have the high bit set. All
server encodings have that property, and we rely on it in the backend.
In the supported client-only encodings, the *first* byte of a multi-byte
character is guaranteed to have the high bit set, but the subsequent
bytes are not.
Even that looser property doesn't hold for UTF-16, which is why we
don't support it even as a client-only encoding.
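
To make the distinction concrete, here is a small Python sketch (not
PostgreSQL code; the helper names are made up for this example). It
checks both properties on a single character, assuming that Python's
euc_jp, shift_jis and utf-16-le codecs produce the same bytes as the
corresponding encodings discussed here for that character.

```python
def all_bytes_high(data: bytes) -> bool:
    """True if every byte has the high bit (0x80) set."""
    return all(b & 0x80 for b in data)


def first_byte_high(data: bytes) -> bool:
    """True if the first byte has the high bit set."""
    return len(data) > 0 and bool(data[0] & 0x80)


ch = "\u8868"  # a kanji whose SJIS encoding ends in byte 0x5C

euc = ch.encode("euc_jp")       # usable as a server encoding
sjis = ch.encode("shift_jis")   # client-only encoding
utf16 = ch.encode("utf-16-le")  # not supported at all

for name, data in [("EUC_JP", euc), ("SJIS", sjis), ("UTF-16LE", utf16)]:
    print(
        name,
        data.hex(" "),
        "all high:", all_bytes_high(data),
        "first high:", first_byte_high(data),
    )
```

In SJIS the second byte of this character is 0x5C, the same value as an
ASCII backslash, so code that scans byte by byte for backslashes or
quotes can be fooled unless it first checks whether it is looking at the
trailing byte of a multi-byte character.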
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com