Re: SQL_ASCII vs. 7-bit ASCII encodings - Mailing list pgsql-hackers

From Oliver Jowett
Subject Re: SQL_ASCII vs. 7-bit ASCII encodings
Msg-id 42848E0C.5010404@opencloud.com
In response to Re: SQL_ASCII vs. 7-bit ASCII encodings  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
> 
>>Peter Eisentraut wrote:
>>
>>>That would cripple a system that many users are perfectly content with now.
> 
> 
>>Well, I wasn't thinking of using a 7-bit encoding always, just as a
>>replacement for the cases where we currently choose SQL_ASCII. Does that
>>sound reasonable?
> 
> 
> I agree with what (I think) Peter is saying: that would break things for
> many people for whom the default works fine now.
> 
> We are currently seeing a whole lot of complaints due to the fact that
> 8.0 tends to default to Unicode encoding in environments where previous
> versions defaulted to SQL-ASCII.  That says to me that a whole lot of
> people were getting along just fine in SQL-ASCII, and therefore that
> moving further away from that behavior is the wrong thing.  In
> particular, there is not any single one of those complainants who would
> be happier with a 7-bit-only default; if they were using 7-bit-only
> data, they'd not have noticed a problem anyway.

This is exactly the case where JDBC has problems, and the case I'd like
to prevent in the first place where possible: SQL_ASCII with non-7-bit
data. How do you propose that the JDBC driver convert from SQL_ASCII to
UTF-16 (the internal Java String representation)? Changing
client_encoding does not help, because the server does no conversion
for a SQL_ASCII database. Requiring the JDBC client to specify the
right encoding to use is error-prone at best, and impossible at worst
(who says that only one encoding has been used?).
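
To make the ambiguity concrete, here is a small sketch in plain Java. It
is not the JDBC driver's code; the class and method names are made up
for illustration. The same four bytes, as they might come back from a
SQL_ASCII database, decode to two different Strings depending on which
encoding the client guesses:

```java
import java.nio.charset.StandardCharsets;

public class EncodingAmbiguity {
    // Hypothetical helper: interpret the raw bytes as LATIN1.
    public static String asLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: interpret the same bytes as UTF-8.
    // This constructor silently replaces malformed input with U+FFFD.
    public static String asUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by 0xE9: an e-acute in LATIN1, malformed in UTF-8.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(asLatin1(raw).equals(asUtf8(raw))); // prints false
    }
}
```

Nothing in the bytes themselves says which interpretation is the
intended one, which is the information SQL_ASCII throws away.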

I'm not suggesting that a 7-bit encoding is necessarily useful to
everyone. I'm saying that the encoding should be a setting that users
have to think about and set correctly before they can insert 8-bit data.
If they decide they want SQL_ASCII and the associated client_encoding
problems, rather than an appropriate encoding the database understands,
so be it; but it's on their head, and requires active intervention
before the database starts losing encoding information.
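
The check a 7-bit-only encoding would need is trivial. A minimal sketch
in Java, with a hypothetical class and method name (this is not server
code, just the rule spelled out):

```java
public class SevenBitCheck {
    /** Returns true if every byte is in the range 0x00..0x7F. */
    public static boolean isSevenBit(byte[] data) {
        for (byte b : data) {
            // Java bytes are signed, so 0x80..0xFF show up as negative values.
            if (b < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Data that passes this check decodes identically under ASCII, LATIN1
and UTF-8, so no encoding information can be lost by storing it.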

If SQL_ASCII plus 8-bit data is considered the right thing to do, then
I'd consider the ability to change client_encoding on a SQL_ASCII
database without an error to be a bug -- you've asked the server to give
you (for example) UTF8, but it isn't doing that. In that case, can we
get this to generate an error when client_encoding is set instead of
producing invalid output?
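
For comparison, this is what "error instead of invalid output" looks
like on the client side in Java. It is only a sketch of the behaviour
I'm asking for, not a description of what the server or the driver does
today, and the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    /** Decodes raw as UTF-8, throwing rather than substituting U+FFFD. */
    public static String decodeUtf8Strict(byte[] raw) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        // decode(ByteBuffer) treats the buffer as the complete input,
        // so a truncated multi-byte sequence is reported as malformed.
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }
}
```

With REPORT, a lone 0xE9 byte raises a CharacterCodingException rather
than quietly turning into a replacement character.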

-O

