On Wed, 6 Apr 2005, Igor Postelnik wrote:
> I've asked this before on the performance list but didn't get any reply.
> Is there a substantial performance difference between using SQL_ASCII,
> LATIN1, or UNICODE?
Performance where? Backend performance in terms of string comparisons
and sorting is driven by the locale, not the encoding. You may use the
C locale with UNICODE encoding, for example, so that should not be an
issue. The JDBC driver always wants data coming back to it in unicode.
If you've got a unicode db, no conversion is necessary. If you've got a
sql_ascii db, no conversion is possible. If you've got a latin1 db, a
conversion will happen, but I don't know what the cost of that is.
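For illustration, here is a minimal sketch (the connection URL, user,
and password are made up) of what that means in practice: whatever the
database's own encoding is, the driver asks the server to send results
to it in unicode, and getString() hands you a ready-made Java String.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ClientEncodingDemo {
        public static void main(String[] args) throws Exception {
            // Load the driver explicitly (only needed before JDBC 4
            // auto-loading). Connection details are hypothetical.
            Class.forName("org.postgresql.Driver");
            Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
            Statement st = conn.createStatement();
            // client_encoding is what the server converts outgoing data
            // to for this session; the JDBC driver sets it to unicode on
            // connect, independent of the database encoding.
            ResultSet rs = st.executeQuery("SHOW client_encoding");
            if (rs.next()) {
                System.out.println("client_encoding = " + rs.getString(1));
            }
            rs.close();
            st.close();
            conn.close();
        }
    }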
> ISTM that when you create a database with SQL_ASCII encoding you decide
> to deal with character set issues in the applications. Why is the JDBC
> driver dictating how the application handles character set issues?
If the only API the JDBC driver provided was ResultSet.getBytes() then
that would be OK (note this is the only API libpq provides). To provide
getString() the driver must know what encoding the data coming back is
really in. A database encoding of sql_ascii tells us nothing so we can do
nothing about it. It has been suggested in the past to allow the real
database encoding for a sql_ascii database to be specified as a URL
parameter, but I am of the opinion that is just masking the problem, not
solving it. Data should be in a correctly encoded database. If you store
unicode data in a sql_ascii database, then things like varchar(N) become
a limit on the number of bytes instead of the number of characters, as
they should be. With sql_ascii there is no restriction on what data can
be entered, and you can get yourself into a real mess with different
clients entering data in different encodings. Do yourself a favor and
pick a real encoding.
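To make the varchar(N) point concrete, here is a small stand-alone
sketch (no database involved, example strings invented) showing how the
character count and the encoded byte count of a value diverge once you
leave plain ASCII; a sql_ascii database can only ever count the bytes.

    import java.nio.charset.StandardCharsets;

    public class ByteVsCharLength {
        public static void main(String[] args) {
            String plain = "cafe";      // 4 characters, 4 bytes in UTF-8
            String accented = "café";   // 4 characters, 5 bytes in UTF-8

            System.out.println(plain.length() + " chars, "
                + plain.getBytes(StandardCharsets.UTF_8).length + " bytes");
            System.out.println(accented.length() + " chars, "
                + accented.getBytes(StandardCharsets.UTF_8).length + " bytes");

            // In a correctly encoded (e.g. UNICODE) database, varchar(4)
            // accepts both values because the server counts characters.
            // A sql_ascii database can only count bytes, so the accented
            // value no longer fits the same limit.
        }
    }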
Kris Jurka