[Fwd: Patch for MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)]] - Mailing list pgsql-patches
From | Barry Lind |
---|---|
Subject | [Fwd: Patch for MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)]] |
Date | |
Msg-id | 3B16D1DD.5020103@xythos.com |
Responses | Re: [Fwd: Patch for MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)]] |
 | Re: [Fwd: Patch for MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)]] |
List | pgsql-patches |

The following patch for JDBC fixes an issue where the JDBC driver loses 8-bit characters when running against a non-multibyte database. With this patch, the JDBC driver ignores the encoding reported by the database when multibyte isn't enabled and uses the JVM default in that case.

thanks,
--Barry

-------- Original Message --------
Subject: Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)
Date: Fri, 25 May 2001 17:12:09 -0700
From: Barry Lind
To: Tatsuo Ishii, tgl@sss.pgh.pa.us
References: <3AF74768.8060807@xythos.com> <20010508110249R.t-ishii@sra.co.jp> <3AF78113.6080907@xythos.com> <20010509102305C.t-ishii@sra.co.jp>

Tatsuo, Tom,

Since the two of you were the only ones who seemed to care about this thread, I am addressing you directly. I want to come to some sort of resolution. Since it doesn't appear that anything is going to be changed in the backend code in 7.2 to address the issue here, I will submit the attached patch to the JDBC code.

This patch uses the function pg_encoding_to_char(1) to determine that multibyte is not enabled on the server (as suggested by Tatsuo). In that case it uses the default JVM character set to convert data from the backend. This replaces the current behavior, which forces all data to 7-bit ASCII in the non-multibyte case because getdatabaseencoding() always returns SQL_ASCII for non-multibyte databases.

If I don't hear anything, I will go ahead and submit this patch.

thanks for your help on this issue.
--Barry

Tatsuo Ishii wrote:
>>> Still I don't see what you want from the JDBC driver if
>>> PostgreSQL returned "UNKNOWN" to indicate that the backend is not
>>> compiled with MULTIBYTE. Do you want exactly the same behavior as the
>>> pre-7.1 driver? i.e. when reading data from the PostgreSQL backend, assume
>>> its encoding is the Java client's default (as set by the locale or
>>> something else) and convert it to UTF-8. If so, that would make sense
>>> to me...
>>
>> My suggestion would be that if the JDBC client were able to determine that
>> the server character set was UNKNOWN (i.e. no multibyte), it would
>> then use some appropriate default character set to perform conversions
>> to UCS2 (LATIN1 would probably make the most sense as a default). The
>> JDBC driver would keep its existing behavior if the character set was
>> SQL_ASCII and multibyte was enabled (i.e. only support 7-bit characters,
>> just like the backend does).
>>
>> Note that the user is always able to override the character set used for
>> conversion by setting the charSet property.
>
>
> I see. However, I would say we cannot change the current behavior
> of the backend until 7.2 is out. It is our policy that we do not
> add or change existing functionality during a minor release
> cycle.
>
> What about doing it like this:
>
> 1. call pg_encoding_to_char(1) (actually any number except 0 is ok)
>
> 2. if it returns "SQL_ASCII", then you can assume that MULTIBYTE is
> not enabled.
>
> This is pretty ugly, but should work.
>
>> Tom also mentioned that it might be possible for the server to support
>> setting the character set for a database even when multibyte wasn't
>> enabled. That would then allow clients like JDBC to get a value from
>> non-multibyte servers that would be more meaningful than the
>> current SQL_ASCII. If this were done, then the 'UNKNOWN' hack would
>> not be necessary.
>
>
> Tom's suggestion does not sound reasonable to me. If PostgreSQL is not
> built with MULTIBYTE, then there is no such concept as an
> "encoding" in PostgreSQL, because there is no program to handle
> encodings. Thus it would be meaningless to assign an "encoding" to a
> database if MULTIBYTE is not enabled.
> --
> Tatsuo Ishii

*** ./org/postgresql/Connection.java.orig	Fri May 25 16:23:02 2001
--- ./org/postgresql/Connection.java	Fri May 25 16:26:55 2001
***************
*** 267,273 ****
  	//	firstWarning = null;
! 	java.sql.ResultSet initrset = ExecSQL("set datestyle to 'ISO'; select getdatabaseencoding()");
  	String dbEncoding = null;
  	//retrieve DB properties
--- 267,274 ----
  	//	firstWarning = null;
! 	java.sql.ResultSet initrset = ExecSQL("set datestyle to 'ISO'; " +
! 		"select case when pg_encoding_to_char(1) = 'SQL_ASCII' then 'UNKNOWN' else getdatabaseencoding() end");
  	String dbEncoding = null;
  	//retrieve DB properties
***************
*** 319,324 ****
--- 320,330 ----
  	} else if (dbEncoding.equals("WIN")) {
  		dbEncoding = "Cp1252";
+ 	} else if (dbEncoding.equals("UNKNOWN")) {
+ 		//This isn't a multibyte database so we don't have an encoding to use
+ 		//We leave dbEncoding null which will cause the default encoding for the
+ 		//JVM to be used
+ 		dbEncoding = null;
  	} else {
  		dbEncoding = null;
  	}
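The decision the patch adds can be sketched outside the driver. The Java below is a minimal, hypothetical illustration, not the driver's code. The class and method names are invented. The string arguments stand in for whatever pg_encoding_to_char(1) and getdatabaseencoding() return. Only the "WIN" and "UNKNOWN" cases are taken from the patch; the driver's other encoding mappings are omitted. The charSet override follows Barry's note that the user can always set that property. The main method also shows why the old behavior loses data: the LATIN1 byte 0xE9 (e-acute) cannot be decoded as US-ASCII and becomes U+FFFD, while ISO-8859-1 keeps it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not the driver's code) of the charset decision the
// patch introduces. Inputs stand in for strings returned by the backend.
final class EncodingChoice {

    private EncodingChoice() {}

    /**
     * Client-side equivalent of the patched startup query:
     *   case when pg_encoding_to_char(1) = 'SQL_ASCII' then 'UNKNOWN'
     *        else getdatabaseencoding() end
     */
    static String reportedEncoding(String encodingOfId1, String databaseEncoding) {
        // Per the thread, a non-multibyte server answers SQL_ASCII for any
        // nonzero encoding id, so that answer is read as "no MULTIBYTE".
        return "SQL_ASCII".equals(encodingOfId1) ? "UNKNOWN" : databaseEncoding;
    }

    /** Maps a backend encoding name to a Java encoding name, or null. */
    static String javaEncodingFor(String dbEncoding) {
        if (dbEncoding == null) {
            return null;
        } else if (dbEncoding.equals("WIN")) {
            return "Cp1252";
        } else if (dbEncoding.equals("UNKNOWN")) {
            // Not a multibyte database, so there is no encoding to use;
            // null makes the caller fall back to the JVM default.
            return null;
        } else {
            // The driver's other mappings are omitted from this sketch.
            return null;
        }
    }

    /** The charSet property, when set, overrides whatever the server reports. */
    static Charset choose(String charSetProperty, String dbEncoding) {
        if (charSetProperty != null) {
            return Charset.forName(charSetProperty);
        }
        String javaName = javaEncodingFor(dbEncoding);
        return javaName != null ? Charset.forName(javaName) : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Non-multibyte server: getdatabaseencoding() would say SQL_ASCII.
        String reported = reportedEncoding("SQL_ASCII", "SQL_ASCII");
        System.out.println(reported);                                                // UNKNOWN
        System.out.println(choose(null, reported).equals(Charset.defaultCharset())); // true
        System.out.println(choose(null, "WIN").name());                              // windows-1252

        // Why it matters: byte 0xE9 is e-acute in LATIN1 but undecodable as ASCII,
        // and new String(byte[], Charset) substitutes U+FFFD for it.
        byte[] raw = { (byte) 0xE9 };
        System.out.println(new String(raw, StandardCharsets.US_ASCII).equals("\uFFFD"));   // true
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1).equals("\u00E9")); // true
    }
}
```

Keeping the explicit "UNKNOWN" branch, even though it returns the same null as the final else, mirrors the patch's intent: it documents that a non-multibyte server deliberately falls through to the JVM default rather than by accident.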