Re: [Fwd: Patch for MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)]] - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: [Fwd: Patch for MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)]] |
Date | |
Msg-id | 200106012057.f51Kvjv01558@candle.pha.pa.us |
In response to | [Fwd: Patch for MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)]] (Barry Lind <barry@xythos.com>) |
List | pgsql-patches |
Patch applied.  Thanks.

> The following patch for JDBC fixes an issue with jdbc running on a
> non-multibyte database losing 8bit characters.  This patch will cause
> the jdbc driver to ignore the encoding reported by the database when
> multibyte isn't enabled and use the JVM default in that case.
>
> thanks,
> --Barry
>
>
> -------- Original Message --------
> Subject: Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug
> with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)
> Date: Fri, 25 May 2001 17:12:09 -0700
> From: Barry Lind
> To: Tatsuo Ishii , tgl@sss.pgh.pa.us
> References: <3AF74768.8060807@xythos.com>
> <20010508110249R.t-ishii@sra.co.jp> <3AF78113.6080907@xythos.com>
> <20010509102305C.t-ishii@sra.co.jp>
>
>
> Tatsuo, Tom,
>
> Since the two of you were the only ones who seemed to care about this
> thread, I am addressing you directly.  I want to come to some sort of
> resolution.  Since it doesn't appear that anything is going to be
> changed in the backend code in 7.2 to address the issue here, I will
> submit the attached patch to the jdbc code.
>
> This patch uses the function pg_encoding_to_char(1) to detect when
> multibyte is not enabled on the server (as suggested by Tatsuo), and in
> that case uses the default JVM character set to convert data from
> the backend.  This replaces the current behaviour, which forces
> all data to 7bit ascii in the non-multibyte case because
> getdatabaseencoding() always returns SQL_ASCII for non-multibyte databases.
>
> If I don't hear anything, I will go ahead and submit this patch.
>
> thanks for your help on this issue.
>
> --Barry
>
>
> Tatsuo Ishii wrote:
>
> >>> Still, I don't see what you want the JDBC driver to do if
> >>> PostgreSQL returns "UNKNOWN" to indicate that the backend is not
> >>> compiled with MULTIBYTE.  Do you want exactly the same behavior as the
> >>> driver prior to 7.1? i.e. when reading data from the PostgreSQL backend,
> >>> assume its encoding is the Java client's default (which is set by the
> >>> locale or something else) and convert it to UTF-8.  If so, that would
> >>> make sense to me...
> >>
> >> My suggestion would be that if the jdbc client were able to determine that
> >> the server character set was UNKNOWN (i.e. no multibyte), it would
> >> then use some appropriate default character set to perform conversions
> >> to UCS2 (LATIN1 would probably make the most sense as a default).  The
> >> jdbc driver would keep its existing behavior if the character set was
> >> SQL_ASCII and multibyte was enabled (i.e. only support 7bit characters
> >> just like the backend does).
> >>
> >> Note that the user is always able to override the character set used for
> >> conversion by setting the charSet property.
> >
> >
> > I see.  However, I would say we cannot change the current behavior
> > of the backend until 7.2 is out.  It is our policy that we do not
> > add or change existing functionality during a minor release
> > cycle.
> >
> > What about doing this:
> >
> > 1. call pg_encoding_to_char(1) (actually any number except 0 is ok)
> >
> > 2. if it returns "SQL_ASCII", then you can assume that MULTIBYTE is
> > not enabled.
> >
> > This is pretty ugly, but should work.
> >
> >> Tom also mentioned that it might be possible for the server to support
> >> setting the character set for a database even when multibyte wasn't
> >> enabled.  That would then allow clients like jdbc to get a value from
> >> non-multibyte enabled servers that would be more meaningful than the
> >> current SQL_ASCII.  If this were done, then the 'UNKNOWN' hack would
> >> not be necessary.
> >
> >
> > Tom's suggestion does not sound reasonable to me.  If PostgreSQL is not
> > built with MULTIBYTE, then there is no such concept as an
> > "encoding" in PostgreSQL, because there is no code to handle
> > encodings.
> > Thus it would be meaningless to assign an "encoding" to a
> > database if MULTIBYTE is not enabled.
> > --
> > Tatsuo Ishii
>
> *** ./org/postgresql/Connection.java.orig  Fri May 25 16:23:02 2001
> --- ./org/postgresql/Connection.java  Fri May 25 16:26:55 2001
> ***************
> *** 267,273 ****
>           //
>           firstWarning = null;
>
> !         java.sql.ResultSet initrset = ExecSQL("set datestyle to 'ISO'; select getdatabaseencoding()");
>
>           String dbEncoding = null;
>           //retrieve DB properties
> --- 267,274 ----
>           //
>           firstWarning = null;
>
> !         java.sql.ResultSet initrset = ExecSQL("set datestyle to 'ISO'; " +
> !             "select case when pg_encoding_to_char(1) = 'SQL_ASCII' then 'UNKNOWN' else getdatabaseencoding() end");
>
>           String dbEncoding = null;
>           //retrieve DB properties
> ***************
> *** 319,324 ****
> --- 320,330 ----
>
>           } else if (dbEncoding.equals("WIN")) {
>               dbEncoding = "Cp1252";
> +         } else if (dbEncoding.equals("UNKNOWN")) {
> +             //This isn't a multibyte database so we don't have an encoding to use
> +             //We leave dbEncoding null which will cause the default encoding for the
> +             //JVM to be used
> +             dbEncoding = null;
>           } else {
>               dbEncoding = null;
>           }

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
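The driver-side decision introduced by the patch can be sketched in plain Java. This is a minimal, hypothetical sketch, not the driver's actual code: the class EncodingChoiceSketch and the helpers javaEncodingFor() and decode() are invented for illustration. Only the "WIN" to "Cp1252" and "UNKNOWN" to null (JVM default) mappings come from the patch; modelling the 7.1 driver's SQL_ASCII handling as a US-ASCII decoder is an assumption based on the thread's description of data being forced to 7bit ascii. It also shows why 8-bit characters were lost: a byte such as 0xE9 (e-acute in LATIN1) is not valid ASCII and decodes to the Unicode replacement character, while a LATIN1 decoder keeps it.

```java
import java.nio.charset.Charset;

// Hypothetical illustration of the encoding choice made by the patched
// Connection.java; the names here are invented, not the driver's API.
public class EncodingChoiceSketch {

    // Startup query from the patch (shown for reference, not executed here).
    static final String PATCHED_QUERY =
        "set datestyle to 'ISO'; "
        + "select case when pg_encoding_to_char(1) = 'SQL_ASCII' "
        + "then 'UNKNOWN' else getdatabaseencoding() end";

    // Map the value returned by the query to a Java charset name.
    // A null result means "use the JVM default", as in the patch.
    static String javaEncodingFor(String dbEncoding) {
        if (dbEncoding == null) {
            return null;
        }
        switch (dbEncoding) {
            case "WIN":
                return "Cp1252";   // mapping shown in the patch
            case "SQL_ASCII":
                return "US-ASCII"; // assumption: models the 7-bit-only 7.1 behaviour
            case "UNKNOWN":
                return null;       // non-multibyte server: fall back to the JVM default
            default:
                return null;       // unmapped names also fall back (the patch's else branch)
        }
    }

    // Decode raw backend bytes with the chosen charset; null means the JVM default.
    static String decode(byte[] raw, String javaEncoding) {
        Charset cs = (javaEncoding == null)
            ? Charset.defaultCharset()
            : Charset.forName(javaEncoding);
        // new String(byte[], Charset) replaces undecodable bytes with U+FFFD.
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // "cafe" with the e-acute encoded as the single LATIN1 byte 0xE9.
        byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };

        String viaAscii = decode(raw, javaEncodingFor("SQL_ASCII"));
        String viaLatin1 = decode(raw, "ISO-8859-1");

        // prints "US-ASCII keeps e-acute: false"
        System.out.println("US-ASCII keeps e-acute: " + (viaAscii.charAt(3) == '\u00E9'));
        // prints "ISO-8859-1 keeps e-acute: true"
        System.out.println("ISO-8859-1 keeps e-acute: " + (viaLatin1.charAt(3) == '\u00E9'));
        // prints "UNKNOWN maps to JVM default: true"
        System.out.println("UNKNOWN maps to JVM default: " + (javaEncodingFor("UNKNOWN") == null));
    }
}
```

With the patch applied, what a non-multibyte server's 8-bit data decodes to depends on the JVM's default charset, which varies by platform and locale (and is UTF-8 by default from JDK 18 on); the charSet property mentioned in the thread remains the way to override it.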