Thread: Very strange Error in Updates
I found a very strange error/behavior in a PreparedStatement for a simple SQL UPDATE over a VARCHAR field. The code is like:

    Statement stmt = con.createStatement();
    PreparedStatement pstIns = con.prepareStatement("update userid.t set calle = ?");
    ResultSet rs = stmt.executeQuery("select calle from userid.t2");
    while (rs.next()) {
        pstIns.clearParameters();
        String x = rs.getString("CALLE");
        pstIns.setString(1, x);
        int nrows = pstIns.executeUpdate();
        System.out.println("Filas afectadas " + nrows);  // "Rows affected"
    }

When the parameter fills the full length of the receiving field and has any uncommon character, the update throws an exception like:

    java.sql.SQLException: ERROR: value too long for type character varying(30)
        at org.postgresql.core.v2.QueryExecutorImpl.receiveErrorMessage(QueryExecutorImpl.java:420)
        at org.postgresql.core.v2.QueryExecutorImpl.processResults(QueryExecutorImpl.java:345)
        at org.postgresql.core.v2.QueryExecutorImpl.execute(QueryExecutorImpl.java:251)
        at org.postgresql.core.v2.QueryExecutorImpl.execute(QueryExecutorImpl.java:159)
        at org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:342)
        at org.postgresql.jdbc1.AbstractJdbc1Statement.executeWithFlags(AbstractJdbc1Statement.java:290)
        at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:245)
        at com.sistemat.app.rsmaster.reloadable.RepImportar.PrDw_Cuentas(RepImportar.java:2471)
        at com.sistemat.app.rsmaster.reloadable.RepImportar.Importar(RepImportar.java:260)
        at com.sistemat.app.rsmaster.reloadable.RepMasterImpl.Ciclo_Replicacion(RepMasterImpl.java:955)
        at com.sistemat.app.rsmaster.reloadable.RepMasterImpl.runWorker(RepMasterImpl.java:748)
        at com.sistemat.app.rsmaster.reloadable.RepMasterImpl.run(RepMasterImpl.java:427)
        at java.lang.Thread.run(Unknown Source)

This field/value produces the exception: CALLE=[ENFERMERA CLEMON. B§ ALTO ALBR]
If the value is changed to: CALLE=[ENFERMERA CLEMON. B# ALTO ALBR]
then the update is performed without any problem. Yet the first value can be inserted with a prepared statement without problems.

I have no explanation for this case, and any help will be appreciated.

Dario Fassi.

--
Dario V. Fassi
SISTEMATICA ingenieria de software srl
Ituzaingo 1628 (2000) Rosario, Santa Fe, Argentina.
Tel / Fax: +54 (341) 485.1432 / 485.1353
Worst cases:

The receiving field is a varchar(30).

Sample data:

    CALLE=[ENFERMERA CLEMON. B- ALTO ALBR]  len=30 : is updated OK.
    CALLE=[ENFERMERA CLEMON. Bº ALTO ALBR]  len=30 : is NOT updated.
    CALLE=[ENFERMERA CLEMON. Bº ALTO ALB]   len=29 : is updated OK.
On Thu, 15 Jul 2004, Dario V. Fassi wrote:

> When the parameter fills the full length of the receiving field and has
> any uncommon character, the update throws an exception like:
>
> java.sql.SQLException: ERROR: value too long for type character varying(30)

This is likely an encoding problem. What is the encoding of the database? I'm guessing it's SQL_ASCII; when entering data from Java, which is in Unicode, it can take more than one byte of storage, which is represented as more than one character in an ASCII database, so it goes over the limit.

Kris Jurka
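Kris's diagnosis is easy to verify from the Java side. The following is a minimal sketch (an illustration using the sample values quoted earlier, not code from the thread):

    import java.io.UnsupportedEncodingException;

    public class Utf8LengthCheck {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String ok  = "ENFERMERA CLEMON. B- ALTO ALBR";      // pure 7-bit ASCII
            String bad = "ENFERMERA CLEMON. B\u00ba ALTO ALBR"; // contains 'º' (above 127)
            // Both strings are 30 Java chars, but UTF-8 needs two bytes for 'º',
            // so the second value reaches the server as 31 bytes.
            System.out.println(ok.length() + " chars / "
                    + ok.getBytes("UTF-8").length + " UTF-8 bytes");   // 30 / 30
            System.out.println(bad.length() + " chars / "
                    + bad.getBytes("UTF-8").length + " UTF-8 bytes");  // 30 / 31
        }
    }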
Dario V. Fassi wrote:

> When the parameter fills the full length of the receiving field and has
> any uncommon character, the update throws an exception like:
>
> java.sql.SQLException: ERROR: value too long for type character varying(30)
>     at org.postgresql.core.v2.QueryExecutorImpl.receiveErrorMessage(QueryExecutorImpl.java:420)

What server version (I'm guessing <= 7.3 since the driver is using the V2 protocol) and database encoding are you using?

-O
Oliver Jowett wrote:

> What server version (I'm guessing <= 7.3 since the driver is using the
> V2 protocol) and database encoding are you using?

Server 7.3.4, for W2K and Linux too.
Encoding SQL_ASCII in both cases.

Dario
Kris Jurka wrote:

> This is likely an encoding problem. What is the encoding of the database?
> I'm guessing it's SQL_ASCII; when entering data from Java, which is in
> Unicode, it can take more than one byte of storage, which is represented
> as more than one character in an ASCII database, so it goes over the limit.
Kris, the value of the field originates from a DB2 v6.1 with encoding IBM-850 (PC ASCII); I don't believe the values are Unicode.

But if they are Unicode, how can I make sure of that, and what can be done to work around the problem?
Dario.
On Thu, 15 Jul 2004, Dario V. Fassi wrote:

> Kris, the value of the field originates from a DB2 v6.1 with encoding
> IBM-850 (PC ASCII); I don't believe the values are Unicode.

I mean that Java and the JDBC driver internally represent strings with Unicode. If any of the data has the high bit set (ASCII values > 127) then the JDBC driver will send it as two bytes or more, because it uses UTF-8. Normally the server will convert it from UTF-8 to the database's encoding, but if the database is SQL_ASCII it doesn't know how to convert it and must keep it as two bytes. You have not told us what your database's encoding is yet.

Kris Jurka
Dario V. Fassi wrote:

> Server 7.3.4 for W2K and Linux too.
> Encoding SQL_ASCII in both cases.

You can only put 7-bit characters into a SQL_ASCII database. The JDBC driver always speaks UNICODE when it can, since that matches Java's internal string representation. I suspect that what's happening is:

0) the driver sets client_encoding = UNICODE during connection setup

1) the driver encodes the parameter as UNICODE (== UTF8); for characters above 127 this encoding will result in more than one byte per character.

2) the server converts from client_encoding UNICODE to database encoding SQL_ASCII; for characters that are invalid in SQL_ASCII (>127) it does some arbitrary conversion, probably just copying the illegal values unchanged.

3) you end up with extra characters in the resulting value, which exceeds the varchar's size.

The solution is to use a database encoding that matches your data.

-O
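Steps 0) and 2) can be confirmed on a live connection. A minimal sketch, assuming an open Connection con as in the original code (SHOW returns a result set on 7.3 and later):

    // Illustration only: confirm what the driver and server negotiated.
    Statement s = con.createStatement();
    ResultSet r = s.executeQuery("SHOW client_encoding");   // UNICODE when connected via JDBC
    if (r.next()) System.out.println("client_encoding = " + r.getString(1));
    r = s.executeQuery("SHOW server_encoding");             // SQL_ASCII in the setup reported here
    if (r.next()) System.out.println("server_encoding = " + r.getString(1));
    s.close();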
Server 7.3.4 for W2K and Linux too.
Encoding SQL_ASCII in both cases.
I understand the source of the problem, but ASCII encoding is not 7 bits; it is 8 bits, with international charsets in codepages, like the values in the examples. You are talking about the US-ASCII charset, which is a 7-bit subset of Unicode.

No matter; speaking in CHARS, if I'm putting a 30-char string into a field 30 chars long, I think the driver can/must assure a 30-char string transfer. Maybe a "data truncation" warning would be acceptable, or a replacement byte/char, or cutting the eighth bit, but it's not sufficient reason to abort the update.

What's your opinion?
Dario.
Dario V. Fassi wrote:

> Server 7.3.4 for W2K and Linux too.
> Encoding SQL_ASCII in both cases.
>
> I understand the source of the problem, but ASCII encoding is not 7 bits;
> it is 8 bits, with international charsets in codepages, like the values in
> the examples. You are talking about the US-ASCII charset, which is a 7-bit
> subset of Unicode.

You're arguing over nomenclature here. At the end of the day, a postgresql database encoding of SQL_ASCII means 7-bit ASCII; if you call that US-ASCII, fine, but it doesn't change the problem. With an encoding of SQL_ASCII the server does not have sufficient information to translate characters >127 between the database encoding and UNICODE, which is required by the JDBC driver (even if the JDBC driver did not set client_encoding to UNICODE, it'd still have to somehow do this translation itself since Java strings are represented as UTF-16).

See http://www.postgresql.org/docs/current/static/multibyte.html for some more details. The JDBC driver will always use a "client character set" of UNICODE when talking to a >= 7.3 server.

> No matter; speaking in CHARS, if I'm putting a 30-char string into a field
> 30 chars long, I think the driver can/must assure a 30-char string
> transfer. Maybe a "data truncation" warning would be acceptable, or a
> replacement byte/char, or cutting the eighth bit, but it's not sufficient
> reason to abort the update.
>
> What's your opinion?

The server already does a replacement -- the problem is that the replacement may be longer than one character (see the referenced docs above for handling of unrepresentable characters). So the server-side representation of a "30 character" Java string may actually be longer than 30 characters in the database encoding.

Either way there's nothing the driver can really do about it -- we don't want to duplicate all the knowledge about charset conversions on the driver side (currently, the driver does know some details about encodings, but that's only there to support pre-7.3 servers). We just hand off a valid UNICODE string and let the server deal with it. If the server generates an error and aborts the transaction -- too bad, it's not the driver's fault.

The best option is to fix your database encoding; UNICODE is your best bet if you're only talking to it via JDBC. If you really want silent truncation (bad idea!) you can get that via an explicit cast to varchar(30) in your query.

-O
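Oliver's explicit-cast suggestion would only change the prepared statement's SQL. A sketch of what that might look like, using the table and column names from the thread:

    // Illustration only: the cast truncates over-length values instead of
    // erroring. Note that under SQL_ASCII the limit is effectively 30 *bytes*,
    // so truncation can still cut a multibyte UTF-8 sequence in half.
    PreparedStatement pstIns = con.prepareStatement(
            "update userid.t set calle = cast(? as varchar(30))");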
Oliver, you are right! It's not a matter of semantics; the problem remains.

But can you explain why *exactly* the same values can be inserted and not updated? And why, via ODBC, the same statements with the same servers and with the same DBs run without problem?

Dario.
On Thu, 15 Jul 2004, Dario Fassi wrote:

> But can you explain why *exactly* the same values can be inserted and
> not updated?

I don't believe this.

> And why, via ODBC, the same statements with the same servers and with
> the same DBs run without problem?

The ODBC driver doesn't have any encoding knowledge; it just passes bytes around. As mentioned, the JDBC driver does a transformation to UTF-8, which makes the incorrect database encoding apparent.

Kris Jurka
On July 15, 2004 03:23 am, Dario Fassi wrote:

> But can you explain why *exactly* the same values can be inserted and
> not updated?

When you re-select them, you'll probably see you have two garbage characters there instead of your single 'high' character.

JdV!!

--
Jan de Visser                            jdevisser@digitalfairway.com
Baruk Khazad! Khazad ai-menu!
Oliver Jowett <oliver@opencloud.com> writes:

> The JDBC driver always speaks UNICODE when it can, since that matches
> Java's internal string representation. I suspect that what's happening is:

> 0) the driver sets client_encoding = UNICODE during connection setup

Right.

> 1) the driver encodes the parameter as UNICODE (== UTF8); for characters
> above 127 this encoding will result in more than one byte per character.

Right.

> 2) the server converts from client_encoding UNICODE to database encoding
> SQL_ASCII; for characters that are invalid in SQL_ASCII (>127) it does
> some arbitrary conversion, probably just copying the illegal values
> unchanged.

Not really. SQL_ASCII encoding basically means "we don't know what this data is, just store it verbatim". So the UTF-8 string sent by the driver is stored verbatim.

> 3) you end up with extra characters in the resulting value, which exceeds
> the varchar's size.

Right. Since the server does not know what encoding is in use, it falls back to the assumption that 1 character == 1 byte, under which assumption the string violates the varchar(30) constraint. Had the server known which encoding was in use, it would have counted the characters correctly.

> The solution is to use a database encoding that matches your data.

Actually, if you intend to access the database primarily through JDBC, it'd be best to use server encoding UNICODE. The JDBC driver will always want UNICODE on the wire, and I see no reason to force extra character set conversions. Non-UNICODE-aware clients can be handled by setting client_encoding properly.

regards, tom lane
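Tom's byte-versus-character distinction can be observed directly on stored data, since PostgreSQL's length() counts characters of the database encoding while octet_length() counts bytes. A sketch, reusing stmt from the original code:

    // Illustration only. Under SQL_ASCII both counts are byte counts, so the
    // 'Bº' value shows 31 / 31; under a UNICODE database it shows 30 / 31.
    ResultSet r = stmt.executeQuery(
            "select calle, length(calle), octet_length(calle) from userid.t2");
    while (r.next())
        System.out.println(r.getString(1) + ": " + r.getInt(2)
                + " chars, " + r.getInt(3) + " bytes");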
My problem is that the data is already inside the postgresql server (with SQL_ASCII encoding), inserted by Win32/ODBC clients.

Now, from JDBC, I can't handle any row with any field that has one or more 8-bit characters. At the same time, the Win32/ODBC programs continue to use it without any problem. This leaves me in a hard-to-explain situation.

One more question: using PreparedStatement.setBytes(), can the treatment that ODBC gives those fields be reproduced?

Thanks all for your help.
Dario.
Dario V. Fassi wrote:

> My problem is that the data is already inside the postgresql server (with
> SQL_ASCII encoding), inserted by Win32/ODBC clients.
>
> Now, from JDBC, I can't handle any row with any field that has one or more
> 8-bit characters. At the same time, the Win32/ODBC programs continue to
> use it without any problem. This leaves me in a hard-to-explain situation.

The problem, as I understand it from Tom's explanation, is that SQL_ASCII only works if everyone is using the same client_encoding; the server has no knowledge of the real underlying encoding of the data so can't do conversions. JDBC always uses a client_encoding of UNICODE. I don't know what ODBC does, but apparently it's not using UNICODE.

Perhaps one option is to set the database encoding to UNICODE, and either get the ODBC driver to issue an appropriate "SET client_encoding" on connection setup (I don't know if ODBC lets you do this) or set the default client_encoding in postgresql.conf to whatever is appropriate for ODBC clients?

-O
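For the postgresql.conf route Oliver mentions, the relevant setting would be roughly the line below; LATIN1 is only a guess at whatever codepage the Win32/ODBC clients actually send:

    # postgresql.conf: default client_encoding for clients that never set one
    client_encoding = latin1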
Tom Lane wrote:

>> 2) the server converts from client_encoding UNICODE to database encoding
>> SQL_ASCII; for characters that are invalid in SQL_ASCII (>127) it does
>> some arbitrary conversion, probably just copying the illegal values
>> unchanged.
>
> Not really. SQL_ASCII encoding basically means "we don't know what this
> data is, just store it verbatim". So the UTF-8 string sent by the
> driver is stored verbatim.

Hmm, so SQL_ASCII is not really a first-class encoding -- it doesn't do encoding conversions at all? It's going to break horribly in the face of clients using different client_encoding values, and somewhat less horribly even when everything uses a client_encoding of UNICODE (i.e. string lengths are wrong)?

I wonder if the server behaviour could be somehow changed so that people don't shoot themselves in the foot so often (variants on this problem come up again and again..). The problem is that it works most of the time, only breaking on certain data, so it's not instantly apparent that you have a problem.

What about refusing to change client_encoding to something other than SQL_ASCII on SQL_ASCII databases? (This would make the JDBC driver unusable against those databases even for data that currently appears to work, though.) Or perhaps the JDBC driver could issue a warning whenever it notices the underlying encoding is SQL_ASCII (this means another round-trip on connection setup even when using V3, though). Or refuse to even try to encode strings with characters >127 when the database encoding is SQL_ASCII.

>> The solution is to use a database encoding that matches your data.
>
> Actually, if you intend to access the database primarily through JDBC,
> it'd be best to use server encoding UNICODE.

Sure -- it just depends on what other clients use the db. By the sounds of it, in this case the other client is an ODBC client that isn't aware of encodings at all.. I suppose this can be handled by the default client_encoding setting in postgresql.conf?

-O
Oliver Jowett wrote:

> Perhaps one option is to set the database encoding to UNICODE

Or for that matter pretty much any encoding that handles your data and has a conversion to client_encoding = UNICODE (i.e. not SQL_ASCII).

-O
Dario V. Fassi wrote:

> One more question: using PreparedStatement.setBytes(), can the treatment
> that ODBC gives those fields be reproduced?

I don't think setBytes() will work -- it deals with bytea fields, which have their own text representation for binary data ('\nnn' escapes).

-O
Oliver Jowett <oliver@opencloud.com> writes:

> The problem, as I understand it from Tom's explanation, is that
> SQL_ASCII only works if everyone is using the same client_encoding; the
> server has no knowledge of the real underlying encoding of the data so
> can't do conversions.

Not only can the server not do conversions, but it cannot count string lengths "correctly" in strings that are really in a multibyte encoding. When JDBC sends a UTF8 string that contains some non-ASCII characters, the server can store the string safely, but it cannot operate on it in any intelligent way. I wonder whether the JDBC driver ought to warn about it if it sees server_encoding == SQL_ASCII? You're certainly just asking for trouble to use JDBC with such a setting.

> JDBC always uses a client_encoding of UNICODE. I don't know what ODBC
> does, but apparently it's not using UNICODE.

ODBC is probably just passing through the client data as-is, and not doing anything at all with the encoding settings.

> Perhaps one option is to set the database encoding to UNICODE, and
> either get the ODBC driver to issue an appropriate "SET client_encoding"
> on connection setup (I don't know if ODBC lets you do this) or set the
> default client_encoding in postgresql.conf to whatever is appropriate
> for ODBC clients?

That would work. Plan B would be to set the database encoding to whatever the ODBC clients are using, and let encoding conversions happen when talking to a JDBC client. The one thing that is absolutely, positively guaranteed not to work is setting the DB encoding to SQL_ASCII. That defeats any chance you have of getting intelligent encoding behavior from the system.

regards, tom lane
Oliver Jowett <oliver@opencloud.com> writes:

> Tom Lane wrote:
>> Not really. SQL_ASCII encoding basically means "we don't know what this
>> data is, just store it verbatim". So the UTF-8 string sent by the
>> driver is stored verbatim.

> Hmm, so SQL_ASCII is not really a first-class encoding -- it doesn't do
> encoding conversions at all?

Correct. BTW, setting client_encoding to SQL_ASCII also disables on-the-wire encoding conversions (so that client data had better be in whatever the database encoding is).

> What about refusing to change client_encoding to something other than
> SQL_ASCII on SQL_ASCII databases?

Not sure that would do anything very useful. People who aren't thinking about this probably aren't thinking about setting client_encoding properly, either.

> Or perhaps the JDBC driver could issue a warning whenever it notices the
> underlying encoding is SQL_ASCII (this means another round-trip on
> connection setup even when using V3, though).

Something like this seems reasonable. I'm not sure why we didn't make server_encoding be GUC_REPORT so that it would be sent automatically during connection startup ... we could change that in 7.5 if it would help any ...

> Sure -- it just depends on what other clients use the db. By the sounds
> of it, in this case the other client is an ODBC client that isn't aware
> of encodings at all.. I suppose this can be handled by the default
> client_encoding setting in postgresql.conf?

Yeah, as long as there is one specific encoding that all the encoding-ignorant clients are using. If there's more than one, perhaps you could get it to work by specifying per-user or per-database default client_encoding settings (see ALTER USER and ALTER DATABASE).

regards, tom lane
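Spelled out, the per-user / per-database defaults Tom refers to would look roughly like this; 'odbcuser', 'appdb' and LATIN1 are hypothetical names, and ALTER DATABASE ... SET needs 7.4 or later, I believe:

    -- give the encoding-ignorant ODBC logins their own default
    ALTER USER odbcuser SET client_encoding = 'LATIN1';
    -- or make it the default for every connection to one database
    ALTER DATABASE appdb SET client_encoding = 'LATIN1';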
Tom Lane wrote:

> Oliver Jowett <oliver@opencloud.com> writes:
>> What about refusing to change client_encoding to something other than
>> SQL_ASCII on SQL_ASCII databases?
>
> Not sure that would do anything very useful. People who aren't thinking
> about this probably aren't thinking about setting client_encoding
> properly, either.

I was thinking about it from the other angle -- clients that set client_encoding and expect the server to do the conversion (e.g. the JDBC driver) will see an error rather than bogus unconverted data.

What does the server currently do if you ask for a client_encoding that isn't supported by the database encoding (e.g. LATIN1<->LATIN2)? It seems to me that SQL_ASCII is kinda-sorta-if-you-squint-a-bit like an encoding that doesn't support any client_encoding but SQL_ASCII.

-O
We wrote an ODBC program to migrate the DB (SQL_ASCII) to a DB with UNICODE encoding. This DB, used via ODBC with SET CLIENT_ENCODING='UNICODE', works fine.

1) Now, from a JDBC Java program, we read a row that has a field CALLE varchar(30) = 'ññññññññññññññññ',
2) then we do an UPDATE of another field in the same row,
3) and the untouched field ends up as CALLE varchar(30) = ''.

I'm absolutely lost in this problem.

Dario.
Now, as suggested, we are working with a UNICODE database.

With the latest dev JDBC driver, it doesn't work. With an older Mammoth JDBC driver it works, but the original problem returns: a Java string of 30 chars can't be updated over a varchar(30) field, even with a UNICODE db. Something in the middle has a wrong count of chars even when the database encoding is Unicode. Server or interface?

Dario.
After trying *all* versions of JDBC drivers, I found that the problems disappear using:

1) a DB with UNICODE encoding, and
2) JDBC driver pg74.214.jdbc2.jar (186948 bytes).

Thanks Oliver, Tom and Kris for your time and help.