Thread: Very strange Error in Updates

Very strange Error in Updates

From
"Dario V. Fassi"
Date:
I found a very strange error/behavior in a PreparedStatement for a
simple SQL Update over a VARCHAR field.
The code is like:

    Statement stmt = con.createStatement();
    PreparedStatement pstIns = con.prepareStatement("update userid.t set calle = ? ");
    ResultSet rs = stmt.executeQuery("select calle from userid.t2");
    while (rs.next()) {
        pstIns.clearParameters();
        String x = rs.getString("CALLE");
        pstIns.setString(1, x);
        int nrows = pstIns.executeUpdate();
        System.out.println("Filas afectadas " + nrows); // "rows affected"
    }


When the parameter fills the full length of the receiving field and
contains an uncommon (non-ASCII) character, the update throws an exception like:

java.sql.SQLException: ERROR:  value too long for type character varying(30)
    at org.postgresql.core.v2.QueryExecutorImpl.receiveErrorMessage(QueryExecutorImpl.java:420)
    at org.postgresql.core.v2.QueryExecutorImpl.processResults(QueryExecutorImpl.java:345)
    at org.postgresql.core.v2.QueryExecutorImpl.execute(QueryExecutorImpl.java:251)
    at org.postgresql.core.v2.QueryExecutorImpl.execute(QueryExecutorImpl.java:159)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:342)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeWithFlags(AbstractJdbc1Statement.java:290)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:245)
    at com.sistemat.app.rsmaster.reloadable.RepImportar.PrDw_Cuentas(RepImportar.java:2471)
    at com.sistemat.app.rsmaster.reloadable.RepImportar.Importar(RepImportar.java:260)
    at com.sistemat.app.rsmaster.reloadable.RepMasterImpl.Ciclo_Replicacion(RepMasterImpl.java:955)
    at com.sistemat.app.rsmaster.reloadable.RepMasterImpl.runWorker(RepMasterImpl.java:748)
    at com.sistemat.app.rsmaster.reloadable.RepMasterImpl.run(RepMasterImpl.java:427)
    at java.lang.Thread.run(Unknown Source)

This value produces the exception:   CALLE=[ENFERMERA CLEMON. B§ ALTO ALBR]

If the value is changed to:          CALLE=[ENFERMERA CLEMON. B# ALTO ALBR]
then the update is performed without any problem.

Yet the first value can be inserted with a prepared statement
without problems.

I have no explanation for this case, and any help will be appreciated.

Dario Fassi.


Re: Very strange Error in Updates

From
"Dario V. Fassi"
Date:
Worst cases:

The receiving field is a varchar(30).
Sample data:

CALLE=[ENFERMERA CLEMON. B- ALTO ALBR]  len=30 : is updated OK
CALLE=[ENFERMERA CLEMON. Bº ALTO ALBR]  len=30 : is NOT updated
CALLE=[ENFERMERA CLEMON. Bº ALTO ALB]   len=29 : is updated OK

Dario V. Fassi wrote:
> I found a very strange error/behavior in a PreparedStatement for a
> simple SQL Update over a VARCHAR field. [...]


--

    Dario V. Fassi


SISTEMATICA ingenieria de software  srl
Ituzaingo 1628  (2000)  Rosario, Santa Fe, Argentina.
Tel / Fax:  +54 (341) 485.1432 / 485.1353



Re: Very strange Error in Updates

From
Kris Jurka
Date:

On Thu, 15 Jul 2004, Dario V. Fassi wrote:

> When the parameter fills the full length of the receiving field and
> contains an uncommon (non-ASCII) character, the update throws an exception like:
>
> java.sql.SQLException: ERROR:  value too long for type character varying(30)

This is likely an encoding problem.  What is the encoding of the database?
I'm guessing it's SQL_ASCII.  When you enter data from Java, which uses
Unicode internally, a non-ASCII character takes more than one byte of
storage, and in an ASCII database those bytes are counted as more than
one character, so the value goes over the limit.

Kris Jurka
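[Editor's note: Kris's explanation can be checked directly in Java. This is a minimal sketch using one of Dario's sample values; the 'º' character is U+00BA, which UTF-8 encodes as two bytes.]

```java
import java.nio.charset.StandardCharsets;

// Shows why a 30-character Java string can exceed varchar(30) in an
// SQL_ASCII database: characters above 127 become multi-byte in UTF-8,
// and a server with no encoding knowledge counts those bytes as
// extra characters.
public class EncodingLength {
    public static void main(String[] args) {
        String ok  = "ENFERMERA CLEMON. B- ALTO ALBR";      // pure ASCII
        String bad = "ENFERMERA CLEMON. B\u00ba ALTO ALBR"; // contains 'º'

        System.out.println(ok.length() + " chars / "
                + ok.getBytes(StandardCharsets.UTF_8).length + " UTF-8 bytes");  // 30 / 30
        System.out.println(bad.length() + " chars / "
                + bad.getBytes(StandardCharsets.UTF_8).length + " UTF-8 bytes"); // 30 / 31
    }
}
```

The second string fits the column when counted in characters but not when its UTF-8 bytes are counted one-per-character, which is exactly the boundary Dario observed (len=30 fails, len=29 works).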

Re: Very strange Error in Updates

From
Oliver Jowett
Date:
Dario V. Fassi wrote:

> When the parameter fills the full length of the receiving field and
> contains an uncommon (non-ASCII) character, the update throws an exception like:
>
> java.sql.SQLException: ERROR:  value too long for type character
> varying(30)
>    at
> org.postgresql.core.v2.QueryExecutorImpl.receiveErrorMessage(QueryExecutorImpl.java:420)

What server version (I'm guessing <= 7.3 since the driver is using the
V2 protocol) and database encoding are you using?

-O

Re: Very strange Error in Updates

From
"Dario V. Fassi"
Date:

Oliver Jowett wrote:

> Dario V. Fassi wrote:
>
>> When the parameter fills the full length of the receiving field and
>> contains an uncommon (non-ASCII) character, the update throws an exception like:
>>
>> java.sql.SQLException: ERROR:  value too long for type character
>> varying(30)
>>    at
>> org.postgresql.core.v2.QueryExecutorImpl.receiveErrorMessage(QueryExecutorImpl.java:420)
>
>
>
> What server version (I'm guessing <= 7.3 since the driver is using the
> V2 protocol) and database encoding are you using?
>
> -O

Server 7.3.4  for W2K and Linux too.
Encoding SQL_ASCII  in both cases.

Dario


Re: Very strange Error in Updates

From
"Dario V. Fassi"
Date:
Kris Jurka wrote:
> This is likely an encoding problem.  What is the encoding of the database?
> [...]
Kris, the values in this field originate from a DB2 v6.1 database with encoding IBM-850 (PC ASCII); I don't believe the values are Unicode.

But if they are Unicode, how can I make sure of that, and what can be done to work around the problem?

Dario.

Re: Very strange Error in Updates

From
Kris Jurka
Date:

On Thu, 15 Jul 2004, Dario V. Fassi wrote:

> Kris Jurka wrote:
>
> Kris, the values in this field originate from a DB2 v6.1 database with
> encoding IBM-850 (PC ASCII); I don't believe the values are Unicode.

I mean that Java and the JDBC driver internally represent strings as
Unicode.  If any of the data has the high bit set (ASCII values > 127)
then the jdbc driver will send it as two bytes or more because it uses
UTF-8.  Normally the server will convert it from UTF-8 to the database's
encoding, but if the database is SQL_ASCII it doesn't know how to convert
it and must keep it as two bytes.  You have not told us what your
database's encoding is yet.

Kris Jurka


Re: Very strange Error in Updates

From
Oliver Jowett
Date:
Dario V. Fassi wrote:
>
>
> Oliver Jowett wrote:
>
>> Dario V. Fassi wrote:
>>
>>> When the parameter fills the full length of the receiving field and
>>> contains an uncommon (non-ASCII) character, the update throws an exception like:
>>>
>>> java.sql.SQLException: ERROR:  value too long for type character
>>> varying(30)
>>>    at
>>> org.postgresql.core.v2.QueryExecutorImpl.receiveErrorMessage(QueryExecutorImpl.java:420)
>>
>> What server version (I'm guessing <= 7.3 since the driver is using the
>> V2 protocol) and database encoding are you using?
>>
>> -O
>
>
> Server 7.3.4  for W2K and Linux too.
> Encoding SQL_ASCII  in both cases.

You can only put 7-bit characters into a SQL_ASCII database.

The JDBC driver always speaks UNICODE when it can, since that matches
Java's internal string representation. I suspect that what's happening is:

0) the driver sets client_encoding = UNICODE during connection setup
1) the driver encodes the parameter as UNICODE (== UTF8); for characters
above 127 this encoding will result in more than one byte per character.
2) the server converts from client_encoding UNICODE to database encoding
SQL_ASCII; for characters that are invalid in SQL_ASCII (>127) it does
some arbitrary conversion, probably just copying the illegal values
unchanged.
3) you end up with extra characters in the resulting value which exceeds
the varchar's size.

The solution is to use a database encoding that matches your data.

-O

Re: Very strange Error in Updates

From
"Dario V. Fassi"
Date:
Server 7.3.4  for W2K and Linux too.
Encoding SQL_ASCII  in both cases.

I understand the source of the problem, but ASCII encoding is not 7 bits; it is 8 bits, with international charsets in code pages, like the values in my examples.
You are talking about the US-ASCII charset, which is a 7-bit subset of Unicode.

Regardless, and speaking in CHARS: if I am putting a 30-character string into a field 30 characters long,
I think the driver can/must assure a 30-character transfer.
Maybe a "data truncation" warning would be acceptable, or a replacement byte/char, or cutting the eighth bit,
but this is not sufficient reason to abort the update.

What's your opinion?

Dario.

Kris Jurka wrote:
> I mean that java and the jdbc driver internally represent strings with
> unicode.  If any of the data has the high bit set (ASCII values > 127)
> then the jdbc driver will send it as two bytes or more because it uses
> UTF-8. [...]



Re: Very strange Error in Updates

From
Oliver Jowett
Date:
Dario V. Fassi wrote:
> Server 7.3.4  for W2K and Linux too.
> Encoding SQL_ASCII  in both cases.
>
> I understand the source of the problem, but ASCII encoding is not
> 7 bits; it is 8 bits, with international charsets in code pages,
> like the values in my examples.
> You are talking about the US-ASCII charset, which is a 7-bit subset of Unicode.

You're arguing over nomenclature here. At the end of the day, a
postgresql database encoding of SQL_ASCII means 7-bit ASCII; if you call
that US-ASCII, fine, but it doesn't change the problem. With an encoding
of SQL_ASCII the server does not have sufficient information to
translate characters >127 between the database encoding and UNICODE,
which is required by the JDBC driver (even if the JDBC driver did not
set client_encoding to UNICODE, it'd still have to somehow do this
translation itself since Java strings are represented as UTF-16).

See http://www.postgresql.org/docs/current/static/multibyte.html for
some more details. The JDBC driver will always use a "client character
set" of UNICODE when talking to a >= 7.3 server.

> Regardless, and speaking in CHARS: if I am putting a 30-character
> string into a field 30 characters long,
> I think the driver can/must assure a 30-character transfer.
> Maybe a "data truncation" warning would be acceptable, or a replacement
> byte/char, or cutting the eighth bit,
> but this is not sufficient reason to abort the update.
>
> What's your opinion?

The server already does a replacement -- the problem is that the
replacement may be longer than one character (see the referenced docs
above for handling of unrepresentable characters). So the server-side
representation of a "30 character" Java string may actually be longer
than 30 characters in the database encoding.

Either way there's nothing the driver can really do about it -- we don't
want to duplicate all the knowledge about charset conversions on the
driver side (currently, the driver does know some details about
encodings, but that's only there to support pre-7.3 servers). We just
hand off a valid UNICODE string and let the server deal with it. If the
server generates an error and aborts the transaction -- too bad, it's
not the driver's fault.

The best option is to fix your database encoding; UNICODE is your best
bet if you're only talking to it via JDBC. If you really want silent
truncation (bad idea!) you can get that via an explicit cast to
varchar(30) in your query.

-O
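[Editor's note: if silent truncation really is acceptable, an alternative to the server-side `cast(? as varchar(30))` Oliver mentions is to trim the value client-side until its UTF-8 encoding fits the column's byte limit. This is a sketch, not part of the driver; `truncateUtf8` is a hypothetical helper, and it ignores surrogate pairs for simplicity.]

```java
import java.nio.charset.StandardCharsets;

public class TruncateUtf8 {
    // Hypothetical helper: shorten a string until its UTF-8 encoding fits
    // in maxBytes, so a server that counts bytes as characters (the
    // SQL_ASCII case in this thread) will accept it.  Deliberately
    // lossy -- a last resort, as discussed above.
    static String truncateUtf8(String s, int maxBytes) {
        while (s.getBytes(StandardCharsets.UTF_8).length > maxBytes) {
            s = s.substring(0, s.length() - 1); // drop one trailing char
        }
        return s;
    }

    public static void main(String[] args) {
        String v = "ENFERMERA CLEMON. B\u00ba ALTO ALBR"; // 30 chars, 31 UTF-8 bytes
        // Dropping the trailing 'R' yields 29 chars / 30 bytes -- exactly
        // the len=29 value that Dario reported as updating OK.
        System.out.println(truncateUtf8(v, 30));
    }
}
```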

Re: Very strange Error in Updates

From
Dario Fassi
Date:
Oliver, you are right!  It's not semantics; the problem remains.

But can you explain why the *exactly* same values can be inserted and
not updated?

And why, via ODBC, the same statements against the same servers and
the same DBs run without problem?

Dario.

Oliver Jowett wrote:

> Dario V. Fassi wrote:
>
>> Server 7.3.4  for W2K and Linux too.
>> Encoding SQL_ASCII  in both cases.
>>
>> I understand the source of the problem, but ASCII encoding is not
>> 7 bits; it is 8 bits, with international charsets in code pages,
>> like the values in my examples.
>> You are talking about the US-ASCII charset, which is a 7-bit subset
>> of Unicode.
>
>
> You're arguing over nomenclature here. At the end of the day, a
> postgresql database encoding of SQL_ASCII means 7-bit ASCII; if you
> call that US-ASCII, fine, but it doesn't change the problem. With an
> encoding of SQL_ASCII the server does not have sufficient information
> to translate characters >127 between the database encoding and
> UNICODE, which is required by the JDBC driver (even if the JDBC driver
> did not set client_encoding to UNICODE, it'd still have to somehow do
> this translation itself since Java strings are represented as UTF-16).
>
> See http://www.postgresql.org/docs/current/static/multibyte.html for
> some more details. The JDBC driver will always use a "client character
> set" of UNICODE when talking to a >= 7.3 server.
>
>> Regardless, and speaking in CHARS: if I am putting a 30-character
>> string into a field 30 characters long,
>> I think the driver can/must assure a 30-character transfer.
>> Maybe a "data truncation" warning would be acceptable, or a
>> replacement byte/char, or cutting the eighth bit,
>> but this is not sufficient reason to abort the update.
>>
>> What's your opinion?
>
>
> The server already does a replacement -- the problem is that the
> replacement may be longer than one character (see the referenced docs
> above for handling of unrepresentable characters). So the server-side
> representation of a "30 character" Java string may actually be longer
> than 30 characters in the database encoding.
>
> Either way there's nothing the driver can really do about it -- we
> don't want to duplicate all the knowledge about charset conversions on
> the driver side (currently, the driver does know some details about
> encodings, but that's only there to support pre-7.3 servers). We just
> hand off a valid UNICODE string and let the server deal with it. If
> the server generates an error and aborts the transaction -- too bad,
> it's not the driver's fault.
>
> The best option is to fix your database encoding; UNICODE is your best
> bet if you're only talking to it via JDBC. If you really want silent
> truncation (bad idea!) you can get that via an explicit cast to
> varchar(30) in your query.
>
> -O


Re: Very strange Error in Updates

From
Kris Jurka
Date:

On Thu, 15 Jul 2004, Dario Fassi wrote:

> But can you explain why the *exactly* same values can be inserted and
> not updated?

I don't believe this.

> And why, via ODBC, the same statements against the same servers and
> the same DBs run without problem?

The ODBC driver doesn't have any encoding knowledge; it just passes
bytes around.  As mentioned, the JDBC driver does a transformation to
UTF-8, which makes the incorrect database encoding apparent.

Kris Jurka

Re: Very strange Error in Updates

From
Jan de Visser
Date:
On July 15, 2004 03:23 am, Dario Fassi wrote:
> But can you explain why the *exactly* same values can be inserted and
> not updated?

When you re-select them, you'll probably see you have two garbage characters
there instead of your single 'high' character.

JdV!!


------------------------------------------------------------
Jan de Visser                     jdevisser@digitalfairway.com

                Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------

Re: Very strange Error in Updates

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> The JDBC driver always speaks UNICODE when it can, since that matches
> Java's internal string representation. I suspect that what's happening is:

> 0) the driver sets client_encoding = UNICODE during connection setup

Right.

> 1) the driver encodes the parameter as UNICODE (== UTF8); for characters
> above 127 this encoding will result in more than one byte per character.

Right.

> 2) the server converts from client_encoding UNICODE to database encoding
> SQL_ASCII; for characters that are invalid in SQL_ASCII (>127) it does
> some arbitary conversion, probably just copying the illegal values
> unchanged.

Not really.  SQL_ASCII encoding basically means "we don't know what this
data is, just store it verbatim".  So the UTF-8 string sent by the
driver is stored verbatim.

> 3) you end up with extra characters in the resulting value which exceeds
> the varchar's size.

Right.  Since the server does not know what encoding is in use, it falls
back to the assumption that 1 character == 1 byte, under which
assumption the string violates the varchar(30) constraint.

Had the server known which encoding was in use, it would have counted
the characters correctly.

> The solution is to use a database encoding that matches your data.

Actually, if you intend to access the database primarily through JDBC,
it'd be best to use server encoding UNICODE.  The JDBC driver will
always want UNICODE on the wire, and I see no reason to force extra
character set conversions.  Non-UNICODE-aware clients can be handled by
setting client_encoding properly.

            regards, tom lane

Re: Very strange Error in Updates

From
"Dario V. Fassi"
Date:

My problem is that the data is already inside the PostgreSQL server (with SQL_ASCII encoding), inserted by Win32/ODBC clients.

Now, from JDBC, I can't handle any row with a field that has one or more 8-bit characters.
At the same time, Win32/ODBC programs continue to use it without any problem.
This leaves me in a hard-to-explain situation.

One more question: using PreparedStatement.setBytes(), can I get the treatment that ODBC gives those fields?
Thanks all for your help.
Dario.

Tom Lane wrote:
> Actually, if you intend to access the database primarily through JDBC,
> it'd be best to use server encoding UNICODE.  The JDBC driver will
> always want UNICODE on the wire, and I see no reason to force extra
> character set conversions. [...]

Re: Very strange Error in Updates

From
Oliver Jowett
Date:
Dario V. Fassi wrote:
>
> My problem is that the data is already inside the PostgreSQL server
> (with SQL_ASCII encoding), inserted by Win32/ODBC clients.
>
> Now, from JDBC, I can't handle any row with a field that has one or
> more 8-bit characters.
> At the same time, Win32/ODBC programs continue to use it without any problem.
> This leaves me in a hard-to-explain situation.

The problem, as I understand it from Tom's explanation, is that
SQL_ASCII only works if everyone is using the same client_encoding; the
server has no knowledge of the real underlying encoding of the data so
can't do conversions.

JDBC always uses a client_encoding of UNICODE. I don't know what ODBC
does, but apparently it's not using UNICODE.

Perhaps one option is to set the database encoding to UNICODE, and
either get the ODBC driver to issue an appropriate "SET client_encoding"
on connection setup (I don't know if ODBC lets you do this) or set the
default client_encoding in postgresql.conf to whatever is appropriate
for ODBC clients?

-O

Re: Very strange Error in Updates

From
Oliver Jowett
Date:
Tom Lane wrote:

>>2) the server converts from client_encoding UNICODE to database encoding
>>SQL_ASCII; for characters that are invalid in SQL_ASCII (>127) it does
>>some arbitary conversion, probably just copying the illegal values
>>unchanged.
>
>
> Not really.  SQL_ASCII encoding basically means "we don't know what this
> data is, just store it verbatim".  So the UTF-8 string sent by the
> driver is stored verbatim.

Hmm, so SQL_ASCII is not really a first-class encoding -- it doesn't do
encoding conversions at all? It's going to break horribly in the face of
clients using different client_encoding values, and somewhat less
horribly even when everything uses a client_encoding of UNICODE (i.e.
string lengths are wrong)?

I wonder if the server behaviour could be somehow changed so that people
don't shoot themselves in the foot so often (variants on this problem
come up again and again..). The problem is that it works most of the
time, only breaking on certain data, so it's not instantly apparent that
you have a problem.

What about refusing to change client_encoding to something other than
SQL_ASCII on SQL_ASCII databases? (This would make the JDBC driver
unusable against those databases, even for data that currently appears
to work, though.)

Or perhaps the JDBC driver could issue a warning whenever it notices the
underlying encoding is SQL_ASCII (this means another round-trip on
connection setup even when using V3 though). Or refuse to even try to
encode strings with characters >127 when the database encoding is SQL_ASCII.

>>The solution is to use a database encoding that matches your data.
>
> Actually, if you intend to access the database primarily through JDBC,
> it'd be best to use server encoding UNICODE.  The JDBC driver will
> always want UNICODE on the wire, and I see no reason to force extra
> character set conversions.  Non-UNICODE-aware clients can be handled by
> setting client_encoding properly.

Sure -- it just depends on what other clients use the db. By the sounds
of it in this case the other client is an ODBC client that isn't aware
of encodings at all.. I suppose this can be handled by the default
client_encoding setting in postgresql.conf?

-O

Re: Very strange Error in Updates

From
Oliver Jowett
Date:
Oliver Jowett wrote:

> Perhaps one option is to set the database encoding to UNICODE

Or for that matter pretty much any encoding that handles your data and
has a conversion to client_encoding = UNICODE (i.e. not SQL_ASCII)

-O

Re: Very strange Error in Updates

From
Oliver Jowett
Date:
Dario V. Fassi wrote:

> One more question: using PreparedStatement.setBytes(), can I get the
> treatment that ODBC gives those fields?

I don't think setBytes() will work -- it deals with bytea fields, which
have their own text representation for binary data ('\nnn' escapes).

-O

Re: Very strange Error in Updates

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> The problem, as I understand it from Tom's explanation, is that
> SQL_ASCII only works if everyone is using the same client_encoding; the
> server has no knowledge of the real underlying encoding of the data so
> can't do conversions.

Not only can the server not do conversions, but it cannot count string
lengths "correctly" in strings that are really in a multibyte encoding.
When JDBC sends a UTF8 string that contains some non-ASCII characters,
the server can store the string safely, but it cannot operate on it
in any intelligent way.

I wonder whether the JDBC driver ought to warn about it if it sees
server_encoding == SQL_ASCII?  You're certainly just asking for trouble
to use JDBC with such a setting.

> JDBC always uses a client_encoding of UNICODE. I don't know what ODBC
> does, but apparently it's not using UNICODE.

ODBC is probably just passing through the client data as-is, and not
doing anything at all with the encoding settings.

> Perhaps one option is to set the database encoding to UNICODE, and
> either get the ODBC driver to issue an appropriate "SET client_encoding"
> on connection setup (I don't know if ODBC lets you do this) or set the
> default client_encoding in postgresql.conf to whatever is appropriate
> for ODBC clients?

That would work.  Plan B would be to set the database encoding to
whatever the ODBC clients are using, and let encoding conversions happen
when talking to a JDBC client.

The one thing that is absolutely, positively guaranteed not to work is
setting the DB encoding to SQL_ASCII.  That defeats any chance you have
of getting intelligent encoding behavior from the system.

            regards, tom lane

Re: Very strange Error in Updates

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> Tom Lane wrote:
>> Not really.  SQL_ASCII encoding basically means "we don't know what this
>> data is, just store it verbatim".  So the UTF-8 string sent by the
>> driver is stored verbatim.

> Hmm, so SQL_ASCII is not really a first-class encoding -- it doesn't do
> encoding conversions at all?

Correct.  BTW, setting client_encoding to SQL_ASCII also disables
on-the-wire encoding conversions (so that client data had better be in
whatever the database encoding is).

> What about refusing to change client_encoding to something other than
> SQL_ASCII on SQL_ASCII databases?

Not sure that would do anything very useful.  People who aren't thinking
about this probably aren't thinking about setting client_encoding
properly, either.

> Or perhaps the JDBC driver could issue a warning whenever it notices the
> underlying encoding is SQL_ASCII (this means another round-trip on
> connection setup even when using V3 though).

Something like this seems reasonable.  I'm not sure why we didn't make
server_encoding be GUC_REPORT so that it would be sent automatically
during connection startup ... we could change that in 7.5 if it would
help any ...

> Sure -- it just depends on what other clients use the db. By the sounds
> of it in this case the other client is an ODBC client that isn't aware
> of encodings at all.. I suppose this can be handled by the default
> client_encoding setting in postgresql.conf?

Yeah, as long as there is one specific encoding that all the
encoding-ignorant clients are using.  If there's more than one, perhaps
you could get it to work by specifying per-user or per-database default
client_encoding settings (see ALTER USER and ALTER DATABASE).

            regards, tom lane
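[Editor's note: Tom's per-database / per-user suggestion would look something like the following. This is a sketch only; the database name `mydb` and user `odbcuser` are hypothetical, and it assumes the encoding-ignorant ODBC clients actually send LATIN1 data.]

```sql
-- Database stored in a real encoding (e.g. UNICODE); JDBC negotiates
-- client_encoding = UNICODE itself, while encoding-ignorant ODBC
-- clients pick up a LATIN1 default instead of SQL_ASCII passthrough.
ALTER DATABASE mydb SET client_encoding TO 'LATIN1';

-- Or per user, if different clients connect with different accounts:
ALTER USER odbcuser SET client_encoding TO 'LATIN1';
```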

Re: Very strange Error in Updates

From
Oliver Jowett
Date:
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
>
>>What about refusing to change client_encoding to something other than
>>SQL_ASCII on SQL_ASCII databases?
>
>
> Not sure that would do anything very useful.  People who aren't thinking
> about this probably aren't thinking about setting client_encoding
> properly, either.

I was thinking about it from the other angle -- clients that set
client_encoding and expect the server to do the conversion (e.g. the
JDBC driver) will see an error rather than bogus unconverted data.

What does the server currently do if you ask for a client_encoding that
isn't supported by the database encoding (e.g. LATIN1<->LATIN2)? It
seems to me that SQL_ASCII is kinda-sorta-if-you-squint-a-bit like an
encoding that doesn't support any client_encoding but SQL_ASCII.

-O

Re: Very strange Error in Updates - Worst than ever !

From
"Dario V. Fassi"
Date:
We wrote an ODBC program to migrate the DB (SQL_ASCII) to a DB with UNICODE encoding.

This DB works fine over ODBC with SET CLIENT_ENCODING='UNICODE'.

1) Now, from a JDBC Java program, we read a row that has a field CALLE varchar(30) = 'ññññññññññññññññ'.
2) Then we do an update of another field in the same row.
3) Then the untouched field ends up as CALLE varchar(30) = ''.

I'm absolutely lost in this problem.

Dario.

Oliver Jowett wrote:
Tom Lane wrote:
Oliver Jowett <oliver@opencloud.com> writes:

What about refusing to change client_encoding to something other than SQL_ASCII on SQL_ASCII databases?


Not sure that would do anything very useful.  People who aren't thinking
about this probably aren't thinking about setting client_encoding
properly, either.

I was thinking about it from the other angle -- clients that set client_encoding and expect the server to do the conversion (e.g. the JDBC driver) will see an error rather than bogus unconverted data.

What does the server currently do if you ask for a client_encoding that isn't supported by the database encoding (e.g. LATIN1<->LATIN2)? It seems to me that SQL_ASCII is kinda-sorta-if-you-squint-a-bit like an encoding that doesn't support any client_encoding but SQL_ASCII.

-O




--

    Dario V. Fassi


SISTEMATICA ingenieria de software  srl
Ituzaingo 1628  (2000)  Rosario, Santa Fe, Argentina.
Tel / Fax:  +54 (341) 485.1432 / 485.1353



Re: Very strange Error in Updates

From
"Dario V. Fassi"
Date:
Now, as suggested, we are working with a UNICODE database.
With the latest dev JDBC driver, it doesn't work.
With an older Mammoth JDBC driver it works, but the original problem returns.

A Java string of 30 chars can't be updated into a varchar(30) field, even with a UNICODE db.

Something in the middle has a wrong count of chars, even when the database encoding is Unicode.
Server or interface?

Dario.

Dario V. Fassi wrote:
Worst cases:

The receiving field is a Varchar(30) .
Sample data :

CALLE=[ENFERMERA CLEMON. B- ALTO ALBR]  len=30    : Is Updated Ok.
CALLE=[ENFERMERA CLEMON. Bº ALTO ALBR]  len=30    : Is NOT Updated
CALLE=[ENFERMERA CLEMON. Bº ALTO ALB]  len=29    : Is Updated Ok.
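The pattern above is consistent with a byte count being checked against a character limit: 'º' (U+00BA) encodes to two bytes in UTF-8, so the 30-character string containing it occupies 31 bytes, while the pure-ASCII variant and the 29-character variant both fit in 30 bytes. A minimal sketch of the mismatch (plain Java, no database needed):

```java
import java.nio.charset.StandardCharsets;

public class LengthDemo {
    public static void main(String[] args) {
        String ok  = "ENFERMERA CLEMON. B- ALTO ALBR";       // plain ASCII
        String bad = "ENFERMERA CLEMON. B\u00ba ALTO ALBR";  // contains 'º'

        // Both are 30 characters as Java counts them...
        System.out.println(ok.length());   // 30
        System.out.println(bad.length());  // 30

        // ...but their UTF-8 byte lengths differ: 'º' takes two bytes,
        // pushing the second string past a 30-byte limit.
        System.out.println(ok.getBytes(StandardCharsets.UTF_8).length);   // 30
        System.out.println(bad.getBytes(StandardCharsets.UTF_8).length);  // 31
    }
}
```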

Dario V. Fassi wrote:
I found a very strange error/behavior in a PreparedStatement doing a simple SQL UPDATE on a VARCHAR field.
The code is like:

   Statement stmt = con.createStatement();
   PreparedStatement pstIns = con.prepareStatement("update userid.t set calle = ? ");
   ResultSet rs = stmt.executeQuery( "select calle from userid.t2" );
   while ( rs.next() ) {
       pstIns.clearParameters();
       String x = rs.getString("CALLE");
       pstIns.setString(1, x );
       int nrows = pstIns.executeUpdate();
       System.out.println( "Filas afectadas " + nrows );  // "rows affected"
   }

When the parameter fills the full length of the receiving field and contains any uncommon (non-ASCII) character, the update throws an exception like:

java.sql.SQLException: ERROR:  value too long for type character varying(30)
   at org.postgresql.core.v2.QueryExecutorImpl.receiveErrorMessage(QueryExecutorImpl.java:420)
   at org.postgresql.core.v2.QueryExecutorImpl.processResults(QueryExecutorImpl.java:345)
   at org.postgresql.core.v2.QueryExecutorImpl.execute(QueryExecutorImpl.java:251)
   at org.postgresql.core.v2.QueryExecutorImpl.execute(QueryExecutorImpl.java:159)
   at org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:342)
   at org.postgresql.jdbc1.AbstractJdbc1Statement.executeWithFlags(AbstractJdbc1Statement.java:290)
   at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:245)
   at com.sistemat.app.rsmaster.reloadable.RepImportar.PrDw_Cuentas(RepImportar.java:2471)
   at com.sistemat.app.rsmaster.reloadable.RepImportar.Importar(RepImportar.java:260)
   at com.sistemat.app.rsmaster.reloadable.RepMasterImpl.Ciclo_Replicacion(RepMasterImpl.java:955)
   at com.sistemat.app.rsmaster.reloadable.RepMasterImpl.runWorker(RepMasterImpl.java:748)
   at com.sistemat.app.rsmaster.reloadable.RepMasterImpl.run(RepMasterImpl.java:427)
   at java.lang.Thread.run(Unknown Source)

This field/value produces the exception:   CALLE=[ENFERMERA CLEMON. Bº ALTO ALBR]

If the field is changed to:                CALLE=[ENFERMERA CLEMON. B# ALTO ALBR]
then the update is performed without any problem.

Oddly, the first value can be INSERTed with a prepared statement without problems.

I have no explanation for this case, and any help will be appreciated.

Dario Fassi.
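If the server-side limit is effectively enforced in bytes, a defensive client-side workaround is to truncate values by their UTF-8 byte length rather than by `String.length()`. A sketch (the helper name `truncateUtf8` is my own, not part of any driver; it walks back by whole characters so a multi-byte character is never split):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Truncate {
    // Truncate s so its UTF-8 encoding fits within maxBytes,
    // without splitting a multi-byte character.
    static String truncateUtf8(String s, int maxBytes) {
        if (s.getBytes(StandardCharsets.UTF_8).length <= maxBytes) return s;
        int end = s.length();
        while (end > 0 && s.substring(0, end)
                .getBytes(StandardCharsets.UTF_8).length > maxBytes) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // 30 characters but 31 UTF-8 bytes because of 'º'.
        String v = "ENFERMERA CLEMON. B\u00ba ALTO ALBR";
        String fitted = truncateUtf8(v, 30);
        System.out.println(fitted); // 29 chars, 30 bytes: final 'R' dropped
    }
}
```

This would be applied to the value just before `pstIns.setString(1, x)`; it trades the trailing character for a successful update, which may or may not be acceptable for the data in question.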



Re: Very strange Error in Updates - At last resolved !

From
"Dario V. Fassi"
Date:
After trying *all* versions of the JDBC drivers, I found that the problems disappear using:

1) a DB with UNICODE encoding, and
2) the JDBC driver pg74.214.jdbc2.jar (186948 bytes).

Thanks Oliver, Tom and Kris for your time and help.

Dario V. Fassi wrote:
We wrote an ODBC program to migrate the DB (SQL_ASCII) to a DB with UNICODE encoding.

This DB works fine over ODBC with SET CLIENT_ENCODING='UNICODE'.

1) Now, from a JDBC Java program, we read a row that has a field CALLE varchar(30) = 'ññññññññññññññññ',
2) then we update another field in the same row,
3) and afterwards the untouched field ends up as CALLE varchar(30) = ''.

I'm absolutely lost with this problem.

Dario.

Oliver Jowett wrote:
Tom Lane wrote:
Oliver Jowett <oliver@opencloud.com> writes:

What about refusing to change client_encoding to something other than SQL_ASCII on SQL_ASCII databases?


Not sure that would do anything very useful.  People who aren't thinking
about this probably aren't thinking about setting client_encoding
properly, either.

I was thinking about it from the other angle -- clients that set client_encoding and expect the server to do the conversion (e.g. the JDBC driver) will see an error rather than bogus unconverted data.

What does the server currently do if you ask for a client_encoding that isn't supported by the database encoding (e.g. LATIN1<->LATIN2)? It seems to me that SQL_ASCII is kinda-sorta-if-you-squint-a-bit like an encoding that doesn't support any client_encoding but SQL_ASCII.

-O