Thread: Encoding from CopyManager.copyIn()

Encoding from CopyManager.copyIn()

From
Markus Kickmaier
Date:
Hello,

I'm using the copyIn() function of the CopyManager. It works fine as long as I don't use an umlaut such as ü. When I do,
I get a PSQLException:

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0xfc

My code looks as follows:

ByteArrayOutputStream output = new ByteArrayOutputStream();
PrintWriter writer = new PrintWriter(output);
writer.println("abcüäö");
writer.flush();
ByteArrayInputStream input = new ByteArrayInputStream(output.toByteArray());
long result = ((PGConnection) con_).getCopyAPI().copyIn(statement, input);

After searching on Google I found out that this is an encoding problem: the database doesn't know which charset I'm
using.

Any suggestions on how I can specify the encoding I want to use?

BR, Markus

Re: Encoding from CopyManager.copyIn()

From
Kris Jurka
Date:

On Wed, 22 Jul 2009, Markus Kickmaier wrote:

> I'm using the copyIn() function of the CopyManager. It works fine until
> I don't use an "umlaut" like ü. Then i get an PSQLException:
>
> org.postgresql.util.PSQLException: ERROR: invalid byte sequence for
> encoding "UTF8": 0xfc
>
> My code looks like follows:
>
> ByteArrayOutputStream output = new ByteArrayOutputStream();
> PrintWriter writer = new PrintWriter(output);
> writer.println("abcüäö");
> writer.flush();
> ByteArrayInputStream input = new ByteArrayInputStream(output.toByteArray());
> long result = ((PGConnection) con_).getCopyAPI().copyIn(statement, input);

You should be using the copyIn(String, Reader) method rather than the
InputStream variant.  That way the CopyManager can encode the provided
data in the encoding the database requires.

If you use the InputStream method, you need to provide the data already
encoded as UTF-8.

Kris Jurka
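
A minimal sketch of the two options described above, using only the JDK. The class and method names (CopyInputs, asReader, asUtf8Stream) are hypothetical, and the driver calls are shown only as comments because they need a live connection; con_ and statement are the variables from the original post.

```java
import java.io.ByteArrayInputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

class CopyInputs {
    // Option 1: hand over characters and leave the encoding to the driver.
    static Reader asReader(String rows) {
        return new StringReader(rows);
    }

    // Option 2: hand over bytes that are already encoded as UTF-8.
    static ByteArrayInputStream asUtf8Stream(String rows) {
        return new ByteArrayInputStream(rows.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // "abcüäö" written with escapes so the source file encoding does not matter.
        String rows = "abc\u00fc\u00e4\u00f6\n";
        Reader reader = asReader(rows);
        ByteArrayInputStream stream = asUtf8Stream(rows);
        // With a live connection, either input would be passed to the driver:
        //   ((PGConnection) con_).getCopyAPI().copyIn(statement, reader);
        //   ((PGConnection) con_).getCopyAPI().copyIn(statement, stream);
        System.out.println(rows.length() + " chars, " + stream.available() + " UTF-8 bytes");
    }
}
```

The Reader variant is probably the simpler choice when the data already exists as Java strings, since no charset has to be named in application code.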


Re: Encoding from CopyManager.copyIn()

From
Daniel Migowski
Date:
PrintWriter has another constructor that accepts "UTF8" as the
encoding. The default is the platform encoding (usually "Win1252" for me
in Germany).

Best,
Daniel Migowski
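
To see why the server complained about 0xfc, the following sketch prints the bytes produced for "ü" under two encodings. ISO-8859-1 stands in here for a Western European platform default (it encodes ü as the same single byte as Windows-1252); the class name PlatformEncodingCheck and the hex helper are hypothetical. On JDK 18 and later the default charset is UTF-8 unless configured otherwise, so the first line of output may differ from what the original poster would have seen.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PlatformEncodingCheck {
    // Formats each byte as two lowercase hex digits.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String umlaut = "\u00fc"; // "ü"
        // This is what new PrintWriter(OutputStream) uses when no charset is given.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("ISO-8859-1: " + hex(umlaut.getBytes(StandardCharsets.ISO_8859_1))); // fc
        System.out.println("UTF-8:      " + hex(umlaut.getBytes(StandardCharsets.UTF_8)));      // c3bc
    }
}
```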




Re: Encoding from CopyManager.copyIn()

From
Daniel Migowski
Date:
Or, in your case, you have to wrap the OutputStream in an
OutputStreamWriter (which takes the encoding as a constructor parameter).

best
Daniel
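
The wrapping suggested above might look like the following sketch, which reuses the buffer approach from the original post. The class and method names (Utf8CopyBuffer, encodeRows) are hypothetical, and the choice of '\n' instead of println() is an assumption made here to keep the row terminator independent of the platform.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class Utf8CopyBuffer {
    // Writes the rows through an OutputStreamWriter with an explicit charset,
    // so the bytes do not depend on the platform default encoding.
    static byte[] encodeRows(String... rows) {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        PrintWriter writer =
                new PrintWriter(new OutputStreamWriter(output, StandardCharsets.UTF_8));
        for (String row : rows) {
            writer.print(row);
            writer.print('\n'); // fixed terminator; println() uses the platform line separator
        }
        writer.flush();
        return output.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = encodeRows("abc\u00fc\u00e4\u00f6"); // "abcüäö"
        StringBuilder hex = new StringBuilder();
        for (byte b : data) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        // prints: 61 62 63 c3 bc c3 a4 c3 b6 0a
        System.out.println(hex.toString().trim());
    }
}
```

The resulting byte array can then be wrapped in a ByteArrayInputStream and passed to copyIn(statement, input) as in the original code.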




Re: Encoding from CopyManager.copyIn()

From
Markus Kickmaier
Date:
Thanks for the responses, Daniel and Kris,

but I just can't get it to work. I now know exactly what my problem is.
I have a SQL_ASCII-encoded database. The JDBC driver uses UNICODE as client_encoding. So if I want to copy an umlaut
like ü into a table, I get the error: invalid byte sequence for UTF8...

If I test this in pgAdmin, the result is the same. But if I set client_encoding to 'SQL_ASCII' in pgAdmin, it works fine.
When I try this for my JDBC connection, I get a PSQLException saying that the client_encoding parameter was changed to
SQL_ASCII and that the JDBC driver only works correctly with UNICODE.

Any ideas? I'm fairly sure it would work if JDBC let me use SQL_ASCII.

BR, Markus


Re: Encoding from CopyManager.copyIn()

From
Oliver Jowett
Date:
Markus Kickmaier wrote:
> Thanks for the Responses Daniel and Kris,
>
> but i just don't get it work. I know now what exactly my problem is.
> I have a SQL_ASCII encoded database. The JDBC driver uses UNICODE as client_encoding. So if i want to copy an
> 'umlaut' like ü into a table i get the error: invalid byte sequence for UTF8...
>
> If i test this in pgAdmin it is the same. But if i set client_encoding to 'SQL_ASCII' in pgAdmin it works fine.
> Trying this for my JDBC connection i get a PSQL Exception saying that the client_encoding parameter was changed to
> SQL_ASCII and the JDBC driver just works correctly with UNICODE.
>
> Any ideas? I'm rather sure it would work if JDBC would let me use SQL_ASCII.

You should convert your database to an appropriate encoding for the data
it contains (perhaps LATIN1?). If the database encoding is SQL_ASCII,
the JDBC driver has no way of knowing how to convert bytes >127 to
Java's UTF-16 String representation.

Basically, SQL_ASCII is only going to work with the JDBC driver if you
only store 7-bit ASCII, or if you happen to be very lucky and have all
clients everywhere use a client_encoding of UNICODE.

-O
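
The point about bytes above 127 can be illustrated with a small sketch: the same stored byte decodes to "ü" under one encoding and is invalid under others, so without a declared database encoding there is nothing to tell a client which reading is intended. The class and method names (AmbiguousByte, decodeStrict) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AmbiguousByte {
    // Strictly decodes the bytes; returns null if they are not valid in the charset.
    // A fresh CharsetDecoder reports malformed input instead of replacing it.
    static String decodeStrict(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] stored = { (byte) 0xfc }; // the byte from the error message
        System.out.println("ISO-8859-1 gives u-umlaut: "
                + "\u00fc".equals(decodeStrict(stored, StandardCharsets.ISO_8859_1)));
        System.out.println("valid as UTF-8: "
                + (decodeStrict(stored, StandardCharsets.UTF_8) != null));
        System.out.println("valid as US-ASCII: "
                + (decodeStrict(stored, StandardCharsets.US_ASCII) != null));
    }
}
```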

Re: Encoding from CopyManager.copyIn()

From
Markus Kickmaier
Date:
Hi,

What I've done now is the following:

- I converted my database to UTF8.
- I use an OutputStreamWriter with UTF8 as the encoding to fill my stream for the copy statement.

Now it works. Thanks for your help.

BR, Markus

