Thread: invalid byte sequence for encoding "UTF8": 0x00

invalid byte sequence for encoding "UTF8": 0x00

From
"James Im"
Date:
Hi,

I've got another problem. I sometimes get the following SQLException
when doing an insert:

ERROR: invalid byte sequence for encoding "UTF8": 0x00
Exception: org.postgresql.util.PSQLException
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1525)
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1309)
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:308)

By the way, the insert is done with a PreparedStatement and I use only
setLong(), setString(), setTimestamp() and setInt().


I don't understand it very well. It is obviously an encoding exception
but I don't know why it happens and what I could do avoid it.

Any idea?

_________________________________________________________________
Opret en personlig blog og del dine billeder p� MSN Spaces:
http://spaces.msn.com/


Re: invalid byte sequence for encoding "UTF8": 0x00

From
Altaf Malik
Date:
Try to change the encoding of your database to "Unicode".
I hope this helps.
 
--Altaf Malik
EnterpriseDB

James Im <im-james@hotmail.com> wrote:
Hi,

I've got another problem. I sometimes get the following SQLException
when doing an insert:

ERROR: invalid byte sequence for encoding "UTF8": 0x00
Exception: org.postgresql.util.PSQLException
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1525)
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1309)
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:308)

By the way, the insert is done with a PreparedStatement and I use only
setLong(), setString(), setTimestamp() and setInt().


I don't understand it very well. It is obviously an encoding exception
but I don't know why it happens and what I could do avoid it.

Any idea?

_________________________________________________________________
Opret en personlig blog og del dine billeder på MSN Spaces:
http://spaces.msn.com/


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly


Don't get soaked. Take a quick peak at the forecast
with theYahoo! Search weather shortcut.

Re: invalid byte sequence for encoding "UTF8": 0x00

From
Oliver Jowett
Date:
James Im wrote:

> I've got another problem. I sometimes get the following SQLException
> when doing an insert:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0x00

You're trying to insert a string which contains a '\0' character. The
server can't handle strings containing embedded NULs, as it uses C-style
string termination internally.

-O

Re: invalid byte sequence for encoding "UTF8": 0x00

From
Csaba Nagy
Date:
I've had the same error, and it is in fact because in Java you can
actually have a "0x0" character in your string, and that's valid
unicode. So that's translated to the character 0x0 in UTF8, which in
turn is not accepted because the server uses null terminated strings...
so the only way is to make sure your strings don't contain the character
'\u0000'.

I identified the place in my code which was generating such a character
and fixed, and I didn't have other problems after that... even if I
still think forbidding a valid character is a somewhat arbitrary
restriction.

HTH,
Csaba.

On Tue, 2007-02-20 at 11:57, Altaf Malik wrote:
> Try to change the encoding of your database to "Unicode".
> I hope this helps.
>
> --Altaf Malik
> EnterpriseDB
> www.enterprisedb.com
> James Im <im-james@hotmail.com> wrote:
>         Hi,
>
>         I've got another problem. I sometimes get the following
>         SQLException
>         when doing an insert:
>
>         ERROR: invalid byte sequence for encoding "UTF8": 0x00
>         Exception: org.postgresql.util.PSQLException
>         org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1525)
>         org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1309)
>         org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
>         org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
>         org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354)
>         org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:308)
>
>         By the way, the insert is done with a PreparedStatement and I
>         use only
>         setLong(), setString(), setTimestamp() and setInt().
>
>
>         I don't understand it very well. It is obviously an encoding
>         exception
>         but I don't know why it happens and what I could do avoid it.
>
>         Any idea?
>
>         _________________________________________________________________
>         Opret en personlig blog og del dine billeder på MSN Spaces:
>         http://spaces.msn.com/
>
>
>         ---------------------------(end of
>         broadcast)---------------------------
>         TIP 1: if posting/reading through Usenet, please send an
>         appropriate
>         subscribe-nomail command to majordomo@postgresql.org so that
>         your
>         message can get through to the mailing list cleanly
>
>
> ______________________________________________________________________
> Don't get soaked. Take aquick peak at the forecast
> with theYahoo! Search weather shortcut.


Re: invalid byte sequence for encoding "UTF8": 0x00

From
Ken Johanson
Date:
Oliver Jowett wrote:
> James Im wrote:
>
>> I've got another problem. I sometimes get the following SQLException
>> when doing an insert:
>>
>> ERROR: invalid byte sequence for encoding "UTF8": 0x00
>
> You're trying to insert a string which contains a '\0' character. The
> server can't handle strings containing embedded NULs, as it uses C-style
> string termination internally.
>


At least on other servers/drivers I believe nulls are supported (and
should be according to some spec) (The only special-meaning char is
single quote).

I'm wondering how the binary protocol works insofar as handling the NULL
byte; does it precede it with a backslash? I'm wondering if this would
be possible for the String conversion as well -- just for sake of
consistency with other DBs (and since some API inevitable expect users
to send binary data through a char-sequence interface)



Re: invalid byte sequence for encoding "UTF8": 0x00

From
Oliver Jowett
Date:
Ken Johanson wrote:

> At least on other servers/drivers I believe nulls are supported (and
> should be according to some spec) (The only special-meaning char is
> single quote).

The driver can't do anything about it, it's a server issue. I can think
of some ways the server could support it without extensive changes ..
e.g. use a "modified UTF8" representation which stores \u0000 as 0xc0
0x80 internally .. but you'd have to take that up with the backend
developers.

> I'm wondering how the binary protocol works insofar as handling the NULL
> byte; does it precede it with a backslash?

The driver sends string parameters out-of-line without escaping (i.e.
length field, then raw utf-8 data). The error you see is generated when
the server notices that there's a \u0000 there; it rejects the string
entirely rather than silently mangling it.

-O

Re: invalid byte sequence for encoding "UTF8": 0x00

From
Tore Halset
Date:
On Feb 22, 2007, at 07:37, Ken Johanson wrote:

> At least on other servers/drivers I believe nulls are supported
> (and should be according to some spec) (The only special-meaning
> char is single quote).

Yes. I got this error while copying data from a MS SQL Server to
PostgreSQL.

  - Tore.

Re: invalid byte sequence for encoding "UTF8": 0x00

From
JasmineLiu
Date:
the context is:
http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-quot-UTF8-quot-0x00-td2172080.html

I've also got this problem while copy or insert data from MS SQL Server to
PostgreSQL.
SQLServer 2008 R2, encoding :GBK
PostgreSQL 9.3.4, encoding:UTF8

Rather than modify the column value in sql server,
are  there any other ways to solve this problem?
Better to give me an  examples.
Thanks!

Yours,
Jasmine



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-UTF8-0x00-tp2172080p5816498.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.


Re: invalid byte sequence for encoding "UTF8": 0x00

From
Albe Laurenz
Date:
JasmineLiu wrote:
> I've also got this problem while copy or insert data from MS SQL Server to
> PostgreSQL.
> SQLServer 2008 R2, encoding :GBK
> PostgreSQL 9.3.4, encoding:UTF8
> 
> Rather than modify the column value in sql server,
> are  there any other ways to solve this problem?
> Better to give me an  examples.

You will never be able to insert a null character into a PostgreSQL database.
You can either modify the source data or change the data in transit.

Yours,
Laurenz Albe

Re: invalid byte sequence for encoding "UTF8": 0x00

From
JasmineLiu
Date:
Thanks Laurenz Albe !
Can you give me an example of  changing the data in transit?

Regards!

Yours,
Jasmine Liu



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-UTF8-0x00-tp2172080p5816502.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.


Re: invalid byte sequence for encoding "UTF8": 0x00

From
Albe Laurenz
Date:
JasmineLiu wrote:
> Can you give me an example of  changing the data in transit?

You export the data to a file, modify the file (with tools like sed
or something more complicated), and load the result.

Yours,
Laurenz Albe

Re: invalid byte sequence for encoding "UTF8": 0x00

From
Andreas Joseph Krogh
Date:
På onsdag 27. august 2014 kl. 10:11:09, skrev Albe Laurenz <laurenz.albe@wien.gv.at>:
JasmineLiu wrote:
> I've also got this problem while copy or insert data from MS SQL Server to
> PostgreSQL.
> SQLServer 2008 R2, encoding :GBK
> PostgreSQL 9.3.4, encoding:UTF8
>
> Rather than modify the column value in sql server,
> are  there any other ways to solve this problem?
> Better to give me an  examples.

You will never be able to insert a null character into a PostgreSQL database.
You can either modify the source data or change the data in transit.
 
This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.
 
 
Usually the \0 isn't important so you can do this in JAVA before inserting into PG:
 
someString.replace('\0', ' ') or someString.replaceAll("\0", "")
 
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: invalid byte sequence for encoding "UTF8": 0x00

From
JasmineLiu
Date:

Thanks Andreas!  

I like this clear answer!

My colleague and I have found about this in PostgreSQL source code, but we’re not so sure.

Thanks for your reply!

 

Regards!

 

Jasmine Liu

 

发件人: Andreas Joseph Krogh-2 [via PostgreSQL] [mailto:[hidden email]]
发送时间: 2014828 20:46
收件人: JasmineLiu
主题: Re: invalid byte sequence for encoding "UTF8": 0x00

 

På onsdag 27. august 2014 kl. 10:11:09, skrev Albe Laurenz <[hidden email]>:

JasmineLiu wrote:
> I've also got this problem while copy or insert data from MS SQL Server to
> PostgreSQL.
> SQLServer 2008 R2, encoding :GBK
> PostgreSQL 9.3.4, encoding:UTF8
>
> Rather than modify the column value in sql server,
> are  there any other ways to solve this problem?
> Better to give me an  examples.

You will never be able to insert a null character into a PostgreSQL database.
You can either modify the source data or change the data in transit.

 

This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.

 

 

Usually the \0 isn't important so you can do this in JAVA before inserting into PG:

 

someString.replace('\0', ' ') or someString.replaceAll("\0", "")

 

 

--

Andreas Joseph Krogh

CTO / Partner - Visena AS

Mobile: +47 909 56 963

 

 


If you reply to this email, your message will be added to the discussion below:

http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-UTF8-0x00-tp2172080p5816732.html

To unsubscribe from invalid byte sequence for encoding "UTF8": 0x00, click here.
NAML



View this message in context: Re: invalid byte sequence for encoding "UTF8": 0x00
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.

Re: invalid byte sequence for encoding "UTF8": 0x00

From
Albe Laurenz
Date:
Joseph Krogh wrote:
>> You will never be able to insert a null character into a PostgreSQL database.
>> You can either modify the source data or change the data in transit.
> 
> This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.

I was talking about characters, not bytes.

Yours,
Laurenz Albe

Re: invalid byte sequence for encoding "UTF8": 0x00

From
Andreas Joseph Krogh
Date:
På fredag 29. august 2014 kl. 11:54:35, skrev Albe Laurenz <laurenz.albe@wien.gv.at>:
Joseph Krogh wrote:
>> You will never be able to insert a null character into a PostgreSQL database.
>> You can either modify the source data or change the data in transit.
>
> This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.

I was talking about characters, not bytes.
 
'\0' is a character. I see noe specification of character-fields (like varchar and text) in your answer.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: invalid byte sequence for encoding "UTF8": 0x00

From
Albe Laurenz
Date:
Andreas Joseph Krogh wrote:
>>>> You will never be able to insert a null character into a PostgreSQL database.
>>>> You can either modify the source data or change the data in transit.
>>>
>>> This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.
>> 
>> I was talking about characters, not bytes.
> 
> '\0' is a character. I see noe specification of character-fields (like varchar and text) in your
> answer.

My definition would be:
A character is something that is normally written on paper and has
to be encoded to be stored in a computer system.
(seems not to stray to far from Wikipedia's definition.)

Characters can only occur in text fields.

An element of a bytea is not a character along these lines; hence the
type is called "BYTE Array".

But let's not split hairs, this is getting away from the problem at hand,
and I think you know what I mean and vice versa.

Yours,
Laurenz Albe