Thread: invalid byte sequence for encoding "UTF8": 0x00
Hi, I've got another problem. I sometimes get the following SQLException when doing an insert: ERROR: invalid byte sequence for encoding "UTF8": 0x00 Exception: org.postgresql.util.PSQLException org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1525) org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1309) org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188) org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452) org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354) org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:308) By the way, the insert is done with a PreparedStatement and I use only setLong(), setString(), setTimestamp() and setInt(). I don't understand it very well. It is obviously an encoding exception but I don't know why it happens and what I could do avoid it. Any idea? _________________________________________________________________ Opret en personlig blog og del dine billeder p� MSN Spaces: http://spaces.msn.com/
James Im <im-james@hotmail.com> wrote:
Hi,
I've got another problem. I sometimes get the following SQLException
when doing an insert:
ERROR: invalid byte sequence for encoding "UTF8": 0x00
Exception: org.postgresql.util.PSQLException
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1525)
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1309)
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:308)
By the way, the insert is done with a PreparedStatement and I use only
setLong(), setString(), setTimestamp() and setInt().
I don't understand it very well. It is obviously an encoding exception
but I don't know why it happens and what I could do avoid it.
Any idea?
_________________________________________________________________
Opret en personlig blog og del dine billeder på MSN Spaces:
http://spaces.msn.com/
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Don't get soaked. Take a quick peak at the forecast
with theYahoo! Search weather shortcut.
James Im wrote: > I've got another problem. I sometimes get the following SQLException > when doing an insert: > > ERROR: invalid byte sequence for encoding "UTF8": 0x00 You're trying to insert a string which contains a '\0' character. The server can't handle strings containing embedded NULs, as it uses C-style string termination internally. -O
I've had the same error, and it is in fact because in Java you can actually have a "0x0" character in your string, and that's valid unicode. So that's translated to the character 0x0 in UTF8, which in turn is not accepted because the server uses null terminated strings... so the only way is to make sure your strings don't contain the character '\u0000'. I identified the place in my code which was generating such a character and fixed, and I didn't have other problems after that... even if I still think forbidding a valid character is a somewhat arbitrary restriction. HTH, Csaba. On Tue, 2007-02-20 at 11:57, Altaf Malik wrote: > Try to change the encoding of your database to "Unicode". > I hope this helps. > > --Altaf Malik > EnterpriseDB > www.enterprisedb.com > James Im <im-james@hotmail.com> wrote: > Hi, > > I've got another problem. I sometimes get the following > SQLException > when doing an insert: > > ERROR: invalid byte sequence for encoding "UTF8": 0x00 > Exception: org.postgresql.util.PSQLException > org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1525) > org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1309) > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188) > org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452) > org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354) > org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:308) > > By the way, the insert is done with a PreparedStatement and I > use only > setLong(), setString(), setTimestamp() and setInt(). > > > I don't understand it very well. It is obviously an encoding > exception > but I don't know why it happens and what I could do avoid it. > > Any idea? > > _________________________________________________________________ > Opret en personlig blog og del dine billeder på MSN Spaces: > http://spaces.msn.com/ > > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an > appropriate > subscribe-nomail command to majordomo@postgresql.org so that > your > message can get through to the mailing list cleanly > > > ______________________________________________________________________ > Don't get soaked. Take aquick peak at the forecast > with theYahoo! Search weather shortcut.
Oliver Jowett wrote: > James Im wrote: > >> I've got another problem. I sometimes get the following SQLException >> when doing an insert: >> >> ERROR: invalid byte sequence for encoding "UTF8": 0x00 > > You're trying to insert a string which contains a '\0' character. The > server can't handle strings containing embedded NULs, as it uses C-style > string termination internally. > At least on other servers/drivers I believe nulls are supported (and should be according to some spec) (The only special-meaning char is single quote). I'm wondering how the binary protocol works insofar as handling the NULL byte; does it precede it with a backslash? I'm wondering if this would be possible for the String conversion as well -- just for sake of consistency with other DBs (and since some API inevitable expect users to send binary data through a char-sequence interface)
Ken Johanson wrote: > At least on other servers/drivers I believe nulls are supported (and > should be according to some spec) (The only special-meaning char is > single quote). The driver can't do anything about it, it's a server issue. I can think of some ways the server could support it without extensive changes .. e.g. use a "modified UTF8" representation which stores \u0000 as 0xc0 0x80 internally .. but you'd have to take that up with the backend developers. > I'm wondering how the binary protocol works insofar as handling the NULL > byte; does it precede it with a backslash? The driver sends string parameters out-of-line without escaping (i.e. length field, then raw utf-8 data). The error you see is generated when the server notices that there's a \u0000 there; it rejects the string entirely rather than silently mangling it. -O
On Feb 22, 2007, at 07:37, Ken Johanson wrote: > At least on other servers/drivers I believe nulls are supported > (and should be according to some spec) (The only special-meaning > char is single quote). Yes. I got this error while copying data from a MS SQL Server to PostgreSQL. - Tore.
the context is: http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-quot-UTF8-quot-0x00-td2172080.html I've also got this problem while copy or insert data from MS SQL Server to PostgreSQL. SQLServer 2008 R2, encoding :GBK PostgreSQL 9.3.4, encoding:UTF8 Rather than modify the column value in sql server, are there any other ways to solve this problem? Better to give me an examples. Thanks! Yours, Jasmine -- View this message in context: http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-UTF8-0x00-tp2172080p5816498.html Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
JasmineLiu wrote: > I've also got this problem while copy or insert data from MS SQL Server to > PostgreSQL. > SQLServer 2008 R2, encoding :GBK > PostgreSQL 9.3.4, encoding:UTF8 > > Rather than modify the column value in sql server, > are there any other ways to solve this problem? > Better to give me an examples. You will never be able to insert a null character into a PostgreSQL database. You can either modify the source data or change the data in transit. Yours, Laurenz Albe
Thanks Laurenz Albe ! Can you give me an example of changing the data in transit? Regards! Yours, Jasmine Liu -- View this message in context: http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-UTF8-0x00-tp2172080p5816502.html Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
JasmineLiu wrote: > Can you give me an example of changing the data in transit? You export the data to a file, modify the file (with tools like sed or something more complicated), and load the result. Yours, Laurenz Albe
JasmineLiu wrote:
> I've also got this problem while copy or insert data from MS SQL Server to
> PostgreSQL.
> SQLServer 2008 R2, encoding :GBK
> PostgreSQL 9.3.4, encoding:UTF8
>
> Rather than modify the column value in sql server,
> are there any other ways to solve this problem?
> Better to give me an examples.
You will never be able to insert a null character into a PostgreSQL database.
You can either modify the source data or change the data in transit.
Attachment
Thanks Andreas!
I like this clear answer!
My colleague and I have found about this in PostgreSQL source code, but we’re not so sure.
Thanks for your reply!
Regards!
Jasmine Liu
发件人: Andreas Joseph Krogh-2 [via PostgreSQL] [mailto:[hidden email]]
发送时间: 2014年8月28日 20:46
收件人: JasmineLiu
主题: Re: invalid byte sequence for encoding "UTF8": 0x00
På onsdag 27. august 2014 kl. 10:11:09, skrev Albe Laurenz <[hidden email]>:
JasmineLiu wrote:
> I've also got this problem while copy or insert data from MS SQL Server to
> PostgreSQL.
> SQLServer 2008 R2, encoding :GBK
> PostgreSQL 9.3.4, encoding:UTF8
>
> Rather than modify the column value in sql server,
> are there any other ways to solve this problem?
> Better to give me an examples.
You will never be able to insert a null character into a PostgreSQL database.
You can either modify the source data or change the data in transit.
This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.
Usually the \0 isn't important so you can do this in JAVA before inserting into PG:
someString.replace('\0', ' ') or someString.replaceAll("\0", "")
If you reply to this email, your message will be added to the discussion below:
To unsubscribe from invalid byte sequence for encoding "UTF8": 0x00, click here.
NAML
View this message in context: Re: invalid byte sequence for encoding "UTF8": 0x00
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.
Joseph Krogh wrote: >> You will never be able to insert a null character into a PostgreSQL database. >> You can either modify the source data or change the data in transit. > > This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns. I was talking about characters, not bytes. Yours, Laurenz Albe
Joseph Krogh wrote:
>> You will never be able to insert a null character into a PostgreSQL database.
>> You can either modify the source data or change the data in transit.
>
> This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.
I was talking about characters, not bytes.
Attachment
Andreas Joseph Krogh wrote: >>>> You will never be able to insert a null character into a PostgreSQL database. >>>> You can either modify the source data or change the data in transit. >>> >>> This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns. >> >> I was talking about characters, not bytes. > > '\0' is a character. I see noe specification of character-fields (like varchar and text) in your > answer. My definition would be: A character is something that is normally written on paper and has to be encoded to be stored in a computer system. (seems not to stray to far from Wikipedia's definition.) Characters can only occur in text fields. An element of a bytea is not a character along these lines; hence the type is called "BYTE Array". But let's not split hairs, this is getting away from the problem at hand, and I think you know what I mean and vice versa. Yours, Laurenz Albe