Thread: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae
BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 17615
Logged by: Souvik Chattopadhyay
Email address: chatterjeesouvik.besu@gmail.com
PostgreSQL version: 10.21
Operating system: CentOS 7.9
Description:

Hi,

Getting the below error while inserting records into the table:
invalid byte sequence for encoding "UTF8": 0xae

Insert statement:
insert into xx_test values ('Remmo® 20 Tablet');

Regards,
Souvik
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> Getting the below error while inserting records into the table:
> invalid byte sequence for encoding "UTF8": 0xae

That is, in fact, an invalidly-encoded character per UTF8 rules,
so I see no reason to think there is any Postgres bug here.
What's more likely is that you haven't set client_encoding to
match the encoding of the data you're trying to insert.

regards, tom lane
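
[Editor's illustration, not part of the original message.] The point about UTF8 rules can be checked outside the database. The sketch below is a minimal Python example; the helper name `is_valid_utf8` is hypothetical. It shows that a lone byte 0xAE does not decode as UTF-8, while the registered sign U+00AE encoded as UTF-8 is the two-byte sequence 0xC2 0xAE.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xAE is a UTF-8 continuation byte with no lead byte in front of it.
lone_byte = b"\xae"

# The registered sign U+00AE, properly encoded as UTF-8, takes two bytes.
utf8_bytes = "\u00ae".encode("utf-8")

print(is_valid_utf8(lone_byte))   # False
print(is_valid_utf8(utf8_bytes))  # True
print(utf8_bytes.hex())           # c2ae
```

So the character itself is representable in UTF-8; it is the single byte 0xAE on the wire that is not.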
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae
From
Souvik Chatterjee
Date:
We have set the client encoding to UTF-8, but still error is coming.
This is getting saved properly in Oracle databases, then what's the issue postgres?
regards,
Souvik Chattopadhyay
On Fri, 16 Sept 2022, 01:03 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> Getting the below error while inserting records into the table:
> invalid byte sequence for encoding "UTF8": 0xae
That is, in fact, an invalidly-encoded character per UTF8 rules,
so I see no reason to think there is any Postgres bug here.
What's more likely is that you haven't set client_encoding to
match the encoding of the data you're trying to insert.
regards, tom lane
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae
From
Tom Lane
Date:
Souvik Chatterjee <chatterjeesouvik.besu@gmail.com> writes:
> We have set the client encoding to UTF-8, but still error is coming.

That is exactly what you *shouldn't* do, because the data you are sending
is evidently not in UTF8. It's probably some LATINn variant.

> This is getting saved properly in Oracle databases, then what's the issue
> postgres?

[ shrug... ] It's likely a matter of what software stack you have on
the client side, not which server you're using exactly.

regards, tom lane
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae
From
Francisco Olarte
Date:
On Fri, 16 Sept 2022 at 09:37, Souvik Chatterjee <chatterjeesouvik.besu@gmail.com> wrote:
> We have set the client encoding to UTF-8, but still error is coming.

It seems you have got it backwards. From your description it seems like your client encoding is utf-8 (the other usual encodings do not have this kind of problem, as all the byte sequences are valid in them) and you are sending the data in a different one. Setting client encoding means you tell the server "I am going to send you utf8"; then you send invalid utf-8 (my bet is on windows-1252 if the client is on windows, the usual suspect, or latin-1 if the client is on *ix, which is rarer, as nearly all unix systems work in utf-8 these days) and the server tells you so.

Try what you are doing with client encoding win-1252 (look up the exact name in the manual, I may be wrong) to see if it does what you want.

> This is getting saved properly in Oracle databases, then what's the issue postgres?

This seems like pilot error to me. Probably oracle tools use another encoding by default, so you are not doing the same thing here and are comparing apples to oranges.

BTW, this does not even remotely look like a bug to me; you will probably get more enthusiastic and/or detailed responses in one of the general lists. I replied to this because it was the first message and I thought I had opened the general list, and only noticed it was a bug report when I hit your bottom quote. Had I noticed it, I would probably just have answered "Does not look like a bug, but pilot error".

Francisco Olarte.
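
[Editor's illustration, not part of the original message.] One way to test the guess about the sender's real encoding is to decode the same bytes under several candidate encodings and compare the results. This is a rough sketch: the byte string `raw` is an assumed reconstruction of what the client sent, `decode_with` is a hypothetical helper, and the encoding names are Python codec names rather than PostgreSQL ones (PostgreSQL spells them LATIN1, WIN1252 and LATIN2).

```python
from typing import Dict, Sequence

# Assumed client bytes: the sample string with the registered sign as 0xAE.
raw = b"Remmo\xae 20 Tablet"


def decode_with(data: bytes, encodings: Sequence[str]) -> Dict[str, str]:
    """Decode `data` under each candidate encoding, noting failures."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError as exc:
            results[name] = f"<invalid: {exc.reason}>"
    return results


decoded = decode_with(raw, ["utf-8", "latin-1", "cp1252", "iso8859_2"])
for name, text in decoded.items():
    print(f"{name:10} {text}")
```

Under these assumptions, `latin-1` and `cp1252` both give back the intended text, `utf-8` fails, and `iso8859_2` decodes without error but to a different character, which is why the declared client encoding has to match the bytes actually sent.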
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae
From
Souvik Chatterjee
Date:
So you meant to say registered trademark: ®
is not a valid UTF-8 character?
Seems strange to me.
regards,
Souvik Chattopadhyay
On Fri, 16 Sept 2022, 08:39 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
Souvik Chatterjee <chatterjeesouvik.besu@gmail.com> writes:
> We have set the client encoding to UTF-8, but still error is coming.
That is exactly what you *shouldn't* do, because the data you are sending
is evidently not in UTF8. It's probably some LATINn variant.
> This is getting saved properly in Oracle databases, then what's the issue
> postgres?
[ shrug... ] It's likely a matter of what software stack you have on
the client side, not which server you're using exactly.
regards, tom lane
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae
From
Tom Lane
Date:
Souvik Chatterjee <chatterjeesouvik.besu@gmail.com> writes:
> So you meant to say registered trademark: ®
> is not a valid UTF-8 character?

I'm sure that there is such a Unicode character, but the way you are presenting it to the database is not UTF-8. It's some other character encoding, probably a single-byte encoding such as a member of the ISO 8859 family [1]. I see in the table there that code 0xAE is the trademark symbol in 8859-1 (LATIN1) and some but not all of the other variants.

You need to arrange for the proper encoding conversion to happen. Perhaps reading [2] would help.

regards, tom lane

[1] https://en.wikipedia.org/wiki/ISO/IEC_8859
[2] https://www.postgresql.org/docs/current/multibyte.html
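
[Editor's illustration, not part of the original message.] The "proper encoding conversion" can happen on either side. One option is to declare the real encoding to the server (for example `SET client_encoding TO 'LATIN1';`) and let PostgreSQL convert. The other is to transcode on the client before sending. The sketch below shows the client-side variant in Python, assuming the source bytes really are LATIN1; the function name `transcode_to_utf8` and the sample bytes are hypothetical.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode bytes from an assumed source encoding into UTF-8."""
    # Decode with the encoding the data is actually in, then encode as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


# Assumed legacy bytes: the registered sign stored as the single byte 0xAE.
legacy = b"Remmo\xae 20 Tablet"
converted = transcode_to_utf8(legacy)

print(converted)  # b'Remmo\xc2\xae 20 Tablet'
```

Whether LATIN1 is the right source encoding depends on the client stack; if the assumption is wrong, the conversion still succeeds but may produce the wrong characters.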