Thread: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae

The following bug has been logged on the website:

Bug reference:      17615
Logged by:          Souvik Chattopadhyay
Email address:      chatterjeesouvik.besu@gmail.com
PostgreSQL version: 10.21
Operating system:   CentOS 7.9
Description:

Hi,

Getting the below error while inserting records into the table:
invalid byte sequence for encoding "UTF8": 0xae

Insert statement:
insert into xx_test values ('Remmo® 20 Tablet');

Regards,
Souvik


PG Bug reporting form <noreply@postgresql.org> writes:
> Getting the below error while inserting records into the table:
> invalid byte sequence for encoding "UTF8": 0xae

That is, in fact, an invalidly-encoded character per UTF8 rules,
so I see no reason to think there is any Postgres bug here.
What's more likely is that you haven't set client_encoding to
match the encoding of the data you're trying to insert.
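
For illustration, a minimal sketch of why the server rejects that byte,
using Python's codecs as a stand-in for the server's validation (an
assumption; PostgreSQL has its own verifier, but the UTF-8 rules are
the same):

```python
import unicodedata

# The byte the server complained about, as sent by the client.
raw = b"\xae"

# In UTF-8, 0xAE is a continuation byte with no lead byte before it,
# so strict decoding fails.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Read as LATIN1 (ISO 8859-1), the same byte is a perfectly good character.
as_latin1 = raw.decode("latin-1")

# The UTF-8 encoding of that character is two bytes, not one.
as_utf8_bytes = as_latin1.encode("utf-8")

print(valid_utf8)                   # False
print(unicodedata.name(as_latin1))  # REGISTERED SIGN
print(as_utf8_bytes)                # b'\xc2\xae'
```

So a client that declares UTF8 would have to send 0xC2 0xAE for this
character; a lone 0xAE only makes sense under a single-byte encoding.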

            regards, tom lane



We have set the client encoding to UTF-8, but still error is coming.

This is getting saved properly in Oracle databases, then what's the issue postgres?

regards,
Souvik Chattopadhyay

Souvik Chatterjee <chatterjeesouvik.besu@gmail.com> writes:
> We have set the client encoding to UTF-8, but still error is coming.

That is exactly what you *shouldn't* do, because the data you are sending
is evidently not in UTF8.  It's probably some LATINn variant.

> This is getting saved properly in Oracle databases, then what's the issue
> postgres?

[ shrug... ]  It's likely a matter of what software stack you have on
the client side, not which server you're using exactly.

            regards, tom lane



On Fri, 16 Sept 2022 at 09:37, Souvik Chatterjee
<chatterjeesouvik.besu@gmail.com> wrote:
> We have set the client encoding to UTF-8, but still error is coming.

It seems you have got it backwards. From your description, your
client encoding is utf-8 (the other usual encodings do not have this
kind of problem, as nearly all byte sequences are valid in them) and
you are sending the data in a different one. Setting the client
encoding means you tell the server "I am going to send you utf8".
You then send invalid utf-8, and the server tells you so. My bet is
on windows-1252 if the client is on windows (the usual suspect), or
latin-1 if the client is on *ix (rarer, as nearly all unix systems
work in utf-8 these days).

Try what you are doing with client encoding win-1252 (look up the
exact name in the manual, I may be wrong) to see if it does what you
want.
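
The other way round would be to keep client encoding utf-8 and
transcode on the client before sending. A rough sketch in Python,
assuming the bytes really are windows-1252 (the helper name to_utf8
and the cp1252 fallback are hypothetical, chosen only for this
example):

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return raw as UTF-8 bytes, assuming `fallback` when raw is not valid UTF-8."""
    try:
        raw.decode("utf-8")   # already valid UTF-8: leave it untouched
        return raw
    except UnicodeDecodeError:
        # Not UTF-8: reinterpret under the assumed single-byte encoding.
        # This raises if a byte is undefined in that encoding.
        return raw.decode(fallback).encode("utf-8")


# The string literal from the report, as a single-byte client would send it.
literal = b"Remmo\xae 20 Tablet"
fixed = to_utf8(literal)
print(fixed)  # b'Remmo\xc2\xae 20 Tablet'
```

Note the fallback is a guess. Bytes that happen to be valid utf-8 pass
through unchanged even if they were meant as something else, so this
is only safe when you know where the data comes from.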

> This is getting saved properly in Oracle databases, then what's the issue postgres?

This seems like pilot error to me. The oracle tools probably use a
different encoding by default, so you are not doing the same thing
here and are comparing apples to oranges.

BTW, this does not even remotely look like a bug to me. You will
probably get more enthusiastic and / or detailed responses on one of
the general lists. I replied because this was the first message and I
thought I had opened the general list; I only noticed it was a bug
report when I hit your bottom quote. Had I noticed earlier, I would
probably just have answered "Does not look like a bug, but pilot
error".

Francisco Olarte.



So you meant to say registered trademark: ®
is not a valid UTF-8 character?
 
Seems strange to me.

regards,
Souvik Chattopadhyay

Souvik Chatterjee <chatterjeesouvik.besu@gmail.com> writes:
> So you meant to say registered trademark: ®
> is not a valid UTF-8 character?

I'm sure that there is such a Unicode character, but the way you
are presenting it to the database is not UTF-8.  It's some other
character encoding, probably a single-byte encoding such as a
member of the ISO 8859 family [1].  I see in the table there
that code 0xAE is the registered trademark symbol in 8859-1 (LATIN1)
and some but not all of the other variants.  You need to arrange
for the proper encoding conversion to happen.  Perhaps reading [2]
would help.
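
As a quick way to check the "some but not all" part without reading
the table, here is a sketch that asks Python's codecs which ISO 8859
parts decode 0xAE to the registered sign.  It assumes Python's codec
tables match the standard; they are not PostgreSQL's own conversion
tables.

```python
REGISTERED = "\u00ae"

# Parts 1 through 16 of ISO 8859; part 12 was never published.
candidates = [f"iso-8859-{n}" for n in range(1, 17) if n != 12]

maps_to_registered = []
for name in candidates:
    try:
        ch = b"\xae".decode(name)
    except UnicodeDecodeError:
        continue  # 0xAE is unassigned in this part
    if ch == REGISTERED:
        maps_to_registered.append(name)

print(maps_to_registered)
```

Any part missing from the printed list assigns 0xAE to a different
character or to nothing, so guessing the wrong variant would store the
wrong character without raising an error.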

            regards, tom lane

[1] https://en.wikipedia.org/wiki/ISO/IEC_8859
[2] https://www.postgresql.org/docs/current/multibyte.html