Thread: invalid byte sequence for encoding "UTF8"
I am running into a difference in behavior between Windows and Linux that I think is related to the psqlODBC driver.

On Windows x86, using the Windows ODBC manager and the psqlODBC 8.4.2 ANSI driver, I have some C++ code that executes the following SQL statements and inserts into a Windows x86 PostgreSQL 8.4.2 database:

CREATE TABLE luke_test(a TEXT);
INSERT INTO luke_test (a) VALUES('ý');
INSERT INTO luke_test (a) VALUES('þ');
INSERT INTO luke_test (a) VALUES('ÿ');
SELECT * FROM luke_test;
--ý
--þ
--ÿ
DROP TABLE luke_test;

Each value being inserted is a single-byte character. This works fine on Windows. I do not want bytes with their 8th bit set to be treated as multi-byte characters.

However, if I compile and run the same C++ code on Linux x64, using the unixODBC 2.2.15pre ODBC manager and the psqlODBC 8.4.2 ANSI driver, inserting into the aforementioned Windows x86 PostgreSQL 8.4.2 database, I get different behavior:

CREATE TABLE luke_test(a TEXT);
INSERT INTO luke_test (a) VALUES('ý');
Error executing "INSERT INTO luke_test (a) VALUES('ý');": ERROR: invalid byte sequence for encoding "UTF8": 0xfd;
Error while executing the query
INSERT INTO luke_test (a) VALUES('þ');
Error executing "INSERT INTO luke_test (a) VALUES('þ');": ERROR: invalid byte sequence for encoding "UTF8": 0xfe;
Error while executing the query
INSERT INTO luke_test (a) VALUES('ÿ');
Error executing "INSERT INTO luke_test (a) VALUES('ÿ');": ERROR: invalid byte sequence for encoding "UTF8": 0xff;
Error while executing the query
SELECT * FROM luke_test;
DROP TABLE luke_test;

This leads me to believe that my environment on Linux is trying to treat the character bytes as multi-byte characters.

I have tried SET client_encoding='ANSI_SQL' before executing the inserts, and that does not seem to help.

Can anyone help explain the difference in behavior I am seeing and suggest a workaround that does not involve encoding the character bytes as UTF8?

Luke

luke wrote:
> I have tried SET client_encoding='ANSI_SQL' before I execute the inserts and that does not seem to help the situation.

Please try SET client_encoding='LATIN1'.

regards,
Hiroshi Inoue

Hiroshi Inoue wrote:
> Please try SET client_encoding='LATIN1'.

That worked! Thank you!
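
For readers wondering why the LATIN1 suggestion helps, here is a minimal sketch in Python (not the original C++/ODBC code, which was never posted). It assumes the application sends 'ý', 'þ' and 'ÿ' as the single bytes 0xfd, 0xfe and 0xff, which is what the error messages report. Those bytes are not valid UTF-8 on their own, so a connection that treats incoming text as UTF8 rejects them. Once the client encoding is declared as LATIN1, each byte can be interpreted as one character and converted. The helper names is_valid_utf8 and latin1_to_utf8 are made up for this illustration.

```python
# The three single bytes reported in the error messages above.
RAW_BYTES = [b"\xfd", b"\xfe", b"\xff"]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Interpret each byte as one LATIN1 character and re-encode it as UTF-8.

    This mimics the kind of conversion that becomes possible once the
    client encoding is declared as LATIN1; it is an illustration, not
    the server's actual code path.
    """
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    for raw in RAW_BYTES:
        print(
            f"0x{raw.hex()}: valid UTF-8? {is_valid_utf8(raw)}; "
            f"as LATIN1 converted to UTF-8: {latin1_to_utf8(raw).hex()}"
        )
```

Each of the three bytes fails the UTF-8 check but converts cleanly to a two-byte UTF-8 sequence once it is read as LATIN1, so the application can keep sending single bytes and does not have to encode them as UTF8 itself.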