Thread: Re: DataDirect Driver, ExecDirect and UTF-8
Hi,
I wasn’t a member of the mailing list when I sent this, so I’m not sure it actually made it out there.
I apologize if this is a duplicate.
....Ken
On 5/4/09 11:49 AM, "Ken Sell" <ksell@greenplum.com> wrote:
I wasn’t a member of the mailing list when I sent this, so I’m not sure it actually made it out there.
I apologize if this is a duplicate.
....Ken
On 5/4/09 11:49 AM, "Ken Sell" <ksell@greenplum.com> wrote:
Hi,
I’m a the new connectivity developer at GreenPlum. GreenPlum makes a data warehouse DBMS based on PostgreSQL.
I’m working on a problem where a user is attempting to insert a non-ASCII UTF-8 values (i.e. An umlaut over an ‘o’).
The test does an insert via an SQLExecDirectW into a table. The text looks like this:
"insert into t1 values ('ö')"
I’ve built and debugged the postgreSQL driver (version 8.02.0500). It looks like the text makes it through the
Driver Manager (i.e. DataDirect) ok. I see the correct value in SQLExecDirectW in odbcapiw.c, but I
also see the code in SQLExecDirectW call ucs2_to_utf8. ucs2_to_utf8 tries to interpret the value as
UCS2, but the value is UTF-8. The value is corrupted by ucs2_to_utf8.
I also attempted to call SQLExecDirect (i.e. no W), but the DataDirect driver manager tries to convert the
umlaut value to ASCII and calls SQLExecDirectW instead.
Can someone elaborate on the driver’s correct behavior in this situation? If the database is UTF-8 and
The application is UTF-8, should the driver handle this? Does the application (or driver manager) have
to convert the string to UCS2 first?
Thanks,
....Ken
Ken Sell wrote: > Hi, > > I wasn’t a member of the mailing list when I sent this, so I’m not sure > it actually made it out there. > I apologize if this is a duplicate. > > ....Ken > > > On 5/4/09 11:49 AM, "Ken Sell" <ksell@greenplum.com> wrote: > > Hi, > > I’m a the new connectivity developer at GreenPlum. GreenPlum makes a > data warehouse DBMS based on PostgreSQL. > I’m working on a problem where a user is attempting to insert a > non-ASCII UTF-8 values (i.e. An umlaut over an ‘o’). > The test does an insert via an SQLExecDirectW into a table. The > text looks like this: > > "insert into t1 values ('ö')" > > I’ve built and debugged the postgreSQL driver (version 8.02.0500). > It looks like the text makes it through the > Driver Manager (i.e. DataDirect) ok. I see the correct value in > SQLExecDirectW in odbcapiw.c, but I > also see the code in SQLExecDirectW call ucs2_to_utf8. ucs2_to_utf8 > tries to interpret the value as > UCS2, but the value is UTF-8. The value is corrupted by ucs2_to_utf8. Psqlodbc Unicode driver uses UTF-16 encoding and your application uses UTF-8 encoding. Isn't the URL http://media.datadirect.com/download/docs/odbc/allodbc/reference/unicode6.html related to your problem ? regards, Hiroshi Inoue
Hi,
Thanks very much for the information. I think I have it figured out.
Here is my understanding. Feel free to correct me if I have gotten something wrong.
In order for everything to work correctly with a UTF-8 application, a DataDirect driver manager, and psqlodbc you must do the following.
- Use the header files (e.g. sql.h) from the DataDirect driver manager installation.
- Compile the driver with the SQLWCHARSHORT define (e.g. #define SQLWCHARSHORT=1).
This will cause SQLWCHAR to be defined as an unsigned short.
- Compile the application with the SQLWCHARSHORT define (e.g. #define SQLWCHARSHORT=1).
This will cause SQLWCHAR to be defined as an unsigned short.
- Set the DriverUnicodeType data source attribute to 1 (i.e. DriverUnicodeType=1 in your odbc.ini file).
This tells the driver manager that your driver is expecting UTF-16 encoding for all SQLWCHAR
parameters.
- If your database encoding is UTF-8, the descriptor for character columns will be a wide type.
For example, a char(1) column will be SQL_WCHAR.
- If you do something with a sql statement (e.g. Do a SQLExecDirectW(“insert into T1 values (‘a’)),
the driver manager (DataDirect) will convert the string from UTF-8 to UTF-16, pass it to the
the driver as UTF-16. The driver will then translate it back to UTF-8. When the driver is
unicode (i.e your compiled psqlodbc with as a unicode driver), then this translation
in the driver manager happens weather or not you use a wide function. The driver manager
converts both SQLCHAR and SQLWCHAR parameters to UTF-16. It also changes all calls to
non-wide functions to calls to wide functions to the driver (e.g. SQLExecDirect gets changed
to SQLExecDirectW). Note, SQLCHAR and SQLWCHAR strings are not translated from UTF-8
to UTF-16 in the same way. If the call is to a non-wide function, then the translation assumes
that the string contains only ASCII characters. A non-ASCII character will probably get
translated incorrectly. A call to wide functions translates UTF-8 to UTF-16 correctly.
- When using parameters, you must be use the appropriate SQL and C type when describing
the parameters column values. If you say the C type is SQL_C_CHAR, the value is assumed
to be in UTF-8 by the driver for a database encoded in UTF-8. IF you say the C type is
SQL_C_WCHAR, I think the driver will assume the value is in UTF-16 and try to translate
it to UTF-8.
- When using column values, you must use the appropriate SQL C type. If you say the column C
type is SQL_C_WCHAR, then the returned value will be translated to UTF-16 by the driver.
If you say the column C type is SQL_C_CHAR, then the value will be returned as UTF-8.
....Ken
On 5/5/09 11:13 PM, "Hiroshi Inoue" <inoue@tpf.co.jp> wrote:
Thanks very much for the information. I think I have it figured out.
Here is my understanding. Feel free to correct me if I have gotten something wrong.
In order for everything to work correctly with a UTF-8 application, a DataDirect driver manager, and psqlodbc you must do the following.
- Use the header files (e.g. sql.h) from the DataDirect driver manager installation.
- Compile the driver with the SQLWCHARSHORT define (e.g. #define SQLWCHARSHORT=1).
This will cause SQLWCHAR to be defined as an unsigned short.
- Compile the application with the SQLWCHARSHORT define (e.g. #define SQLWCHARSHORT=1).
This will cause SQLWCHAR to be defined as an unsigned short.
- Set the DriverUnicodeType data source attribute to 1 (i.e. DriverUnicodeType=1 in your odbc.ini file).
This tells the driver manager that your driver is expecting UTF-16 encoding for all SQLWCHAR
parameters.
- If your database encoding is UTF-8, the descriptor for character columns will be a wide type.
For example, a char(1) column will be SQL_WCHAR.
- If you do something with a sql statement (e.g. Do a SQLExecDirectW(“insert into T1 values (‘a’)),
the driver manager (DataDirect) will convert the string from UTF-8 to UTF-16, pass it to the
the driver as UTF-16. The driver will then translate it back to UTF-8. When the driver is
unicode (i.e your compiled psqlodbc with as a unicode driver), then this translation
in the driver manager happens weather or not you use a wide function. The driver manager
converts both SQLCHAR and SQLWCHAR parameters to UTF-16. It also changes all calls to
non-wide functions to calls to wide functions to the driver (e.g. SQLExecDirect gets changed
to SQLExecDirectW). Note, SQLCHAR and SQLWCHAR strings are not translated from UTF-8
to UTF-16 in the same way. If the call is to a non-wide function, then the translation assumes
that the string contains only ASCII characters. A non-ASCII character will probably get
translated incorrectly. A call to wide functions translates UTF-8 to UTF-16 correctly.
- When using parameters, you must be use the appropriate SQL and C type when describing
the parameters column values. If you say the C type is SQL_C_CHAR, the value is assumed
to be in UTF-8 by the driver for a database encoded in UTF-8. IF you say the C type is
SQL_C_WCHAR, I think the driver will assume the value is in UTF-16 and try to translate
it to UTF-8.
- When using column values, you must use the appropriate SQL C type. If you say the column C
type is SQL_C_WCHAR, then the returned value will be translated to UTF-16 by the driver.
If you say the column C type is SQL_C_CHAR, then the value will be returned as UTF-8.
....Ken
On 5/5/09 11:13 PM, "Hiroshi Inoue" <inoue@tpf.co.jp> wrote:
Ken Sell wrote:
> Hi,
>
> I wasn’t a member of the mailing list when I sent this, so I’m not sure
> it actually made it out there.
> I apologize if this is a duplicate.
>
> ....Ken
>
>
> On 5/4/09 11:49 AM, "Ken Sell" <ksell@greenplum.com> wrote:
>
> Hi,
>
> I’m a the new connectivity developer at GreenPlum. GreenPlum makes a
> data warehouse DBMS based on PostgreSQL.
> I’m working on a problem where a user is attempting to insert a
> non-ASCII UTF-8 values (i.e. An umlaut over an ‘o’).
> The test does an insert via an SQLExecDirectW into a table. The
> text looks like this:
>
> "insert into t1 values ('ö')"
>
> I’ve built and debugged the postgreSQL driver (version 8.02.0500).
> It looks like the text makes it through the
> Driver Manager (i.e. DataDirect) ok. I see the correct value in
> SQLExecDirectW in odbcapiw.c, but I
> also see the code in SQLExecDirectW call ucs2_to_utf8. ucs2_to_utf8
> tries to interpret the value as
> UCS2, but the value is UTF-8. The value is corrupted by ucs2_to_utf8.
Psqlodbc Unicode driver uses UTF-16 encoding and your application uses
UTF-8 encoding. Isn't the URL
http://media.datadirect.com/download/docs/odbc/allodbc/reference/unicode6.html
related to your problem ?
regards,
Hiroshi Inoue