Thread: Non-ASCII DSN name troubles

Non-ASCII DSN name troubles

From

Heikki Linnakangas

Date:

21 June 2014, 17:45:19

Hi,

If you try to create a data source with a name that contains non-ASCII
characters, funny things will happen. I wouldn't expect the ANSI driver
to support that, but a Unicode driver ought to handle it.

1. We always use the ANSI versions of the functions to read/write the
config, SQLGetPrivateProfileString/SQLWritePrivateProfileString. In the
Unicode driver, I think we should be using the Unicode *W variants of
those functions, otherwise we cannot handle characters that don't have a
representation in the current system codepage.

2. Even if all the characters can be represented in the system codepage,
when built as a Unicode driver, we internally pass all strings as UTF-8
encoded char[] arrays, and convert between UTF-8 and UCS-2 in the
wrapper functions in odbcapiw.c. We also do that for the DSN name in
SQLDriverConnectW(), but we pass the UTF-8 encoded DSN name to the
SQLGetPrivateProfileString() function, to get the config options. That
doesn't work, because SQLGetPrivateProfileString() expects the
string to be encoded in the system codepage, not UTF-8. Again, we should
be using the Unicode version, SQLGetPrivateProfileStringW().

3. We don't use the Unicode versions of the GUI functions, like
GetDlgItemText(), when dealing with the configuration dialog. That again
means that the GUI cannot handle characters outside the system codepage,
but we also don't convert the strings to UTF-8 like we do to strings
coming through SQLDriverConnectW() and other API functions, so there's
another mismatch.
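
To illustrate point 2, here is a minimal, portable sketch; it is not code
from the driver or from the patch. It assumes LATIN1 as a stand-in for the
system codepage, and the helper names (utf8_encode, latin1_encode,
same_bytes_in_both) are made up for the illustration. It shows that a
non-ASCII character has different bytes in UTF-8 and in the codepage, so a
name stored under one encoding is not found when looked up with the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode one code point (limited to U+0000..U+FFFF in this sketch) as
 * UTF-8. Returns the number of bytes written to out (1 to 3). */
size_t utf8_encode(unsigned cp, unsigned char *out)
{
    assert(cp <= 0xFFFF);
    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}

/* Encode one code point as LATIN1 (assumed system codepage).
 * Returns 1 on success, 0 if the character is not representable. */
size_t latin1_encode(unsigned cp, unsigned char *out)
{
    if (cp > 0xFF)
        return 0;
    out[0] = (unsigned char) cp;
    return 1;
}

/* Returns 1 if the character has identical bytes in both encodings,
 * i.e. a lookup with UTF-8 bytes would match a codepage-encoded key. */
int same_bytes_in_both(unsigned cp)
{
    unsigned char u[3], l[1];
    size_t ulen = utf8_encode(cp, u);
    size_t llen = latin1_encode(cp, l);

    return ulen == llen && memcmp(u, l, ulen) == 0;
}
```

For plain ASCII such as 'A' the bytes agree, which is why ASCII-only DSN
names work today. For U+00E9 the UTF-8 form is two bytes and the LATIN1
form is one, so the lookup misses.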

Attached patch fixes those issues, allowing you to create and use any
Unicode characters in the DSN name, or any other configuration fields,
with the Unicode driver.


This changes the behavior of how username and password are handled in
the Unicode driver. Without this patch, the username is read from the
registry in the system codepage, and also sent as such to the server.
After the patch, it's always sent to the server in UTF-8. I think that's
more sane behavior, but there's a small chance of breaking existing
installations that depend on the old behavior. So we probably should
include this patch when we bump the major version number to 9.4.

- Heikki

Attachment

0001-Handle-Unicode-correctly-when-reading-writing-DSN-pr.patch

Re: Non-ASCII DSN name troubles

From

"Inoue, Hiroshi"

Date:

23 June 2014, 23:58:23


(2014/06/21 20:37), Heikki Linnakangas wrote:
> Hi,
>
> If you try to create a data source with a name that contains non-ASCII
> characters, funny things will happen. I wouldn't expect the ANSI driver
> to support that, but a Unicode driver ought to handle it.

Currently non-ASCII characters are not recommended because they are
mainly used at connection time. Though the Unicode version of SQLDriverConnect
uses UTF-8 encoded user, password, database ... because I can't think of
other ways, it has little meaning IMHO. Was there a decision that
the encoding of user, password or database is UTF-8?


Re: Non-ASCII DSN name troubles

From

Heikki Linnakangas

Date:

24 June 2014, 09:57:57

On 06/23/2014 11:58 PM, Inoue, Hiroshi wrote:
> (2014/06/21 20:37), Heikki Linnakangas wrote:
>> If you try to create a data source with a name that contains non-ASCII
>> characters, funny things will happen. I wouldn't expect the ANSI driver
>> to support that, but a Unicode driver ought to handle it.
>
> Currently non-ASCII characters are not recommended because they are
> mainly used at connection time.

Note that the DSN name is never sent to the server. Even if we conclude
that we want to keep the behavior of username, password and database as
is, we should still allow the DSN name to contain any characters.

> Though the Unicode version of SQLDriverConnect
> uses UTF-8 encoded user, password, database ... because I can't think of
> other ways, it has little meaning IMHO. Was there a decision that
> the encoding of user, password or database is UTF-8?

Not sure what you mean. There have been no changes in the server around
this. The server just treats the username, password and database as raw
bytes. Which is unfortunate, but we'll just have to deal with it in the
driver.

The question is, what encoding should we use to send the username,
password and database to the server?

1. Current behavior: The username, password and database are encoded
using the current Windows ANSI codepage. If there are characters that
cannot be encoded using the ANSI codepage, Windows will replace them with ?.

2. Behavior with the patch: The username, password and database are
always encoded using UTF-8, when using the Unicode driver.
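
As a rough illustration of the '?' replacement in option 1, here is a
portable sketch. It assumes LATIN1 as the ANSI codepage and takes the
username as an array of Unicode code points; to_latin1_lossy is a made-up
name. The real conversion is done by Windows and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert n code points to LATIN1 (assumed ANSI codepage), writing '?'
 * for every character that does not fit, and NUL-terminate the result.
 * Returns the number of characters that had to be replaced.
 */
size_t to_latin1_lossy(const unsigned *cps, size_t n, char *out, size_t outsz)
{
    size_t i;
    size_t replaced = 0;

    assert(outsz > n);          /* room for n bytes plus the terminator */
    for (i = 0; i < n; i++)
    {
        if (cps[i] <= 0xFF)
            out[i] = (char) cps[i];
        else
        {
            out[i] = '?';       /* not representable in the codepage */
            replaced++;
        }
    }
    out[n] = '\0';
    return replaced;
}
```

With a four-letter Cyrillic username, every character is replaced and the
server receives "????", which is also what any other four-letter Cyrillic
username would turn into.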

Both behaviors have pros and cons. If you assume that the server uses
UTF-8, and the client uses the Unicode driver and is fully
Unicode-enabled, then the patched behavior is clearly better. With the
current behavior, if e.g. the username contains any non-ASCII characters,
you cannot connect.

But if you assume that the server is not using UTF-8, but LATIN1 for
example, and the client uses the Unicode driver, then the current
behavior is better. It will allow the client to connect, assuming that
the Windows ANSI codepage is set to LATIN1, while with the patch it will
not work. However, if the server and client both use LATIN1 rather than
Unicode/UTF-8, then you probably should be using the ANSI driver instead.

Overall, I think the patched behavior is better.

If we want to make it really flexible, we could add a new parameter to
explicitly specify the encoding used for username, password and
database. Then you could connect to any database with the Unicode
driver, as long as you set the parameter correctly.
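
A minimal sketch of how such a parameter could be interpreted, purely as
an assumption: no such option exists in the driver, and the names
(CredEncoding, parse_cred_encoding) and the accepted values ("ansi",
"utf8", "utf-8") are hypothetical. Only two encodings are covered here;
when the parameter is not set, the default follows the patched behavior
(UTF-8 for the Unicode driver).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical choices for encoding username, password and database. */
typedef enum { CRED_ENC_ANSI, CRED_ENC_UTF8 } CredEncoding;

/* Case-insensitive equality of two NUL-terminated ASCII strings. */
static int ascii_ieq(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both ended together */
}

/*
 * Interpret the (hypothetical) encoding parameter.
 * value may be NULL or empty when the parameter is not set.
 * Returns 0 and stores the choice in *out, or -1 for an unknown value.
 */
int parse_cred_encoding(const char *value, int unicode_driver,
                        CredEncoding *out)
{
    assert(out != NULL);
    if (value == NULL || *value == '\0')
    {
        /* Not set: Unicode driver sends UTF-8, ANSI driver the codepage. */
        *out = unicode_driver ? CRED_ENC_UTF8 : CRED_ENC_ANSI;
        return 0;
    }
    if (ascii_ieq(value, "utf8") || ascii_ieq(value, "utf-8"))
    {
        *out = CRED_ENC_UTF8;
        return 0;
    }
    if (ascii_ieq(value, "ansi"))
    {
        *out = CRED_ENC_ANSI;
        return 0;
    }
    return -1;                  /* unknown encoding name */
}
```

A LATIN1 server with a Unicode-enabled client would then set the
parameter to "ansi" to keep the current behavior, while everyone else
gets UTF-8 without configuring anything.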

- Heikki