Thread: ANSI and Unicode driver

ANSI and Unicode driver

From

Peter Eisentraut

Date:

20 February 2006, 12:05:09

So really, what is the difference between the ANSI and the Unicode driver?
The Unicode driver sets the client encoding to UTF-8, but does that mean that
the client application has to use UTF-8 or does the driver manager convert
that?  What do you use if you have, say, a Chinese application?  Or a Latin 1
application but a UTF-8 database?  How does this work?  I'm confused.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: ANSI and Unicode driver

From

"Dave Page"

Date:

20 February 2006, 12:52:25


> -----Original Message-----
> From: pgsql-odbc-owner@postgresql.org
> [mailto:pgsql-odbc-owner@postgresql.org] On Behalf Of Peter Eisentraut
> Sent: 20 February 2006 16:05
> To: pgsql-odbc@postgresql.org
> Subject: [ODBC] ANSI and Unicode driver
>
> So really, what is the difference between the ANSI and the
> Unicode driver?
> The Unicode driver sets the client encoding to UTF-8, but
> does that mean that
> the client application has to use UTF-8 or does the driver
> manager convert
> that?  What do you use if you have, say, a Chinese
> application.  Or a Latin 1
> application but a UTF-8 database?  How does this work?  I'm confused.

The Unicode driver adds a bunch of Unicode-specific APIs. Our ANSI
driver can handle Unicode data as multibyte strings as well, but without
the Unicode APIs that many non-multibyte aware versions of Windows
require (if that makes sense!).

Regards, Dave.

Re: ANSI and Unicode driver

From

Hiroshi Inoue

Date:

20 February 2006, 14:08:43

Peter Eisentraut wrote:

>So really, what is the difference between the ANSI and the Unicode driver?
>

There are 2 kinds of applications, Unicode applications and ANSI
applications.
Unicode applications use UCS-2(4) encoding and call Unicode ODBC APIs.

>
>The Unicode driver sets the client encoding to UTF-8, but does that mean that
>the client application has to use UTF-8
>

Though Unicode applications are preferable for Unicode drivers,

>or does the driver manager convert
>that?
>

wise driver managers may invoke ANSI <-> UCS-2(4) conversions when ANSI
applications call ANSI ODBC APIs for the Unicode driver.

regards,
Hiroshi Inoue
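
As a rough illustration of the conversion path described in the message above: an ANSI application hands over bytes in its code page, the driver manager widens them to UTF-16 for the driver's Unicode entry points, and the Unicode driver re-encodes them as UTF-8 for the connection. This is only a sketch of the encoding steps, not the code of any driver manager or of psqlODBC. The helper names and the choice of cp1252 as the ANSI code page are assumptions made for the example.

```python
# Hypothetical helpers: they model the encoding steps only, not real ODBC calls.

def dm_ansi_to_wide(ansi_bytes: bytes, codepage: str = "cp1252") -> bytes:
    """Model the driver manager: ANSI bytes -> UTF-16 (little-endian) buffer."""
    return ansi_bytes.decode(codepage).encode("utf-16-le")

def driver_wide_to_utf8(wide_bytes: bytes) -> bytes:
    """Model the Unicode driver: UTF-16 buffer -> UTF-8 for the connection."""
    return wide_bytes.decode("utf-16-le").encode("utf-8")

# "Mueller" spelled with u-umlaut, as a Latin 1 style application would hold it.
ansi = "M\u00fcller".encode("cp1252")
wide = dm_ansi_to_wide(ansi)
utf8 = driver_wide_to_utf8(wide)

print(len(ansi), len(wide), len(utf8))  # 6 12 7
```

The same six characters occupy 6, 12, and 7 bytes at the three stages, which is why each layer has to know which encoding it is holding.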

Re: ANSI and Unicode driver

From

Peter Eisentraut

Date:

20 February 2006, 14:18:12

Dave Page wrote:
> The Unicode driver adds a bunch of Unicode-specific APIs. Our ANSI
> driver can handle Unicode data as multibyte strings as well, but
> without the Unicode APIs that many non-multibyte aware versions of
> Windows require (if that makes sense!).

Well, the information available to me seems to indicate that Unicode
drivers will handle ANSI applications just fine, and of course our
Unicode driver will also handle any server encoding, so the question is
why the ANSI version needs to exist.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: ANSI and Unicode driver

From

Dave Page

Date:

20 February 2006, 16:07:42

On 20/2/06 18:18, "Peter Eisentraut" <peter_e@gmx.net> wrote:

> Dave Page wrote:
>> The Unicode driver adds a bunch of Unicode-specific APIs. Our ANSI
>> driver can handle Unicode data as multibyte strings as well, but
>> without the Unicode APIs that many non-multibyte aware versions of
>> Windows require (if that makes sense!).
>
> Well, the information available to me seems to indicate that Unicode
> drivers will handle ANSI applications just fine, and of course our
> Unicode driver will also handle any server encoding,

Yes, that's quite correct.

> so the question is
> why the ANSI version needs to exist.

Well, for 8.0 we did release only 1 driver for that reason, but we kept
getting odd reports that it didn't handle non-ASCII characters (with umlauts
or accents etc) properly in some situations, that others, including myself,
could never reproduce.

After lots of on-list discussion in the latter part of last year during
which I made it clear on a number of occasions that we needed help from
someone who understood encodings better than I, we eventually concluded that
the best solution was to reinstate the old ANSI driver, as it appeared to my
limited understanding that whilst basic ASCII characters mapped directly
into the lower bytes of the Unicode function parameters, we needed
some conversion code to cope with other characters (which the SQL Server
driver appears to have as a configurable option for example).
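
The distinction drawn in the paragraph above can be sketched as follows. This only illustrates the encoding arithmetic; it does not show what psqlODBC or any driver manager actually does. `naive_widen` is a hypothetical helper and the code pages are example choices. Zero-extending each byte to 16 bits is correct for ASCII, but not for bytes whose meaning depends on the 8-bit code page.

```python
def naive_widen(ansi_bytes: bytes) -> str:
    """Zero-extend every byte to a code point, with no code page lookup."""
    return "".join(chr(b) for b in ansi_bytes)

# ASCII: naive widening and a real code page conversion agree.
assert naive_widen(b"abc") == b"abc".decode("cp1252")

# 0x80 is the euro sign in Windows-1252, but widening yields U+0080.
euro = b"\x80"
print(hex(ord(naive_widen(euro))), hex(ord(euro.decode("cp1252"))))  # 0x80 0x20ac

# 0xE4 is a Cyrillic letter (U+0434) in Windows-1251, but widening yields U+00E4.
cyr = b"\xe4"
print(hex(ord(naive_widen(cyr))), hex(ord(cyr.decode("cp1251"))))  # 0xe4 0x434
```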

Reinstating the ANSI only driver fixed things instantly for all those that
were complaining BTW, though I did note an email from Hiroshi Inoue earlier
today implying that it's actually the 'wise' DM that handles the conversion
(although the complainants were all on Windows iirc). FWIW, the
07_03_ENHANCED branch does only build the Unicode driver, though I'm not yet
sure if it will suffer the same problems that 08.00 did.

Feel free to explain what exactly is wrong if you know :-)

Regards, Dave.

Re: ANSI and Unicode driver

From

Marc Herbert

Date:

21 February 2006, 05:41:47

Dave Page <dpage@vale-housing.co.uk> writes:

> Reinstating the ANSI only driver fixed things instantly for all those that
> were complaining BTW, though I did note an email from Hiroshi Inoue earlier
> today implying that it's actually the 'wise' DM that handles the conversion
> (although the complainants were all on Windows iirc). FWIW, the
> 07_03_ENHANCED branch does only build the Unicode driver, though I'm not yet
> sure if it will suffer the same problems that 08.00 did.

By the way, being a "wise" DM that handles conversions is tricky. Think
for instance about ODBC entry points that may return many different
data types, including strings. See for instance this bug in unixODBC:
 <http://comments.gmane.org/gmane.comp.db.unixodbc.devel/1760>

Related: "DOC: Explanation of Length Arguments for Unicode ODBC Functions"
<http://support.microsoft.com/default.aspx?scid=kb;EN-US;q294169>
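
A small sketch of why length arguments are easy to get wrong once buffers hold 16-bit characters. It only computes the two possible units for the same buffer. Which ODBC functions expect a character count and which expect a byte count is not shown here; as far as I understand the ODBC documentation it depends on the function and argument, so treat that part as an assumption and check the reference for each call.

```python
text = "na\u00efve"                 # 5 characters, one of them non-ASCII
wide = text.encode("utf-16-le")     # bytes a 16-bit wide-character buffer would hold

chars = len(wide) // 2              # length counted in 16-bit code units
octets = len(wide)                  # length counted in bytes

print(chars, octets)  # 5 10
```

Passing one unit where the other is expected either truncates the string to half its length or reads past the end of the buffer.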

Re: ANSI and Unicode driver

From

Marc Herbert

Date:

21 February 2006, 05:49:02

Hiroshi Inoue <inoue@tpf.co.jp> writes:

> Peter Eisentraut wrote:
>
>>So really, what is the difference between the ANSI and the Unicode driver?

"ANSI" actually just means "8 bits" in MS-speak.
 <http://blogs.msdn.com/oldnewthing/archive/2004/05/31/144893.aspx>

And "Unicode" actually means UCS-2/UTF-16.

Things started to become clearer for me once I found the
translation...

> There are 2 kinds of applications, Unicode applications and ANSI
> applications.
> Unicode applications use UCS-2(4) encoding and call Unicode ODBC APIs.

Has anyone already seen some real 4-byte/UCS-4 ODBC applications
running out there, or only 2-byte/UCS-2/UTF-16 applications like
Microsoft implies through all its documentation?
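
For reference, the practical difference between the 2-byte and 4-byte forms only shows up for code points above U+FFFF. A minimal sketch, with example characters chosen arbitrarily: UCS-2 cannot represent such code points at all, UTF-16 spends two 16-bit units (a surrogate pair) on them, and UCS-4/UTF-32 always uses one 32-bit unit.

```python
bmp = "\u4e2d"          # a common Chinese character inside the Basic Multilingual Plane
astral = "\U00020000"   # a CJK Extension B character outside the BMP

for ch in (bmp, astral):
    utf16_units = len(ch.encode("utf-16-le")) // 2   # 16-bit code units
    utf32_units = len(ch.encode("utf-32-le")) // 4   # 32-bit code units
    print(f"U+{ord(ch):04X}", utf16_units, utf32_units)

# U+4E2D 1 1
# U+20000 2 1
```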