Re: Continuing encoding fun.... - Mailing list pgsql-odbc

From Dave Page
Subject Re: Continuing encoding fun....
Date
Msg-id E7F85A1B5FF8D44C8A1AF6885BC9A0E4E7E20E@ratbert.vale-housing.co.uk
In response to Continuing encoding fun....  ("Dave Page" <dpage@vale-housing.co.uk>)
List pgsql-odbc

> -----Original Message-----
> From: pgsql-odbc-owner@postgresql.org
> [mailto:pgsql-odbc-owner@postgresql.org] On Behalf Of Marc Herbert
> Sent: 21 November 2005 17:19
> To: pgsql-odbc@postgresql.org
> Subject: Re: [ODBC] Continuing encoding fun....
>
> "Dave Page" <dpage@vale-housing.co.uk> writes:
>
> > I've been thinking about this whilst getting dragged round the shops
> > today, and having read Marko's, Johann's, Hiroshi's and other emails,
> > not to mention bits of the ODBC spec, here's where I think we stand.
> >
> > 1) The current driver works as expected with Unicode apps.
> >
> > 2) 7 bit ASCII apps work correctly. The driver manager maps the ANSI
> > functions to the Unicode ones, and because (as I think Marko pointed
> > out) the basic latin chars map directly into the lower Unicode
> > characters (see http://www.unicode.org/charts/PDF/U0000.pdf).
> >
> > 3) Some other single byte LATIN encodings do not work. This is because
> > the characters do not map directly into Unicode 80-FF
> > (http://www.unicode.org/charts/PDF/U0080.pdf).
> >
> > 4) Multibyte apps do not work. I believe that in fact they never will
> > with a Unicode driver, because multibyte characters simply won't map
> > into Unicode in the same way that ASCII does. The user cannot opt to use
> > the non-wide functions, because the DM automatically maps them to the
> > Unicode versions.
> >
> > Because the Driver Manager forces the user to use the *W functions if
> > they exist, I cannot see any way to make 3 or 4 work with a Unicode
> > driver.
>
>
> I agree that 4) can never work, because ODBC does not seem compatible
> with multibyte apps by design. ODBC caters for "ANSI" and "Unicode"
> strings, that's all.
> <http://blogs.msdn.com/oldnewthing/archive/2004/05/31/144893.aspx>

Actually our ANSI driver works quite nicely in various non-Unicode multibyte encodings such as Shift-JIS, EUC_CN, JOHAB
and more. It'll even work with pure UTF-8 in multibyte mode using the ANSI API.

>
> However, I don't get why 3) does not work. From here:
> <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/htm/odbcunicode_function_arguments.asp>
>
>  If the driver is a Unicode driver, the Driver Manager makes function
>  calls as follows:
>  - Converts an ANSI function (with the A suffix) to a Unicode function
>  (with the W suffix) by converting the string arguments into Unicode
>                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  characters and passes the Unicode function to the driver.
>
>
> Are you saying in 3) that the "converting" underlined above is
> actually just a static cast?!

No, not really a static cast, but a similar effect. Unicode chars 0000-007F are exactly the same as their ASCII
counterparts, as are the LATIN1 chars (0080-00FF). All the DM does is map the single byte values into the low bytes of
the Unicode characters and pass them to the Unicode functions. This works just fine for pure ASCII/LATIN1, but not with
other character sets which don't directly map from their single byte values into Unicode.
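
To make that concrete, here is a rough Python sketch of the widening described above. It only models the behaviour as I have described it, and is not taken from any driver manager's code; the helper name widen_bytes and the sample strings are made up for the illustration.

```python
def widen_bytes(raw: bytes) -> str:
    """Hypothetical helper: put each single byte value into the low byte
    of a Unicode character, with no knowledge of the source charset."""
    return "".join(chr(b) for b in raw)


ascii_text = "hello"
latin1_text = "café"   # é is E9 in LATIN1 and U+00E9 in Unicode
latin2_text = "łódź"   # ł is B3 in LATIN2 (ISO-8859-2) but U+0142 in Unicode

# ASCII and LATIN1 survive, because byte value == code point.
print(widen_bytes(ascii_text.encode("ascii")) == ascii_text)
print(widen_bytes(latin1_text.encode("latin-1")) == latin1_text)

# LATIN2 does not: the bytes land on unrelated LATIN1 code points.
raw = latin2_text.encode("iso8859-2")
print(widen_bytes(raw) == latin2_text)

# A charset-aware conversion recovers the text.
print(raw.decode("iso8859-2") == latin2_text)
```

The first two comparisons hold and the third does not, which is the difference between points 2 and 3 in the list quoted earlier.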

> Is this "bug" true for every driver manager out there?

It's not really a bug, but I believe so, yes. It gets corrected by the more advanced drivers though - for example, the
SQL Server driver might see a 'Š' character (8A). It knows the local charset is LATIN4, so it can then rewrite that
character to 0160, the Unicode equivalent. Our Unicode driver will simply leave it as 8A, which is actually a control
character (VTS - LINE TABULATION SET).
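
The two outcomes for that single byte can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the local code page is assumed to be Windows-1252 (Python codec "cp1252"), which is one code page where byte 8A is 'Š'.

```python
import unicodedata

# Assumption for this sketch: a code page in which byte 0x8A is 'Š'.
LOCAL_CODEPAGE = "cp1252"


def leave_as_is(byte_value: int) -> str:
    """Hypothetical: what a driver does if it only widens the byte."""
    return chr(byte_value)


def rewrite_for_charset(byte_value: int, codepage: str) -> str:
    """Hypothetical: what a charset-aware driver could do instead."""
    return bytes([byte_value]).decode(codepage)


naive = leave_as_is(0x8A)
fixed = rewrite_for_charset(0x8A, LOCAL_CODEPAGE)

# Widened only: still code point 8A, general category Cc (a control).
print(f"{ord(naive):04X}", unicodedata.category(naive))

# Charset-aware: code point 0160, the letter S with caron.
print(f"{ord(fixed):04X}", unicodedata.name(fixed))
```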

http://www.unicode.org/roadmaps/bmp/

At least, this is how I understand things :-). Regardless though, the encoding bug reports have all but stopped now
that we ship 2 drivers again.

Regards, Dave.
