Re: Unicode support - Mailing list pgsql-odbc
From | Dave Page |
---|---|
Subject | Re: Unicode support |
Date | |
Msg-id | E7F85A1B5FF8D44C8A1AF6885BC9A0E4AC9E13@ratbert.vale-housing.co.uk |
In response to | Unicode support ("Dave Page" <dpage@vale-housing.co.uk>) |
List | pgsql-odbc |
> -----Original Message-----
> From: pgsql-odbc-owner@postgresql.org
> [mailto:pgsql-odbc-owner@postgresql.org] On Behalf Of Marko Ristola
> Sent: 01 September 2005 18:21
> Cc: Hiroshi Saito; Anoop Kumar; pgsql-odbc@postgresql.org
> Subject: Re: [ODBC] Unicode support
>
>
> Hi all.

Hi Marko,

> How about creating a charset conversion interface
> and taking UTF-8 as an internal format for ODBC?:
>
> <snip>
>
> So, there would be a single internal UTF-8 format inside PsqlODBC.
> The backend could always deliver UTF-8, so the need for internal
> format <-> backend format layer is not needed.
>
> This implementation would be easy to implement.

This is what already happens (if you ignore my recent experimental patch).

If the connection is made using one of the *W connect functions, then the ConnectionClass->unicode flag is set to true, and SET client_encoding = 'UTF-8' is sent to the backend. From then on, data going out to the client is fed through utf8_to_ucs2_lf() *if* the data type is specified as SQL_C_WCHAR, and data coming in to *W functions is fed through ucs2_to_utf8(). Afaict, Unicode mode works exactly as it should.

If the connection is made using a non-wide function, the ConnectionClass->unicode flag is not set. In this case, the client is expected to continue using non-wide functions, and the client encoding is left at its default. The driver will then never report data types as SQL_C_WCHAR.

This is where I believe the major problem occurs - if the ODBC Driver Manager sees that SQLConnectW (iirc) exists, it will automatically map ANSI calls (e.g. SQLConnect()) to Unicode calls (e.g. SQLConnectW()). This then causes the driver to report text/char columns as SQL_C_WCHAR. Less well-written apps then fall over because they aren't clever enough to request data as SQL_C_CHAR instead of SQL_C_WCHAR.

My recent experimental patch aims to address this by forcing the driver to report SQL_C_CHAR instead of SQL_C_WCHAR for non-unicode databases. This should (and seems to, with minor side effects yet to be fully investigated) fix the BDE problem.

As for multibyte (non-unicode) data such as Hiroshi's, my understanding is that in the presence of a Unicode driver, apps are expected to use Unicode (and in fact are forced to by the driver manager's mapping of ANSI function calls to Unicode calls).

Anoop, do you or any of your guys (or anyone else) know unicode/multibyte/encoding well? I'm learning as I go at the moment, so some more experienced help would be *really* appreciated.

Regards, Dave.
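The data path described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not psqlODBC code: the function names only mirror the ones mentioned in the message, `fetch_as` and `reported_text_type` are hypothetical helpers, UCS-2 is approximated with UTF-16LE (the two agree for characters in the Basic Multilingual Plane), and any line-ending handling implied by the `_lf` suffix is left out.

```python
# Illustrative sketch only; none of this is taken from the psqlODBC sources.

# C data type codes as defined in the standard ODBC headers.
SQL_C_CHAR = 1
SQL_C_WCHAR = -8


def utf8_to_ucs2(data: bytes) -> bytes:
    """Backend UTF-8 bytes -> wide characters for the client.

    Assumption: UTF-16LE stands in for UCS-2 here.
    """
    return data.decode("utf-8").encode("utf-16-le")


def ucs2_to_utf8(data: bytes) -> bytes:
    """Wide characters from a *W call -> UTF-8 bytes for the backend."""
    return data.decode("utf-16-le").encode("utf-8")


def fetch_as(value_utf8: bytes, target_type: int, conn_unicode: bool) -> bytes:
    """Hypothetical helper: convert only when a Unicode connection asks
    for SQL_C_WCHAR; otherwise hand the bytes back untouched."""
    if conn_unicode and target_type == SQL_C_WCHAR:
        return utf8_to_ucs2(value_utf8)
    return value_utf8


def reported_text_type(conn_unicode: bool, db_is_unicode: bool) -> int:
    """Hypothetical helper modelling the idea behind the experimental patch:
    report SQL_C_CHAR for non-unicode databases even if the Driver Manager
    mapped the connection to a *W function."""
    if conn_unicode and db_is_unicode:
        return SQL_C_WCHAR
    return SQL_C_CHAR


if __name__ == "__main__":
    wide = fetch_as("caf\u00e9".encode("utf-8"), SQL_C_WCHAR, conn_unicode=True)
    print(wide)                # UTF-16LE bytes handed to the application
    print(ucs2_to_utf8(wide))  # round trip back to UTF-8
    print(reported_text_type(conn_unicode=True, db_is_unicode=False))
```

The point of the last helper is that the wide/non-wide decision depends on two inputs, how the connection was opened and what the database encoding is, rather than on the connect function alone.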