Re: Unicode support - Mailing list pgsql-odbc

From Marko Ristola
Subject Re: Unicode support
Date
Msg-id 43194F58.6050909@kolumbus.fi
In response to Re: Unicode support  ("Hiroshi Saito" <saito@inetrt.skcapi.co.jp>)
Responses Re: Unicode support
List pgsql-odbc
So, I don't have much experience with Windows ODBC. That's true.
Is it possible to compile psqlodbc with MinGW tools for Windows?

After some searching with Google, I found that the GLib libraries are
able to convert UTF-8 into multibyte encodings under Windows. Windows
itself should also be able to convert between UTF-8 and multibyte
encodings with its own character set conversion functions.

I also found that Windows XP had a problem with Korean multibyte data:


  "Windows XP Device Driver Does Not Convert Multibyte Data to Korean"

 Article ID: 817522.

That was fixed in Service Pack 2.

So I would like to ask what you think about the following:

If I have understood Windows correctly, it uses UCS-2 as its internal
Unicode encoding. Linux prefers UTF-8. So treating UCS-2 and UTF-8 as
equivalent inside psqlodbc makes sense. That is what psqlodbc already
implements for Windows.
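
To make the UCS-2 and UTF-8 relation concrete, here is a minimal sketch of
decoding UTF-8 into UCS-2 code units. It is only an illustration under my own
assumptions: sketch_utf8_to_ucs2 and sketch_utf8_demo are hypothetical names,
not psqlodbc functions, and the sketch rejects everything outside the Basic
Multilingual Plane because UCS-2 cannot represent it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not psqlodbc code): decode UTF-8 into UCS-2 units.
 * When out is NULL only the required number of units is counted.
 * Returns the unit count, or -1 for malformed input, for a code point
 * outside the BMP, or when out is too small.
 */
static long sketch_utf8_to_ucs2(const unsigned char *in, size_t inlen,
                                uint16_t *out, size_t outcap)
{
    size_t i = 0, n = 0;

    while (i < inlen)
    {
        uint32_t cp;
        unsigned char b = in[i];

        if (b < 0x80)
        {
            cp = b;                     /* 1 byte: ASCII */
            i += 1;
        }
        else if ((b & 0xE0) == 0xC0 && i + 1 < inlen &&
                 (in[i + 1] & 0xC0) == 0x80)
        {
            cp = ((uint32_t) (b & 0x1F) << 6) | (in[i + 1] & 0x3F);
            if (cp < 0x80)
                return -1;              /* overlong 2-byte form */
            i += 2;
        }
        else if ((b & 0xF0) == 0xE0 && i + 2 < inlen &&
                 (in[i + 1] & 0xC0) == 0x80 && (in[i + 2] & 0xC0) == 0x80)
        {
            cp = ((uint32_t) (b & 0x0F) << 12) |
                 ((uint32_t) (in[i + 1] & 0x3F) << 6) |
                 (in[i + 2] & 0x3F);
            if (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))
                return -1;              /* overlong form or surrogate */
            i += 3;
        }
        else
            return -1;                  /* 4-byte (non-BMP) or malformed */

        if (out != NULL)
        {
            if (n >= outcap)
                return -1;              /* output buffer too small */
            out[n] = (uint16_t) cp;
        }
        n++;
    }
    return (long) n;
}

/* Usage: measure with a NULL buffer first, then convert. */
static void sketch_utf8_demo(void)
{
    /* "A" followed by HIRAGANA LETTER A (U+3042), encoded as UTF-8 */
    static const unsigned char utf8[] = { 0x41, 0xE3, 0x81, 0x82 };
    uint16_t ucs2[4];
    long n = sketch_utf8_to_ucs2(utf8, sizeof utf8, NULL, 0);

    assert(n == 2);
    assert(sketch_utf8_to_ucs2(utf8, sizeof utf8, ucs2, 4) == 2);
    assert(ucs2[0] == 0x0041 && ucs2[1] == 0x3042);
}
```

The point is only that both encodings carry the same code points, so this
step is lossless inside the BMP. The trouble starts with the legacy
multibyte character sets.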

Then there is the world before Unicode existed. There were DOS codepages,
character sets for groups of countries and Multibyte character sets.

JIS X 0208 is a character set (see man 7 charsets).
Shift_JIS is an encoding that can contain JIS X 0208 multibyte
characters (see man 7 charsets).

So it seems that one working implementation can be built on a UTF-8
PostgreSQL server plus UTF-8 to multibyte conversions.

However, according to the Samba team's descriptions of Unicode problems,
there are some pitfalls: a UTF-8 to EUC_JP conversion may give different
results on Linux and on Windows, and between different conversion library
implementations.

Some multibyte character sets are contradictory with each other.
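
As an illustration of such a contradiction, here is a small sketch with three
Shift_JIS codes that the JIS X 0208 based mapping and Microsoft's code page
932 are commonly reported to map to different Unicode code points. The table
and function names (sketch_jis, sketch_cp932, sketch_round_trip and so on)
are hypothetical, and real libraries contain thousands of rows, not three.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row: a Shift_JIS double-byte code and the code point it maps to. */
struct sketch_map { uint16_t sjis; uint16_t ucs; };

#define SKETCH_ROWS 3

/* Rows in the style of the JIS X 0208 based mapping. */
static const struct sketch_map sketch_jis[SKETCH_ROWS] = {
    { 0x8160, 0x301C },     /* WAVE DASH */
    { 0x8161, 0x2016 },     /* DOUBLE VERTICAL LINE */
    { 0x817C, 0x2212 },     /* MINUS SIGN */
};

/* Rows in the style of Microsoft code page 932. */
static const struct sketch_map sketch_cp932[SKETCH_ROWS] = {
    { 0x8160, 0xFF5E },     /* FULLWIDTH TILDE */
    { 0x8161, 0x2225 },     /* PARALLEL TO */
    { 0x817C, 0xFF0D },     /* FULLWIDTH HYPHEN-MINUS */
};

/* Multibyte -> Unicode; returns 0 when the code is not in the table. */
static uint16_t sketch_decode(const struct sketch_map *t, uint16_t sjis)
{
    size_t i;

    for (i = 0; i < SKETCH_ROWS; i++)
        if (t[i].sjis == sjis)
            return t[i].ucs;
    return 0;
}

/* Unicode -> multibyte; returns 0 when the code point is not in the table. */
static uint16_t sketch_encode(const struct sketch_map *t, uint16_t ucs)
{
    size_t i;

    for (i = 0; i < SKETCH_ROWS; i++)
        if (t[i].ucs == ucs)
            return t[i].sjis;
    return 0;
}

/* Decode with one table and encode with another; 0 means the data was lost. */
static uint16_t sketch_round_trip(const struct sketch_map *dec,
                                  const struct sketch_map *enc,
                                  uint16_t sjis)
{
    return sketch_encode(enc, sketch_decode(dec, sjis));
}

static void sketch_map_demo(void)
{
    /* One table in both directions: the byte pair survives. */
    assert(sketch_round_trip(sketch_jis, sketch_jis, 0x8160) == 0x8160);
    /* Two different tables: the character cannot be converted back. */
    assert(sketch_round_trip(sketch_jis, sketch_cp932, 0x8160) == 0);
}
```

With one table used in both directions the round trip is exact. With two
different tables the character is lost or silently changed, which is the
situation I am worried about when the server and the client each bring their
own conversion library.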

If we drop the *W() functions, we might get a working implementation,
but then we might not support the full ODBC API?

So it works if, and only if, one single conversion library does all the
conversions.

So psqlodbc should work with multibyte encodings and with UTF-8 if and
only if either the PostgreSQL backend alone or the psqlodbc side alone
does the needed conversions. If the PostgreSQL server runs in the same
kind of Windows environment as the clients, it should work fully with
UTF-8 and the multibyte character sets. This should be the best working
option.

Windows does have a working UCS-2 to multibyte conversion implementation
on the psqlodbc client (since Service Pack 2).

Unfortunately pg_dump + restore from SJIS into UTF-8 might not work,
because iconv on Linux might not do the conversion correctly.

The conversion into UTF-8 must be done with the fully working Windows
conversion functions. One way might be to run pg_dump under Windows, so
that the multibyte to UTF-8 conversion is done on the Windows side.

How about the following implementation:
ODBC against the backend:
- Backend has multibyte characters.
- Windows uses multibyte characters.
- psqlodbc has UTF-8 as its internal format.

=> A fully working implementation:
- Backend delivers multibyte characters.
- psqlodbc converts them into UTF-8.
- psqlodbc delivers multibyte characters to the client
  using utf8_to_locale Windows functions, when necessary.

So the solution here might be to do all conversions on the client side!
The reasoning is that two separate conversion libraries might contradict
each other, at least with the Asian character sets.
(On Macs, the UTF-8 implementation differs from the standard.)

Or Asian users should move to UTF-8 as their PostgreSQL server's backend
encoding. That is the other solution to the same problem. Then the
PostgreSQL server does not have to do the conversion.

It does not seem possible to do all the conversions inside the
PostgreSQL server under Windows, because of the xx() -> xxW() mapping
inside the Windows ODBC manager. We can't control that.

What do you think about these thoughts?

Marko Ristola

Hiroshi Saito wrote:

>Hi Dave.
>
>I tried your patch with Japanese SJIS. It seems that it needs some additional
>correction. Moreover, it is necessary to make the driver different from
>UNICODE (WideCharacter). It seems that I have to catch up further.
>
>BTW, I remembered the original discussion about pgAdminIII. I said then that I
>should support MultiByte. However, how is it now? It is very wonderful.
>I feel that having many choices of character code complicates the problem
>more, although the external environment is different.
>
>Regards,
>Hiroshi Saito
>
>------------------------------------------------------------------------
>
>--- convert.c.orig    Thu Aug  4 21:26:57 2005
>+++ convert.c    Thu Sep  1 04:38:45 2005
>@@ -762,7 +762,7 @@
>                 {
>                     BOOL lf_conv = conn->connInfo.lf_conversion;
>
>-                    if (fCType == SQL_C_WCHAR)
>+                    if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
>                     {
>                         len = utf8_to_ucs2_lf(neut_str, -1, lf_conv, NULL, 0);
>                         len *= WCLEN;
>@@ -778,7 +778,7 @@
>                     }
>                     else
> #ifdef    WIN32
>-                    if (fCType == SQL_C_CHAR)
>+                    if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_CHAR))
>                     {
>                         wstrlen = utf8_to_ucs2_lf(neut_str, -1, lf_conv, NULL, 0);
>                         allocbuf = (SQLWCHAR *) malloc(WCLEN * (wstrlen + 1));
>@@ -810,7 +810,7 @@
>                             pgdc->ttlbuflen = len + 1;
>                         }
>
>-                        if (fCType == SQL_C_WCHAR)
>+                        if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
>                         {
>                             utf8_to_ucs2_lf(neut_str, -1, lf_conv, (SQLWCHAR *) pgdc->ttlbuf, len / WCLEN);
>                         }
>@@ -824,7 +824,7 @@
>                         }
>                         else
> #ifdef    WIN32
>-                        if (fCType == SQL_C_CHAR)
>+                        if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_CHAR))
>                         {
>                             len = WideCharToMultiByte(CP_ACP, 0, allocbuf, wstrlen, pgdc->ttlbuf, pgdc->ttlbuflen, NULL,NULL);
>                             free(allocbuf);
>@@ -871,7 +871,7 @@
>
>                     copy_len = (len >= cbValueMax) ? cbValueMax - 1 : len;
>
>-                    if (fCType == SQL_C_WCHAR)
>+                    if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
>                     {
>                         copy_len /= WCLEN;
>                         copy_len *= WCLEN;
>@@ -911,7 +911,7 @@
>                         memcpy(rgbValueBindRow, ptr, copy_len);
>                         /* Add null terminator */
>
>-                        if (fCType == SQL_C_WCHAR)
>+                        if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
>                             memset(rgbValueBindRow + copy_len, 0, WCLEN);
>                         else
>
>@@ -942,7 +942,7 @@
>                 break;
>         }
>
>-        if (SQL_C_WCHAR == fCType && ! wchanged)
>+        if ((conn->unicode && conn->report_wide_types) && (SQL_C_WCHAR == fCType && ! wchanged))
>         {
>             if (cbValueMax > (SDWORD) (WCLEN * (len + 1)))
>             {
>@@ -2629,6 +2629,8 @@
>                 case SQL_WCHAR:
>                 case SQL_WVARCHAR:
>                 case SQL_WLONGVARCHAR:
>+                    if (conn->unicode && conn->report_wide_types)
>+                    {
>                     if (SQL_NTS == used)
>                         used = strlen(buffer);
>                     allocbuf = malloc(WCLEN * (used + 1));
>@@ -2637,6 +2639,11 @@
>                     buf = ucs2_to_utf8((SQLWCHAR *) allocbuf, used, (UInt4 *) &used, FALSE);
>                     free(allocbuf);
>                     allocbuf = buf;
>+                    }
>+                    else
>+                    {
>+                        buf = buffer;
>+                    }
>                     break;
>                 default:
>                     buf = buffer;
>@@ -2647,10 +2654,17 @@
>             break;
>
>         case SQL_C_WCHAR:
>+            if (conn->unicode && conn->report_wide_types)
>+            {
>             if (SQL_NTS == used)
>                 used = WCLEN * wcslen((SQLWCHAR *) buffer);
>             buf = allocbuf = ucs2_to_utf8((SQLWCHAR *) buffer, used / WCLEN, (UInt4 *) &used, FALSE);
>             used *= WCLEN;
>+            }
>+            else
>+            {
>+                buf = buffer;
>+            }
>             break;
>
>         case SQL_C_DOUBLE:
>--- psqlodbc_win32.def.orig    Thu Sep  1 04:41:37 2005
>+++ psqlodbc_win32.def    Thu Sep  1 04:42:08 2005
>@@ -78,31 +78,3 @@
> DllMain @201
> ConfigDSN @202
>
>-SQLColAttributeW    @101
>-SQLColumnPrivilegesW    @102
>-SQLColumnsW        @103
>-SQLConnectW        @104
>-SQLDescribeColW        @106
>-SQLExecDirectW        @107
>-SQLForeignKeysW        @108
>-SQLGetConnectAttrW    @109
>-SQLGetCursorNameW    @110
>-SQLGetInfoW        @111
>-SQLNativeSqlW        @112
>-SQLPrepareW        @113
>-SQLPrimaryKeysW        @114
>-SQLProcedureColumnsW    @115
>-SQLProceduresW        @116
>-SQLSetConnectAttrW    @117
>-SQLSetCursorNameW    @118
>-SQLSpecialColumnsW    @119
>-SQLStatisticsW        @120
>-SQLTablesW        @121
>-SQLTablePrivilegesW    @122
>-SQLDriverConnectW    @123
>-SQLGetDiagRecW        @124
>-SQLGetStmtAttrW        @125
>-SQLSetStmtAttrW        @126
>-SQLSetDescFieldW    @127
>-SQLGetTypeInfoW        @128
>-SQLGetDiagFieldW    @129
>
>
>------------------------------------------------------------------------
>
>