Re: Unicode support - Mailing list pgsql-odbc
From | Marko Ristola |
---|---|
Subject | Re: Unicode support |
Date | |
Msg-id | 43194F58.6050909@kolumbus.fi |
In response to | Re: Unicode support ("Hiroshi Saito" <saito@inetrt.skcapi.co.jp>) |
Responses | Re: Unicode support |
List | pgsql-odbc |
So, I don't have much experience with Windows ODBC. That's true. Is it possible to compile psqlodbc with MinGW tools for Windows?

After searching with Google, I found that the GLib libraries are able to convert UTF-8 into multibyte under Windows. Windows itself should also be able to convert UTF-8 into multibyte and vice versa with its character set conversion functions.

I also found that Windows XP had a problem with Korean multibyte: "Windows XP Device Driver Does Not Convert Multibyte Data to Korean", Article ID: 817522. That was fixed in Service Pack 2.

So I ask how you have thought about these things:

If I have understood Windows correctly, it uses UCS-2 as its internal Unicode encoding. Linux prefers UTF-8. So treating UCS-2 and UTF-8 as equivalent inside psqlodbc makes sense. That is what has already been implemented in psqlodbc for Windows.

Then there is the world before Unicode existed. There were DOS codepages, character sets for groups of countries, and multibyte character sets. JIS X 0208 is a character set (see man 7 charsets). Shift_JIS is an encoding that can contain JIS X 0208 multibyte characters (see man 7 charsets).

So it seems that one working implementation can be done by using a UTF-8 PostgreSQL server and UTF-8 to multibyte conversions.

However, according to the Samba team's descriptions of Unicode problems, there are some difficulties: the UTF-8 to EUC_JP conversion may differ between Linux and Windows, and between different conversion library implementations. Some multibyte character sets are contradictory with each other.

If we drop the *W() functions, we might get a working implementation, but we might not support the full ODBC API?

So it works only if one single conversion library does the conversions. In other words, psqlodbc should work with multibyte encodings and with UTF-8 only if either the PostgreSQL backend alone or the psqlodbc side alone does the needed conversions.
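To illustrate why UCS-2 and UTF-8 can be treated as equivalent while the legacy multibyte sets cannot: both are encodings of the same Unicode code points, so converting between them is plain bit arithmetic with no mapping table that two libraries could disagree on. Below is a minimal sketch in C, assuming input restricted to the Basic Multilingual Plane. The function name `sketch_utf8_to_ucs2` is hypothetical and is not psqlodbc's own `utf8_to_ucs2_lf`; it also does not reject overlong encodings or surrogate values, which a production decoder should.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not psqlodbc's utf8_to_ucs2_lf): decode a
 * NUL-terminated UTF-8 string into UCS-2 code units.
 *
 * Returns the number of code units the input needs, excluding the
 * terminator. If dst is not NULL, at most cap units (terminator
 * included) are written. Returns (size_t) -1 for malformed input or
 * for 4-byte sequences, which lie outside what UCS-2 can represent.
 */
size_t sketch_utf8_to_ucs2(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *p = (const unsigned char *) src;
    size_t n = 0;

    assert(src != NULL);
    while (*p != 0)
    {
        uint32_t cp;
        int extra;

        if (*p < 0x80)                  /* 1 byte: 0xxxxxxx */
        {
            cp = *p;
            extra = 0;
        }
        else if ((*p & 0xE0) == 0xC0)   /* 2 bytes: 110xxxxx */
        {
            cp = *p & 0x1F;
            extra = 1;
        }
        else if ((*p & 0xF0) == 0xE0)   /* 3 bytes: 1110xxxx */
        {
            cp = *p & 0x0F;
            extra = 2;
        }
        else
            return (size_t) -1;         /* stray continuation or 4-byte lead */
        p++;

        while (extra-- > 0)
        {
            /* A NUL here also fails the test, so we never read past the end. */
            if ((*p & 0xC0) != 0x80)
                return (size_t) -1;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (dst != NULL && n + 1 < cap)
            dst[n] = (uint16_t) cp;
        n++;
    }

    if (dst != NULL && cap > 0)
        dst[n < cap ? n : cap - 1] = 0; /* always terminate the output */
    return n;
}
```

Calling it once with `dst == NULL` gives the required length, and a second call fills the buffer. A Shift_JIS or EUC_JP conversion, by contrast, needs a lookup table, and that table is where different libraries can diverge.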
If the PostgreSQL server runs in the same kind of Windows environment as the clients, it should work fully with UTF-8 and the multibyte character sets. This should be the best working option. Windows does have a working UCS-2 to multibyte conversion implementation on the psqlodbc client side (since Service Pack 2).

Unfortunately, pg_dump + restore from SJIS into UTF-8 might not work, because iconv on Linux might not do the conversion correctly. The conversion into UTF-8 must be done using fully working Windows conversion functions. So one way might be to use a pg_dump under Windows that does the multibyte to UTF-8 conversion on the Windows side.

How about the following implementation:

ODBC against the backend:
- Backend has multibyte characters.
- Windows uses multibyte characters.
  psqlodbc has UTF-8 as its internal format.

=> A fully working implementation:
- Backend delivers multibyte characters.
  PSQLODBC converts them into UTF-8.
  PSQLODBC delivers multibyte characters to the client using the utf8_to_locale Windows functions, when necessary.

So the solution here might be to do all conversions on the client side! The reasoning for this is that two separate conversion libraries might contradict each other, at least with the Asian character sets. (On Macs, the UTF-8 implementation differs from the standard.)

Alternatively, Asian users could move to UTF-8 as their PostgreSQL server's backend encoding. That is the other solution to the same problem. Then the PostgreSQL server doesn't have to do the conversion.

It does not seem possible to do all the conversions inside the PostgreSQL server under Windows, because of the xx() -> xxW() mapping inside the Windows ODBC manager. We can't control that.

What do you think about these thoughts?

Marko Ristola

Hiroshi Saito wrote:

>Hi Dave.
>
>I tried your patch by SJIS of Japan. It seems that it needs some additional
>correction.
>Moreover, it is necessary to make the driver different from
>UNICODE (WideCharacter). It seems that I have to catch up further.
>
>BTW, I remembered the discussion original by pgAdminIII. I said that I
>should support MullutiByte then. However, How is it now? It is very wonderful.
>I feel that that there are many choices of a character code complicates a problem
>more. but, it is although external environment is different.
>
>Regards,
>Hiroshi Saito
>
>------------------------------------------------------------------------
>
>--- convert.c.orig Thu Aug 4 21:26:57 2005
>+++ convert.c Thu Sep 1 04:38:45 2005
>@@ -762,7 +762,7 @@
>     {
>         BOOL lf_conv = conn->connInfo.lf_conversion;
>
>-        if (fCType == SQL_C_WCHAR)
>+        if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
>         {
>             len = utf8_to_ucs2_lf(neut_str, -1, lf_conv, NULL, 0);
>             len *= WCLEN;
>@@ -778,7 +778,7 @@
>         }
>         else
> #ifdef WIN32
>-        if (fCType == SQL_C_CHAR)
>+        if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_CHAR))
>         {
>             wstrlen = utf8_to_ucs2_lf(neut_str, -1, lf_conv, NULL, 0);
>             allocbuf = (SQLWCHAR *) malloc(WCLEN * (wstrlen + 1));
>@@ -810,7 +810,7 @@
>             pgdc->ttlbuflen = len + 1;
>         }
>
>-        if (fCType == SQL_C_WCHAR)
>+        if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
>         {
>             utf8_to_ucs2_lf(neut_str, -1, lf_conv, (SQLWCHAR *) pgdc->ttlbuf, len / WCLEN);
>         }
>@@ -824,7 +824,7 @@
>         }
>         else
> #ifdef WIN32
>-        if (fCType == SQL_C_CHAR)
>+        if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_CHAR))
>         {
>             len = WideCharToMultiByte(CP_ACP, 0, allocbuf, wstrlen, pgdc->ttlbuf, pgdc->ttlbuflen, NULL,NULL);
>             free(allocbuf);
>@@ -871,7 +871,7 @@
>
>         copy_len = (len >= cbValueMax) ? cbValueMax - 1 : len;
>
>-        if (fCType == SQL_C_WCHAR)
>+        if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
>         {
>             copy_len /= WCLEN;
>             copy_len *= WCLEN;
>@@ -911,7 +911,7 @@
>         memcpy(rgbValueBindRow, ptr, copy_len);
>         /* Add null terminator */
>
>-        if (fCType == SQL_C_WCHAR)
>+        if ((conn->unicode && conn->report_wide_types) && (fCType == SQL_C_WCHAR))
>             memset(rgbValueBindRow + copy_len, 0, WCLEN);
>         else
>
>@@ -942,7 +942,7 @@
>             break;
>         }
>
>-        if (SQL_C_WCHAR == fCType && ! wchanged)
>+        if ((conn->unicode && conn->report_wide_types) && (SQL_C_WCHAR == fCType && ! wchanged))
>         {
>             if (cbValueMax > (SDWORD) (WCLEN * (len + 1)))
>             {
>@@ -2629,6 +2629,8 @@
>             case SQL_WCHAR:
>             case SQL_WVARCHAR:
>             case SQL_WLONGVARCHAR:
>+                if (conn->unicode && conn->report_wide_types)
>+                {
>                 if (SQL_NTS == used)
>                     used = strlen(buffer);
>                 allocbuf = malloc(WCLEN * (used + 1));
>@@ -2637,6 +2639,11 @@
>                 buf = ucs2_to_utf8((SQLWCHAR *) allocbuf, used, (UInt4 *) &used, FALSE);
>                 free(allocbuf);
>                 allocbuf = buf;
>+                {
>+                else
>+                {
>+                    buf = buffer;
>+                }
>                 break;
>             default:
>                 buf = buffer;
>@@ -2647,10 +2654,17 @@
>             break;
>
>         case SQL_C_WCHAR:
>+            if (conn->unicode && conn->report_wide_types)
>+            {
>             if (SQL_NTS == used)
>                 used = WCLEN * wcslen((SQLWCHAR *) buffer);
>             buf = allocbuf = ucs2_to_utf8((SQLWCHAR *) buffer, used / WCLEN, (UInt4 *) &used, FALSE);
>             used *= WCLEN;
>+            }
>+            else
>+            {
>+                buf = buffer;
>+            }
>             break;
>
>         case SQL_C_DOUBLE:
>--- psqlodbc_win32.def.orig Thu Sep 1 04:41:37 2005
>+++ psqlodbc_win32.def Thu Sep 1 04:42:08 2005
>@@ -78,31 +78,3 @@
> DllMain @201
> ConfigDSN @202
>
>-SQLColAttributeW @101
>-SQLColumnPrivilegesW @102
>-SQLColumnsW @103
>-SQLConnectW @104
>-SQLDescribeColW @106
>-SQLExecDirectW @107
>-SQLForeignKeysW @108
>-SQLGetConnectAttrW @109
>-SQLGetCursorNameW @110
>-SQLGetInfoW @111
>-SQLNativeSqlW @112
>-SQLPrepareW @113
>-SQLPrimaryKeysW @114
>-SQLProcedureColumnsW @115
>-SQLProceduresW @116
>-SQLSetConnectAttrW @117
>-SQLSetCursorNameW @118
>-SQLSpecialColumnsW @119
>-SQLStatisticsW @120
>-SQLTablesW @121
>-SQLTablePrivilegesW @122
>-SQLDriverConnectW @123
>-SQLGetDiagRecW @124
>-SQLGetStmtAttrW @125
>-SQLSetStmtAttrW @126
>-SQLSetDescFieldW @127
>-SQLGetTypeInfoW @128
>-SQLGetDiagFieldW @129
>
>
>------------------------------------------------------------------------
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>