Re: Continuing encoding fun.... - Mailing list pgsql-odbc
From | Marc Herbert |
---|---|
Subject | Re: Continuing encoding fun.... |
Date | |
Msg-id | 87acio7nh8.fsf@meije.emic.fr |
In response to | Continuing encoding fun.... ("Dave Page" <dpage@vale-housing.co.uk>) |
Responses | Re: Continuing encoding fun.... |
List | pgsql-odbc |
"Dave Page" <dpage@vale-housing.co.uk> writes:

> I've been thinking about this whilst getting dragged round the shops
> today, and having read Marko's, Johann's, Hiroshi's and other emails,
> not to mention bits of the ODBC spec, here's where I think we stand.
>
> 1) The current driver works as expected with Unicode apps.
>
> 2) 7 bit ASCII apps work correctly. The driver manager maps the ANSI
> functions to the Unicode ones, and because (as I think Marko pointed
> out) the basic latin chars map directly into the lower Unicode
> characters (see http://www.unicode.org/charts/PDF/U0000.pdf).
>
> 3) Some other single byte LATIN encodings do not work. This is because
> the characters do not map directly into Unicode 80-FF
> (http://www.unicode.org/charts/PDF/U0080.pdf).
>
> 4) Multibyte apps do not work. I believe that in fact they never will
> with a Unicode driver, because multibyte characters simply won't map
> into Unicode in the same way that ASCII does. The user cannot opt to use
> the non-wide functions, because the DM automatically maps them to the
> Unicode versions.
>
> Because the Driver Manager forces the user to use the *W functions if
> they exist, I cannot see any way to make 3 or 4 work with a Unicode
> driver. If we were to try to detect what encoding to use based on the OS
> settings and convert on the fly, we would most likely break any apps
> that try to do the right thing by using Unicode themselves.

In a perfect world there are no "unicode apps". The internal encoding
is set by the system, properly written apps use abstract
TCHAR/wchar_t characters without knowing anything about what encoding
they use, and programs communicating with the outside (such as a
database driver) should query the system encoding using something
like "setlocale()" and perform any appropriate conversion on the fly.
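To make that idea a bit more concrete, here is a minimal sketch in
Python. It is not ODBC code and says nothing about how the driver or
the Driver Manager actually behave. The helper names system_encoding()
and to_unicode() are made up for this illustration, and
locale.getpreferredencoding() merely stands in for "ask the system
which encoding is in use". The byte values at the end are examples of
points 2) to 4) above: ASCII bytes keep their numeric value as Unicode
code points, while bytes from other encodings need a real conversion.

```python
import locale


def system_encoding():
    """Hypothetical helper: name of the encoding the platform prefers.

    Passing False asks Python not to change the process locale as a
    side effect; it only reports the encoding currently in effect.
    """
    return locale.getpreferredencoding(False)


def to_unicode(raw, encoding=None):
    """Hypothetical helper: convert externally supplied bytes to text.

    If the caller does not name an encoding, fall back to whatever
    the system reports, rather than assuming one in the code.
    """
    return raw.decode(encoding or system_encoding())


# 7-bit ASCII: each byte value equals its Unicode code point.
ascii_points = [ord(ch) for ch in to_unicode(b"AZ", "ascii")]   # 0x41, 0x5A

# Single-byte encodings: the byte value is not always the code point.
euro = to_unicode(b"\x80", "cp1252")        # U+20AC, not U+0080
a_ogonek = to_unicode(b"\xa1", "iso8859_2")  # U+0104, not U+00A1

# Multibyte encodings: several bytes can become one character.
kanji = to_unicode("\u65e5".encode("shift_jis"), "shift_jis")
```

The point of the sketch is only that the conversion needs to know the
source encoding, and that this knowledge comes from the environment
rather than from the application.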
Excerpt from "info libc - Character Set Handling" of GNU libc 2.3.2
<http://www.gnu.org/software/libc/manual/html_node/Character-Set-Handling.html>

    The question remaining is: how to select the character set or
    encoding to use. The answer: you cannot decide about it yourself,
    it is decided by the developers of the system or the majority of
    the users. Since the goal is interoperability one has to use
    whatever the other people one works with use.

<http://www.faqs.org/docs/Linux-HOWTO/Unicode-HOWTO.html#s6> says the
same thing: "Avoid direct access with Unicode. This is a task of the
platform's internationalization framework."

Of course those two quotes are targeted at application developers.
They imply that a driver communicating with the outside
world/database should carry out any conversion task.

However, I have no idea how far this theory is from reality, from the
ODBC API, and from Windows, sorry :-( I was just woken up by the
words "unicode apps". I tried to follow the discussions here but got
lost.

My 2 cents.