FW: Win32 unicode vs ICU - Mailing list pgsql-patches
From | Magnus Hagander |
---|---|
Subject | FW: Win32 unicode vs ICU |
Date | |
Msg-id | 6BCB9D8A16AC4241919521715F4D8BCE094656@algol.sollentuna.se Whole thread Raw |
Responses |
Re: FW: Win32 unicode vs ICU
Re: FW: Win32 unicode vs ICU Re: FW: Win32 unicode vs ICU |
List | pgsql-patches |
I just realised this mail didn't go through. Probably because it was too large for -hackers. So: repost to -patches. Sorry about that. If it's a duplicate, even more sorry, but I couldn't find it in the archives. (This may explain that nobody answered me :P) //Magnus > -----Original Message----- > From: Magnus Hagander > Sent: Sunday, July 31, 2005 2:09 PM > To: PostgreSQL-development > Cc: pgsql-hackers-win32@postgresql.org > Subject: Win32 unicode vs ICU > > Hi! > > I've been working with Palles ICU patch to make it work on > win32, and I believe I have it done. While doing it I noticed > that ICU basically converts to UTF16 and back - I previously > thought it worked on UTF8 strings. Based on this I also tried > out an implementation for the win32-unicode problem that does > *not* require ICU. It uses the win32 native functions to map > to utf16 and back, and then to process the text there. And I > got through with much less code than the ICU version, while > doing the same thing. > > I am unsure of how to proceed. As I see it there are three paths: > 1) Use native win32 functionality only on win32 > 2) Use ICU functionality only on win32 > 3) Allow both ICU and native functionality, compile time > switch --with-icu (same as unix with the ICU patch) > > > The main downsides of ICU vs the native ones are: > * ICU does not accept win32 locale names. When doing > setlocale("sv_se"), for example, win32 will return this in > later calls as "Swedish_Sweden.1252". To get around this in > the ICU patch, I had to implement a lookup map that converts > it back to sv_se for ICU. > > * ICU is yet another build and runtime dependency, and a > large one (comes in at 11Mb for the DLL files alone in the > win32 download) > > > I guess that the main upside of it is that we'd get > constistent behaviour - in case there are issues with either > ICU or win32 native they'd otherwise differ. And only one new > codepath. But we already live with the platform-inconsistency today... > > Another upside is that it handles more encodings in ICU - my > native implementation does *only* UTF8 and relies on existing > functionality to deal with other encodings. It could of > course be extended if necessary, but from what I can tell > UTF8 is the big one. > > > > I have attached both patches. For the native version, only > win32_utf8.patch is required. For the ICU version, > icu_win32.patch is needed and also the files > localemap.c,localemap.pl, iso639 and iso3166 needs to go in > src/backend/port/win32. (the localemap needs to be updated to > do a better-than-linear search, but I wanted to include an example) > > > Thoughts on the options? > > > And anohter question - my native patch touches the same > functions as the ICU patch. Can somebody who knows the > internals confirm or deny that these are all the required > locations, or do we need to modify more? > > (I have run simple tests in swedish locale and both behave > the same and correct, but I'm unsure of exactly how much > would be affected) > > Finally, the win32 patch also changes the normal path to use > strncoll(). The comment above the function states that we'd > like to use strncoll but it's not available. Well, on win32 > it is, so it should provide a speedup on win32. It is > currently not included in the ICU patch, but should probably > be included whichever path we'd chose. > > > //Magnus >
Attachment
pgsql-patches by date: