Thread: Using multi-locale support in glibc
Browsing the glibc stuff for locales I noticed that glibc does actually allow you to specify the collation order to strcoll and friends. The feature is however marked with:

    Attention: all these functions are *not* standardized in any form.
    This is a proof-of-concept implementation.

They do however work fine. I used my taggedtypes module to create a type that binds the collation order to the text strings and the results can be seen below.

1. Is something supported by glibc usable for us (re portability to non-glibc platforms)?

2. Should we be trying to use an interface that's specifically marked as unstable?

3. What's the plan to support multiple collate orders? There was a message about it last year but I don't see much progress.

4. It makes some things more difficult. For example, my database is UNICODE and until I specified a UTF8 locale it didn't come out right. AFAIK the only easy way to determine if something is UTF8 compatible is to use locale -k charmap. The C interface is hidden. It should be possible to compile a list of locales and allow only ones matching the database. Or automatically convert the strings, the conversion functions exist.

5. Maybe we should evaluate the interface and give feedback to the glibc developers to see if it can be made more stable.

If you want to have a look to see what's available, use:

    rgrep -3 locale_t /usr/include/ |less

Have a nice day,

PS.
The code to test this can be found at:
http://svana.org/kleptog/pgsql/taggedtypes.html

--- TEST OUTPUT ---
test=# select strings from taggedtypes.locale_test order by locale_text( strings, 'C' );
 strings
---------
 Test2
 Tést1
 Tëst1
 test1
 tèst2
(5 rows)

test=# select strings from taggedtypes.locale_test order by locale_text( strings, 'en_US' );
 strings
---------
 Tëst1
 Tést1
 tèst2
 test1
 Test2
(5 rows)

test=# select strings from taggedtypes.locale_test order by locale_text( strings, 'nl_NL' );
ERROR:  Locale 'nl_NL' not supported by library

test=# select strings from taggedtypes.locale_test order by locale_text( strings, 'en_AU.UTF-8' );
 strings
---------
 test1
 Tést1
 Tëst1
 Test2
 tèst2
(5 rows)

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
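
For readers who want to try the interface discussed in this message, here is a minimal sketch of per-call collation using newlocale(), strcoll_l() and freelocale() from glibc's <locale.h>. It is not the taggedtypes code; the helper name sort_with_locale and its return convention are made up for illustration, and it assumes a C library that provides these calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for newlocale(), strcoll_l() */
#endif
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Locale used by the comparator; plain qsort() has no user-data argument,
 * so this sketch is not thread-safe. */
static locale_t sort_locale;

static int compare_in_locale(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll_l(*sa, *sb, sort_locale);
}

/*
 * Hypothetical helper: sort `strings` in place using the collation rules
 * of `locale_name`, without touching the process-wide locale.
 * Returns 0 on success, -1 if the C library cannot load the locale.
 */
int sort_with_locale(const char **strings, size_t count,
                     const char *locale_name)
{
    assert(strings != NULL && locale_name != NULL);

    locale_t loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK,
                             locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;              /* e.g. locale not installed */

    sort_locale = loc;
    qsort(strings, count, sizeof *strings, compare_in_locale);
    freelocale(loc);
    return 0;
}
```

Calling sort_with_locale(strings, 5, "en_US.UTF-8") on the five test strings should order them by that locale's rules if the locale is installed; a -1 return corresponds to the "not supported by library" case seen with nl_NL above. Creating a locale object per sort is wasteful, so a real caller would presumably cache it.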
Martijn van Oosterhout <kleptog@svana.org> writes:
> 1. Is something supported by glibc usable for us (re portability to
> non-glibc platforms)?

Nope. Sorry.

regards, tom lane
On Thu, Sep 01, 2005 at 01:46:00PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > 1. Is something supported by glibc usable for us (re portability to
> > non-glibc platforms)?
>
> Nope. Sorry.

Do we have some platforms that don't have any multi-language support? I mean, we don't have a complete thread library but a wrapper around the ones used on the platform. Couldn't we make a similar wrapper that used glibc if it was available, windows native if it's available, etc...

That way we conform to the platform rather than a version of the unicode collating set that postgresql happens to ship with it.

For example, Windows doesn't use standard Unicode sorting rules, do we care if people come complaining that postgresql sorts different from their app?

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout <kleptog@svana.org> writes:
> Do we have some platforms that don't have any multi-language support? I
> mean, we don't have a complete thread library but a wrapper around the
> ones used on the platform. Couldn't we make a similar wrapper that used
> glibc if it was available, windows native if it's available, etc...

> That way we conform to the platform rather than a version of the
> unicode collating set that postgresql happens to ship with it.

That seems likely to be the worst of all possible worlds :-(.

As to the first point, our problem with the standard locale support is that (a) it doesn't conveniently/cheaply support use of multiple locales per program, and (b) it fails to expose (portably) information that we need such as the character set assumed by a locale setting. A wrapper around that might hide the convenience problem, but not the performance problem and definitely not the hidden-information problem.

As to the second point, our experience with similar issues in the timezone library says that platform-dependent behavior is the last thing we want.

I think we're going to end up doing just what we did with timezones, ie, create our own library --- hopefully based on someone else's work rather than rolled from scratch, but we'll feel free to whack the API around until we like it. No one's quite had the stomach to do that yet though ... in part I suppose we're hoping a good library will drop into our laps.

(The reason thread support is a poor analogy is that we don't actually care about threads; we only support them to the extent the platform wants us to. The requirements for locale and timezones are driven in the other direction, ie, we need more than most platforms are willing to give.)

regards, tom lane
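
To make point (b) concrete: on a C library that offers locale objects, the character set a locale assumes can be read with nl_langinfo_l(CODESET, ...) from <langinfo.h>. The sketch below is only an illustration under that assumption, and depending on such calls is exactly the portability concern raised above. The helper name locale_uses_codeset is hypothetical, and the exact codeset spellings returned (for example "UTF-8") are library-dependent.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for newlocale(), nl_langinfo_l() */
#endif
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: report whether `locale_name` assumes the character
 * set `expected` (compared as an exact string, e.g. "UTF-8").
 * Returns 1 on match, 0 on mismatch, -1 if the locale cannot be loaded.
 */
int locale_uses_codeset(const char *locale_name, const char *expected)
{
    assert(locale_name != NULL && expected != NULL);

    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;

    /* The returned string belongs to the library; compare before freeing. */
    const char *codeset = nl_langinfo_l(CODESET, loc);
    int match = (strcmp(codeset, expected) == 0);

    freelocale(loc);
    return match;
}
```

A check like locale_uses_codeset(name, "UTF-8") is one way a UNICODE database could reject locales with a mismatched charmap, as suggested in item 4 of the original message.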