Re: Collation rules and multi-lingual databases - Mailing list pgsql-general
From | Dennis Gearon |
---|---|
Subject | Re: Collation rules and multi-lingual databases |
Date | |
Msg-id | 3F464277.2080304@fireserve.net |
In response to | Re: Collation rules and multi-lingual databases (Greg Stark <gsstark@mit.edu>) |
Responses | Re: Collation rules and multi-lingual databases (Greg Stark <gsstark@mit.edu>) |
List | pgsql-general |
I was thinking of IGNORING locale, since it is basically fixed for a DB for long periods of time. If a table/column HAD its own locale, that could be used, but I was more interested in a function that would allow the explicit declaration of the encoding(s) to look for.

BTW, what is l10n?

Greg Stark wrote:

>Greg Stark <gsstark@MIT.EDU> writes:
>
>>Dennis Gearon <gearond@fireserve.net> writes:
>>
>>>I think it would be nice, and I may write it eventually, to have a function
>>>called:
>>>
>>>COLLATION_VALUE( 'string', 'encoding' )
>>
>>Indeed that would be really nice. I wish I had that and a pony.
>>
>>Unfortunately my understanding is that the collation rules are simply too
>>complex to allow such a function in general. It's too bad because it would
>>indeed eliminate a lot of the problems in a single swoop.
>
>Uh, so apparently I'm on crack and this is *precisely* how the l10n collation
>rules work. Sorry for jumping in with an uninformed opinion.
>
>>   Effectively, the way these functions work is by applying a mapping to
>>transform the characters in a string to a byte sequence that represents
>>the string's position in the collating sequence of the current locale.
>>Comparing two such byte sequences in a simple fashion is equivalent to
>>comparing the strings with the locale's collating sequence.
>>
>>   The functions `strcoll' and `wcscoll' perform this translation
>>implicitly, in order to do one comparison. By contrast, `strxfrm' and
>>`wcsxfrm' perform the mapping explicitly. If you are making multiple
>>comparisons using the same string or set of strings, it is likely to be
>>more efficient to use `strxfrm' or `wcsxfrm' to transform all the
>>strings just once, and subsequently compare the transformed strings
>>with `strcmp' or `wcscmp'.
>
>Given this it should be easy to write a collation_value(string,locale) C
>function that switches the collation order, calls strxfrm and then restores
>the collation order.
>
>I fear memory leaks or performance losses on frequent locale switches like
>this but it should be easy enough to try out. I don't see any problems with
>postgres as long as it's possible to ensure the locale is always switched back
>properly. It might not be thread-safe though.
>
>At worst I could always call strxfrm in the application for each locale I care
>about when inserting the data. That would bloat my tables for nothing though.
>
>So it's looking like I might get my pony after all.
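
For illustration, here is a minimal sketch of the collation_value(string,locale) idea quoted above, in plain C. Only the function name and the switch / strxfrm / restore sequence come from the thread; the exact signature, the malloc'd return value and the NULL-on-failure convention are assumptions made for this example, and it is not an existing PostgreSQL function. It carries the thread-safety caveat already mentioned, since setlocale() changes process-wide state.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns a malloc'd sort key
 * for `s` under the LC_COLLATE rules of `locale`, or NULL on failure.
 * The caller frees the result. Not thread-safe.
 */
char *collation_value(const char *s, const char *locale)
{
    char *saved, *key = NULL;
    const char *cur;
    size_t len;

    /* Copy the current locale name; a later setlocale() may overwrite it. */
    cur = setlocale(LC_COLLATE, NULL);
    if (cur == NULL)
        return NULL;
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, cur);

    /* Switch the collation order; NULL means the locale is unavailable,
       in which case the current locale is left unchanged. */
    if (setlocale(LC_COLLATE, locale) != NULL) {
        /* With size 0, strxfrm only reports the length it needs. */
        len = strxfrm(NULL, s, 0);
        key = malloc(len + 1);
        if (key != NULL)
            strxfrm(key, s, len + 1);
        /* Always switch back, even if the allocation failed. */
        setlocale(LC_COLLATE, saved);
    }
    free(saved);
    return key;
}
```

Two keys produced for the same locale can then be compared with plain strcmp, which is what would let the transformed value be stored or indexed. Which locale names are accepted depends on what is installed on the system; only "C" and "POSIX" are guaranteed.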