Re: Sorting Problem - Mailing list pgsql-general
From | Dennis Gearon |
---|---|
Subject | Re: Sorting Problem |
Date | |
Msg-id | 3F3AB0EC.6040307@cvc.net Whole thread Raw |
In response to | Re: Sorting Problem (Kathy zhu <Kathy.zhu@Sun.COM>) |
List | pgsql-general |
From what I've read, the db will ONLY sort on the language it was 'initdb'd on. I guess the 'C' locale should would sortit sort of, in other languages. Kathy zhu wrote: > Ok, thanks for all the discussion followed, vey educational :-)) > But nobody really followed up my question :-( > > For example, you have a table that is displayed in the browser. You want > to let the user do sorting on one or multible columns, including those > which contain localized strings. > > If the db supports sorting for multi-linguages, you can retreive all the > rows and do "ORDER BY" on the columns directly. > > But if db doesn't support that, in Java, you will have to retrieve all > the rows first, sort the objects on multiple columnes listed in the > "ORDER BY" in java. It is much slower than the previous one. > > My problem is that if initdb is done with en_US, if a customer in Japan > request the page with that sorted table, the text is not sorted in a > Japanese one. Then I have a problem. > > Any suggestions on that ?? > > > Dennis Gearon wrote: > >> I agree, mostly. In the case of a database, I would bet that the >> INTERNAL, IN-APPLICATION processing FAAAAAAAAAAARRRRRRRRRRR exceeds >> that of sending and receiving it. >> >> i.e. comparisons, sorts, triggers, indexes, views, functions, >> logging to tables, ordering by, >> grouping, etc. >> >> except backups, restores, logging to text files (these would be >> good in UTF8) >> >> Bruce Momjian wrote: >> >>> I think the question is how often are you passing data around/storing it >>> _in_ your application and how often are you processing it. >>> >>> --------------------------------------------------------------------------- >>> >>> >>> Dennis Gearon wrote: >>> >>>> I agree with all of that except for one caveat: >>>> >>>> all my reading, and just general off the cuff thinking, says >>>> that processing variable width characters SIGNIFICANTLY slows an >>>> application. It seems better to PROCESS fixed width characters >>>> (1,2,4 byte), and TRANSMIT variable width characters (avoiding the >>>> null problem.) >>>> >>>> Gianni Mariani wrote: >>>> >>>> >>>>> Dennis Gearon wrote: >>>>> >>>>> >>>>>> Got a link to that section of the standard, or better yet, to a >>>>>> 'interpreted' version of the standard? :-) >>>>>> >>>>>> Stephan Szabo wrote: >>>>>> >>>>>> >>>>>>> On Wed, 13 Aug 2003, Dennis Gearon wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Dennis Bj?rklund wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> In the future we need indexes that depend on the locale (and a >>>>>>>>> lot of other changes). >>>>>>>>> >>>>>>>> >>>>>>>> I agree. I've been looking at the web on this subject a lot >>>>>>>> lately. I >>>>>>>> am **NOT** a microslop fan, but SQL-SERVER even lets a user >>>>>>>> define a >>>>>>>> language(maybe encoding) down to the column level! >>>>>>>> >>>>>>>> I've been reading on GNU-C and on languages, encoding, and >>>>>>>> localization. >>>>>>>> >>>>>>>> http://pauillac.inria.fr/~lang/hotlist/free/licence/fsf96/drepper/paper-1.html >>>>>>>> >>>>>>>> http://h21007.www2.hp.com/dspp/tech/tech_TechSingleTipDetailPage_IDX/1,2366,1222,00.html >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> There are three basic approaches to doing different langauges in >>>>>>>> computerized text: >>>>>>>> >>>>>>>> A/ various adaptations of the 8 bit character set, I.E. the >>>>>>>> ISO-8859-x series. >>>>>>>> B/ wide characters >>>>>>>> ********This should be how Postgress stores data >>>>>>>> internally.******** >>>>>>>> C/ Multibyte characters >>>>>>>> ********This is how Postgress should default to sending data >>>>>>>> OUT of the application, >>>>>>>> i.e. to the display or the web, or other system >>>>>>>> applications******** >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> SQL has a system for defining character set specifications, >>>>>>> collations and >>>>>>> such (per column/literal in some cases). We should probably look >>>>>>> at it >>>>>>> before making decisions on how to do things. >>>>>> >>>>>> >>>>>> >>>>>> >>>>> I thought UNIX (SCOTM) systems also had a way of being able to >>>>> define collation order. >>>>> >>>>> see: >>>>> ftp://dkuug.dk/i18n/WG15-collection/locales >>>>> >>>>> for a collection of all ISO standardized locales (the WG15 ISO work >>>>> group's stuff). >>>>> >>>>> Do a "man localedef" on most Linuxen or UNIXen. >>>>> >>>>> As for wide characters vs multibyte, there is no clear winner. The >>>>> right answer DEPENDS on the situation. >>>>> >>>>> Wide characters on some platforms are 16 bit which means that when >>>>> you do Unicode you'll still have problems with surrogate pairs >>>>> (meaning that it's still multi (wide) char) so you still have all >>>>> the problems of multi-byte encodings. >>>>> >>>>> You could decide to process everything in a PG specific 4 byte wide >>>>> char and do all text in Unicode but the overhead in processing 4 >>>>> times the data is quite significant. The other option is to store >>>>> all data in utf-8 and have all text code become utf-8 aware. >>>>> >>>>> I have found in practice that the utf-8 option is significantly >>>>> easier to implement, 100% Unicode compliant and the best performer >>>>> (because of reduced memory requirements). >>>>> The Posix API's for locales are not very good for modern day >>>>> programs, I'm not sure where the "mbr*" and the "wcr*" apis are in >>>>> the standardization process but if these are not well supported, >>>>> you're on your own and will need to implement similar functionality >>>>> from scratch and for that matter, the collation functions all >>>>> operate on a "current" locate which is really difficult to work >>>>> with on multi-locale applications. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ---------------------------(end of >>>>> broadcast)--------------------------- >>>>> TIP 8: explain analyze is your friend >>>>> >>>> >>>> >>>> ---------------------------(end of >>>> broadcast)--------------------------- >>>> TIP 7: don't forget to increase your free space map settings >>>> >>> >>> >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 9: the planner will ignore your desire to choose an index scan if >> your >> joining column's datatypes do not match > > >
pgsql-general by date: