Re: Patch for collation using ICU - Mailing list pgsql-hackers
From | Palle Girgensohn |
---|---|
Subject | Re: Patch for collation using ICU |
Date | |
Msg-id | 9660F286965D59F2BEE49288@palle.girgensohn.se |
In response to | Re: Patch for collation using ICU ("John Hansen" <john@geeknet.com.au>) |
Responses | Re: Patch for collation using ICU |
List | pgsql-hackers |
--On Friday, March 25, 2005 23.39.33 +1100 John Hansen <john@geeknet.com.au> wrote:

> Ok,.. tested on debian sarge with ICU 3.2
> UNICODE Database, C locale.
>
> upper() and lower() returns an empty string for any input, including
> 7bit ascii, regardless of client_encoding, so something is obviously
> broken.
>
> Have you tested this patch on a UNICODE DB with locale C/POSIX ?

No, honestly not. Mostly tested it with my needs, sv_SE.UTF-8 and UNICODE, and also de_DE.UTF-8. How will PostgreSQL react to this combo? A database cluster initdb:ed with locale=C/POSIX, and then a database in UNICODE (really utf-8) representation... hmm...

I think I might have made a false assumption that the locale string would contain the character encoding. I do something like encoding = strchr(locale, '.') + 1... That code will be confused by a 'C' locale, indeed. I'll check it out!

/Palle

>
> ... John
>
>> -----Original Message-----
>> From: John Hansen
>> Sent: Friday, March 25, 2005 10:27 PM
>> To: 'Palle Girgensohn'; 'pgsql-hackers@postgresql.org'
>> Subject: RE: [HACKERS] Patch for collation using ICU
>>
>> > --On Friday, March 25, 2005 16.34.41 +1100 John Hansen
>> > <john@geeknet.com.au> wrote:
>> >
>> > > Useful if it's going to support earlier releases of ICU....
>> > >
>> > > Not all os's come with ICU3.2, debian for example, currently has 2.1
>> > > in testing, and 2.6 in unstable.
>> >
>> > Oh, OK. FreeBSD has only the 3.2 as port. I can check the older
>> > version, I doubt it would make too much difference. Some autoconf
>> > sorcery needed, perhaps.
>>
>> Naww, it's no biggie, we'll just need to include ICU with pg I think.
>> I tried that, there are several functions from ICU that you
>> use, that are not in ICU2.1
>>
>> Dono about 2.6.
>>
>> However, ICU3.2 compiles on debian with a small change to the
>> debian/rules file.
>> debian/tmp/etc is missing, so add mkdir debian/tmp/etc
>>
>> ... John
>>
>> >
>> > /Palle
>> >
>> > >
>> > > ... John
>> > >
>> > >> -----Original Message-----
>> > >> From: pgsql-hackers-owner@postgresql.org
>> > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Palle
>> > >> Girgensohn
>> > >> Sent: Friday, March 25, 2005 10:40 AM
>> > >> To: pgsql-hackers@postgresql.org
>> > >> Subject: [HACKERS] Patch for collation using ICU
>> > >>
>> > >> Hi!
>> > >>
>> > >> I've put together a patch for using IBM's ICU package for collation.
>> > >>
>> > >> If your OS does not have full support for collation or
>> > >> uppercase/lowercase in multibyte locales, this might be useful. If
>> > >> you are using a multibyte character encoding in your database and
>> > >> want collation, i.e. order by, and also lower(), upper() and
>> > >> initcap() to work properly, this patch will do just that.
>> > >>
>> > >> This patch is needed for FreeBSD, since this OS has no support for
>> > >> collation of for example unicode locales (that is, wcscoll(3) does
>> > >> not do what you expect if you set LC_ALL=sv_SE.UTF-8, for example).
>> > >> AFAIK the patch is *not* necessary for Linux, although IBM claims ICU
>> > >> collation to be about twice as fast as glibc for simple western
>> > >> locales.
>> > >>
>> > >> It adds a configure switch, `--with-icu', which will set up the code
>> > >> to use ICU instead of wchar_t and wcscoll.
>> > >>
>> > >> This has been tested only on FreeBSD-4.11 & FreeBSD-5-stable, where
>> > >> it seems to run well. I've not had the time to do any comparative
>> > >> performance tests yet, but it seems it is at least not slower than
>> > >> using LATIN1 with sv_SE.ISO8859-1 locale, perhaps even faster.
>> > >>
>> > >> I'd be delighted if some more experienced postgresql hackers would
>> > >> review this stuff. The patch is pretty compact, so it's fast reading
>> > >> :) I'm planning to add this patch as an option (tagged
>> > >> "experimental") to FreeBSD's postgresql port. Any ideas about whether
>> > >> this is a good idea or not?
>> > >>
>> > >> Any thoughts or ideas are welcome!
>> > >>
>> > >> Cheers,
>> > >> Palle
>> > >>
>> > >> Patch at:
>> > >> <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-14.diff>
>> > >>
>> > >> ICU at sourceforge: <http://icu.sf.net/>
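
For reference, the failure described in the reply above comes from `strchr(locale, '.')` returning NULL for a name like "C" or "POSIX", so adding 1 to the result yields an invalid pointer. The following is only a minimal sketch of a NULL-safe extraction, not the code from the patch: the function name `locale_codeset`, its signature, and the handling of an optional "@modifier" suffix are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): copy the codeset part of a
 * POSIX-style locale name, e.g. "UTF-8" from "sv_SE.UTF-8", into buf.
 *
 * Returns 1 on success, 0 if the name carries no codeset ("C", "POSIX",
 * "sv_SE"), if the codeset is empty, or if it does not fit in buf.
 */
int
locale_codeset(const char *locale, char *buf, size_t buflen)
{
    const char *dot;
    size_t      len;

    if (locale == NULL || buf == NULL || buflen == 0)
        return 0;

    /* strchr() returns NULL when there is no '.', so check before using it */
    dot = strchr(locale, '.');
    if (dot == NULL)
        return 0;

    /* Stop at an optional "@modifier", as in "de_DE.UTF-8@euro" */
    len = strcspn(dot + 1, "@");
    if (len == 0 || len >= buflen)
        return 0;

    memcpy(buf, dot + 1, len);
    buf[len] = '\0';
    return 1;
}
```

When the function returns 0, the caller would need some other source for the encoding, for example the database encoding rather than the locale name; which fallback is appropriate for the patch is left open here.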