Re: Patch for collation using ICU - Mailing list pgsql-hackers
From | John Hansen |
---|---|
Subject | Re: Patch for collation using ICU |
Date | |
Msg-id | 5066E5A966339E42AA04BA10BA706AE50A92FF@rodrick.geeknet.com.au |
In response to | Patch for collation using ICU (Palle Girgensohn <girgen@pingpong.net>) |
Responses |
Re: Patch for collation using ICU
Re: Patch for collation using ICU |
List | pgsql-hackers |
Btw, I had been planning to propose replacing every single one of the built-in charset conversion functions with calls to ICU (thus making pg _depend_ on ICU), as this would seem like a cleaner solution than for us to maintain our own conversion tables.

ICU also has a fair few conversions that we do not have at present.

Any thoughts?

... John

> -----Original Message-----
> From: John Hansen
> Sent: Saturday, May 07, 2005 11:09 PM
> To: 'Palle Girgensohn'; 'Bruce Momjian'
> Cc: 'pgsql-hackers@postgresql.org'
> Subject: RE: [HACKERS] Patch for collation using ICU
>
> > --On Saturday, May 07, 2005 22.53.46 +1000 John Hansen
> > <john@geeknet.com.au> wrote:
> >
> > > Errm,... initdb --encoding UNICODE --locale C
> >
> > You mean that ICU *shall* be used even for the C locale, and not as
> > Bruce suggested here:
>
> Yes, that's exactly what I mean.
>
> > >> I do have a few questions:
> > >>
> > >> Why don't you use the lc_ctype_is_c() part of this test?
> > >>
> > >> if (pg_database_encoding_max_length() > 1 && !lc_ctype_is_c())
> > >
> > > Um, well, I didn't think about that. :) What would be the locale in
> > > this case? c_C.UTF-8? ;) Hmm, it is possible to have CTYPE=C and use
> > > a wide encoding, indeed. Then the strings will be handled like
> > > byte-wide chars.
> > > Yeah, it's a bug. I'll fix it! Thanks.
> >
> > John disagrees here, and I'm obliged to agree. Using the C locale, one
> > will expect C collation, but upper/lower is better off still using
> > ICU. Hence, the above stuff is *not* a bug. Do we agree?
> >
> > /Palle
> >
> > >> -----Original Message-----
> > >> From: pgsql-hackers-owner@postgresql.org
> > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of John Hansen
> > >> Sent: Saturday, May 07, 2005 10:23 PM
> > >> To: Palle Girgensohn; Bruce Momjian
> > >> Cc: pgsql-hackers@postgresql.org
> > >> Subject: Re: [HACKERS] Patch for collation using ICU
> > >>
> > >> > I use this patch in production on one FreeBSD 4.10 server at the
> > >> > moment.
> > >> > With the latest version, I've had no problems. Logging is switched on
> > >> > for now, and it shows no signs of ICU complaining. I'd like more
> > >> > reports on Linux, though.
> > >>
> > >> I currently use this on gentoo with ICU3.2 unmasked.
> > >>
> > >> Works a dream, even with locale C and UNICODE database.
> > >>
> > >> Small test:
> > >>
> > >> createdb --encoding UNICODE --locale C test
> > >> psql test
> > >> set client_encoding=iso88591;
> > >> CREATE TABLE test (t text);
> > >> INSERT INTO test (t) VALUES ('æøå');
> > >> set client_encoding=unicode;
> > >> INSERT INTO test (t) SELECT upper(t) FROM test;
> > >> set client_encoding=iso88591;
> > >> SELECT * FROM test;
> > >>   t
> > >> -----
> > >>  æøå
> > >>  ÆØÅ
> > >> (2 rows)
> > >>
> > >> Just as I'd expect, as upper/lower/initcap are locale independent for
> > >> these characters.
> > >>
> > >> ---------------------------(end of broadcast)---------------------------
> > >> TIP 5: Have you checked our extensive FAQ?
> > >>
> > >> http://www.postgresql.org/docs/faq
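
For anyone following the CTYPE=C discussion quoted above, here is a rough sketch of why byte-wide handling and Unicode-aware handling give different results for upper() on a UTF-8 string. It is written in Python purely as an illustration: Python's built-in Unicode casing stands in for what ICU would provide, and the function names `bytewise_upper` and `unicode_upper` are made up for this example. It is not the patch's code and does not call ICU.

```python
# Illustration only: Python's built-in Unicode casing stands in for ICU here.
# The helper names below are hypothetical and are not part of PostgreSQL or ICU.


def bytewise_upper(text: str, encoding: str = "utf-8") -> str:
    """Upper-case one byte at a time, roughly what a C-locale toupper() does.

    bytes.upper() only converts ASCII a-z; every other byte, including the
    bytes that make up multibyte UTF-8 characters, passes through unchanged.
    """
    raw = text.encode(encoding)
    return raw.upper().decode(encoding)


def unicode_upper(text: str) -> str:
    """Upper-case by code point, using the Unicode case mappings."""
    return text.upper()


sample = "æøå"

# The same three characters as the client would send them in each encoding.
latin1_bytes = sample.encode("iso-8859-1")  # one byte per character
utf8_bytes = sample.encode("utf-8")         # two bytes per character

print("ISO 8859-1 length:", len(latin1_bytes))
print("UTF-8 length:     ", len(utf8_bytes))
print("byte-wise upper:  ", bytewise_upper(sample))
print("Unicode upper:    ", unicode_upper(sample))
```

With byte-wide handling the non-ASCII characters come back unchanged, while the code-point-aware version returns the upper-case forms shown in the quoted psql output. That is the behaviour the thread argues should be kept for upper/lower even when the locale is C.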