Re: Patch for collation using ICU - Mailing list pgsql-hackers

From John Hansen
Subject Re: Patch for collation using ICU
Date
Msg-id 5066E5A966339E42AA04BA10BA706AE50A92FF@rodrick.geeknet.com.au
In response to Patch for collation using ICU  (Palle Girgensohn <girgen@pingpong.net>)
Responses Re: Patch for collation using ICU
Re: Patch for collation using ICU
List pgsql-hackers
Btw, I had been planning to propose replacing every single one of the built-in charset conversion functions with calls
to ICU (thus making pg _depend_ on ICU), as this would seem like a cleaner solution than maintaining our own
conversion tables.

ICU also has a fair few conversions that we do not have at present.
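
To make the idea concrete, here is a rough sketch of the round trip from the small test quoted further down (Latin-1 client, Unicode database, upper() in between), with every conversion handed to a library instead of hand-maintained tables. It is only an illustration: Python's built-in codecs and Unicode case mapping stand in for ICU, and the helper names convert and server_upper are made up for this example; none of this is PostgreSQL or ICU code.

```python
# Hypothetical sketch: Python's codecs stand in for a conversion library such as ICU.

def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding src to encoding dst, going through Unicode."""
    return data.decode(src).encode(dst)


def server_upper(utf8_text: bytes) -> bytes:
    """Uppercase UTF-8 text with Unicode case mapping (no locale involved)."""
    return utf8_text.decode("utf-8").upper().encode("utf-8")


# What a client with client_encoding=iso88591 would send.
client_bytes = "æøå".encode("iso-8859-1")

# Stored form in a Unicode (UTF-8) database.
stored = convert(client_bytes, "iso-8859-1", "utf-8")

# upper() applied on the server side, then converted back for the client.
uppered = server_upper(stored)
back = convert(uppered, "utf-8", "iso-8859-1")

print(back.decode("iso-8859-1"))
```

The point of the sketch is that the caller never touches a conversion table; it names a source and a target encoding and lets the library do the rest.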

Any thoughts?

... John

> -----Original Message-----
> From: John Hansen
> Sent: Saturday, May 07, 2005 11:09 PM
> To: 'Palle Girgensohn'; 'Bruce Momjian'
> Cc: 'pgsql-hackers@postgresql.org'
> Subject: RE: [HACKERS] Patch for collation using ICU
>
> > --On Saturday, May 07, 2005 22.53.46 +1000 John Hansen
> > <john@geeknet.com.au> wrote:
> >
> > > Errm,... initdb --encoding UNICODE --locale C
> >
> > You mean that ICU *shall* be used even for the C locale, and not as
> > Bruce suggested here:
>
> Yes, that's exactly what I mean.
>
> >
> > >> I do have a few questions:
> > >>
> > >> Why don't you use the lc_ctype_is_c() part of this test?
> > >>
> > >>      if (pg_database_encoding_max_length() > 1 && !lc_ctype_is_c())
> > >
> > > Um, well, I didn't think about that. :)  What would be the locale in
> > > this case? c_C.UTF-8? ;)  Hmm, it is possible to have CTYPE=C and use
> > > a wide encoding, indeed. Then the strings will be handled like
> > > byte-wide chars. Yeah, it's a bug. I'll fix it! Thanks.
> >
> > John disagrees here, and I'm obliged to agree. Using the C locale, one
> > will expect C collation, but upper/lower is better off still using
> > ICU. Hence, the above stuff is *not* a bug. Do we agree?
> >
> > /Palle
> >
> >
> > >
> > >> -----Original Message-----
> > >> From: pgsql-hackers-owner@postgresql.org
> > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of John Hansen
> > >> Sent: Saturday, May 07, 2005 10:23 PM
> > >> To: Palle Girgensohn; Bruce Momjian
> > >> Cc: pgsql-hackers@postgresql.org
> > >> Subject: Re: [HACKERS] Patch for collation using ICU
> > >>
> > >> >
> > >> > I use this patch in production on one FreeBSD 4.10 server at the
> > >> > moment.
> > >> > With the latest version, I've had no problems. Logging is switched on
> > >> > for now, and it shows no signs of ICU complaining. I'd like more
> > >> > reports on Linux, though.
> > >>
> > >> I currently use this on Gentoo with ICU 3.2 unmasked.
> > >>
> > >> Works a dream, even with locale C and UNICODE database.
> > >>
> > >> Small test:
> > >>
> > >> createdb --encoding UNICODE --locale C test
> > >> psql test
> > >> set client_encoding=iso88591;
> > >> CREATE TABLE test (t text);
> > >> INSERT INTO test (t) VALUES ('æøå');
> > >> set client_encoding=unicode;
> > >> INSERT INTO test (t) SELECT upper(t) FROM test;
> > >> set client_encoding=iso88591;
> > >> SELECT * FROM test;
> > >>   t
> > >> -----
> > >>  æøå
> > >>  ÆØÅ
> > >> (2 rows)
> > >>
> > >> Just as I'd expect, as upper/lower/initcap are locale independent for
> > >> these characters.
> > >>

