Re: Latest on CITEXT 2.0 - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Latest on CITEXT 2.0
Date	July 1, 2008 12:25:42
Msg-id	200807011525.m61FP7221773@momjian.us
In response to	Re: Latest on CITEXT 2.0 ("Marko Kreen" <markokr@gmail.com>)
Responses	Re: Latest on CITEXT 2.0
List	pgsql-hackers

Marko Kreen wrote:
> On 7/1/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Marko Kreen" <markokr@gmail.com> writes:
> >  > On 6/26/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> >  >> BTW, I don't think you can use that same-length optimization for
> >  >> citext.  There's no reason to think that upper/lowercase pairs will
> >  >> have the same length all the time in multibyte encodings.
> >
> >  > What about this code in current str_tolower():
> >
> >  >         /* Output workspace cannot have more codes than input bytes */
> >  >         workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
> >
> >
> > That's working with wchars, not bytes.
> 
> Ah, I missed the point of char2wchar() line.
> 
> I'm rather unfamiliar with various MB API-s, sorry.
> 
> There's another thing I'm probably missing: does current code handle
> multi-wchar codepoints?  Or is it guaranteed they don't happen?
> (Wasn't wchar_t usually 16bit value?)

If you want a simple example of wide-character use, look at
oracle_compat.c::upper(), which calls str_toupper() in CVS HEAD.
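For illustration, here is a minimal C sketch of the wide-character round trip under discussion: convert the input to wchar_t with mbstowcs(), lowercase each unit with towlower(), and convert back with wcstombs(). The helper name lower_copy() is hypothetical; this is not the actual str_tolower() code, only an assumption-based sketch of the same idea. It shows why a workspace of nbytes + 1 wide characters is always large enough (every wide character consumes at least one input byte), while the converted output must not be assumed to have the same byte length as the input. On platforms where wchar_t is 16 bits (e.g. Windows), characters outside the Basic Multilingual Plane occupy two wchar_t units, and this per-unit loop does nothing special for them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (NOT PostgreSQL's str_tolower): lowercase a
 * multibyte string in the current LC_CTYPE locale by round-tripping
 * through wchar_t. Returns a malloc'd string, or NULL on failure.
 */
static char *
lower_copy(const char *src)
{
    size_t      nbytes;
    size_t      nchars;
    size_t      outcap;
    size_t      i;
    wchar_t    *workspace;
    char       *result;

    assert(src != NULL);
    nbytes = strlen(src);

    /* Each wide char consumes at least one input byte, so nbytes + 1
     * wide chars (including the terminator) is always enough. */
    workspace = malloc((nbytes + 1) * sizeof(wchar_t));
    if (workspace == NULL)
        return NULL;

    nchars = mbstowcs(workspace, src, nbytes + 1);
    if (nchars == (size_t) -1)      /* invalid multibyte sequence */
    {
        free(workspace);
        return NULL;
    }

    /* Case-map one wchar_t unit at a time. */
    for (i = 0; i < nchars; i++)
        workspace[i] = (wchar_t) towlower((wint_t) workspace[i]);

    /* The lowered text may need more or fewer bytes than the input, so
     * size the output by MB_CUR_MAX per wide char (assumes a stateless
     * encoding such as UTF-8), not by the input byte count. */
    outcap = nchars * MB_CUR_MAX + 1;
    result = malloc(outcap);
    if (result == NULL)
    {
        free(workspace);
        return NULL;
    }

    if (wcstombs(result, workspace, outcap) == (size_t) -1)
    {
        free(workspace);
        free(result);
        return NULL;
    }
    free(workspace);
    return result;
}
```

The result depends on the LC_CTYPE locale, so a caller would normally call setlocale(LC_CTYPE, "") (or otherwise select the database's locale) before using a helper like this, and free() the returned string afterwards.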

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +
