Re: Case Conversion Fix for MB Chars - Mailing list pgsql-patches

From Volkan YAZICI
Subject Re: Case Conversion Fix for MB Chars
Msg-id 7104a7370512021207w4d3568b2i37e156d9cb03daef@mail.gmail.com
In response to Re: Case Conversion Fix for MB Chars  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-patches
Last-minute addition:
src/test/mb looks a bit dated. I manually ran the SQL files in
src/test/mb/sql and compared the output against the expected results in
src/test/mb/expected, and they matched. (The output files need a little
editing first, such as removing lines like "CREATE TABLE".) Still, it
would be better if some EUC users ran them manually too.

On 12/2/05, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Volkan YAZICI wrote:
> > After Tom's advice (he was doubtful about the patch), while I was
> > thinking about how to broaden the range of tests, I decided to use
> > src/test/mb. In those tests, the patch succeeded only for unicode and
> > failed on big5, euc_cn, euc_jp, euc_kr, euc_tw, mule_internal, sjis
> > encodings. The failures may also be due to a misconfiguration on my
> > side. (I ran the tests on both unicode and latin-5 encoded databases.)
>
> Do those encodings even have uppercase letters?

According to folks on IRC, yes.
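
To make that concrete, here is a minimal sketch in Python (not the patch's C code). It assumes Python's built-in euc_jp codec as a stand-in for the server-side encoding, and the helper names bytewise_upper and charwise_upper are hypothetical. It shows that EUC_JP does carry cased letters (the fullwidth Latin ones), and that byte-wise conversion leaves them untouched while decoding to characters first converts them.

```python
# Sketch only: Python codecs stand in for PostgreSQL's encoding handling.
# Both helper names are hypothetical and do not come from the patch.

def bytewise_upper(raw: bytes) -> bytes:
    # bytes.upper() maps only ASCII a-z; every other byte passes through.
    return raw.upper()

def charwise_upper(raw: bytes, encoding: str) -> bytes:
    # Decode to characters, convert case per character, then re-encode.
    return raw.decode(encoding).upper().encode(encoding)

# Fullwidth Latin small letters, which exist in EUC_JP as 2-byte characters.
sample = "\uff41\uff42\uff43"
raw = sample.encode("euc_jp")

print(bytewise_upper(raw) == raw)                    # unchanged if True
print(charwise_upper(raw, "euc_jp").decode("euc_jp"))
```

Whether the server's locale functions handle these characters the same way is a separate question, so this says nothing about whether the patch itself is correct for EUC databases.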

> People have talked about ICU but I don't know if anyone is working on it
> now.

Also, there are some unofficial ICU patches for PostgreSQL around,
like the one at
http://people.freebsd.org/~girgen/postgresql-icu/README.html

> I think the big problem is that while your patch works for some cases,
> it fails for others

As I mentioned above, it seems to be working for the other ones too.

> and there is no good way to know/test which will
> work and which will not. Is that accurate?

You don't want to commit this patch because it breaks[*] EUC-like
encodings. On the other hand, it fixes the LatinN and UNICODE
encodings. What I really wonder is this: if we are trying to keep the
EUC encodings working, why aren't there any EUC users around to take
care of the EUC tests? Don't the EUC encodings have any problems? Do
ILIKE and upper/lower work properly for them?

[*] Unless I made a mistake, the manual tests succeeded for EUC-like
encodings too.

You can also look at it the other way around. Suppose LatinN and
UNICODE were working and somebody submitted a patch that fixed the EUC
encodings by breaking the others. How would the PostgreSQL team react
in that situation?


Regards.
