Re: Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport |
Date | |
Msg-id | 200011160601.BAA02070@candle.pha.pa.us |
In response to | Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Responses | Re: Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport |
List | pgsql-hackers |
Can someone tell me where we are on this? Tatsuo, I think you said you
wanted to apply this fix.

> [Cced to hackers list]
>
> > > BTW I have found another bug with EUC_TW support. Line 917 in conv.c:
> > >
> > > *p++ = c1 - LC_CNS11643_3 + 0xa3;
> > >
> > > this should be:
> > >
> > > *p++ = *mic++ - LC_CNS11643_3 + 0xa3;
> > >
> > > Otherwise, CNS 11643-1992 Plane 3 or more won't work. Could you test
> > > it out with CNS 11643-1992 Plane 3 or more?
> >
> > Thanks for your very quick reply!!
>
> You are welcome.
>
> > I think you are right, but I have not tested it, because the original
> > Big5 encoding does not contain characters in CNS 11643-1992 Plane 3.
> > But I will have a chance to test it: we are seeking support for Big5E
> > (an extended Big5 encoding) in PostgreSQL, though most people who use
> > PostgreSQL in Taiwan only care about the Big5 encoding.
> >
> > Would you like to answer some mb related questions for me? I am a newbie :P
> >
> > 1.) Because the 2nd byte of Big5 encoding overlaps with ASCII,
> > such as '\' (this is very bad for many programs to work with Big5).
>
> As long as the frontend side knows the current client side encoding is
> Big5, this should be no problem. At least for libpq. It examines the
> first byte of Big5. If it is greater than 0x7f, then it must be a
> double byte Hanji. So libpq reads 2 bytes in this case, no matter
> whether the second byte is '\'.
>
> > For example: If we initdb -E MULE_INTERNAL first,
> > SET CLIENT_ENCODING TO 'BIG5', and
> > INSERT INTO some_table VALUES (..., 'the last byte of some Big5 char is
> > backslash\',...),
> > then we can not successfully complete this SQL INSERT -- the prompt of
> > psql changes but psql does not execute it. If we initdb -E with EUC_TW,
> > it's OK. Is this a parsing problem? What's your suggestion for the
> > solution?
>
> Hum. initdb -E MULE_INTERNAL should work as well. Let me dig into the
> problem. It would be nice if you could send me the Big5 data for
> testing by a private mail.
>
> BTW I would not recommend "SET CLIENT_ENCODING TO 'BIG5'" to do
> on-the-fly encoding changes, since in this way the frontend side has no
> idea what the client encoding is. 7.0.x overcomes this problem by
> introducing the new \encoding command. For 6.5 or before I would
> recommend using the PGCLIENTENCODING environment variable.
>
> > 2.) Is using MULE_INTERNAL faster than EUC_TW as backend encoding when
> > PostgreSQL is processing Big5 data? (It seems
> > BIG5->big52mic()->mic2euc_tw()->EUC_TW
> > needs 2 code converting procedures, but BIG5->big52mic()->EUC_TW only
> > needs one from the mb sources)
>
> Yes. But the difference would be very small. The expensive part is a
> table look-up in big52mic.
>
> BTW 7.1 will support automatic encoding conversion between Unicode
> (UTF-8) and Big5 (or EUC_TW). Try the snapshot if you like.
>
> > 3.) Does PostgreSQL's ODBC driver support mb?
>
> I don't think so. For Japanese (EUC_JP/SJIS) Kataoka has made patches
> to enable MB support in ODBC. It should not be very difficult to
> support EUC_TW/Big5, I don't know.
> --
> Tatsuo Ishii
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
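
For readers following the Big5 backslash question in the quoted text, here is a minimal sketch of the first-byte rule Tatsuo describes: a first byte greater than 0x7f starts a two-byte character, so a trailing 0x5c is never treated as an ASCII backslash. This is not libpq code. The function names `big5_char_len` and `count_real_backslashes` are hypothetical, and the sketch assumes the input is a NUL-terminated Big5 byte string.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not taken from libpq): byte length of the character
 * starting at s, assuming Big5. A first byte above 0x7f starts a two-byte
 * character; a truncated pair at end of string is treated as one byte.
 */
size_t
big5_char_len(const unsigned char *s)
{
    return (*s > 0x7f && s[1] != '\0') ? 2 : 1;
}

/*
 * Count only genuine ASCII backslashes, stepping over whole characters so
 * that a 0x5c in the second byte of a Big5 pair is skipped.
 */
size_t
count_real_backslashes(const unsigned char *s)
{
    size_t n = 0;

    while (*s != '\0')
    {
        size_t len = big5_char_len(s);

        if (len == 1 && *s == '\\')
            n++;
        s += len;
    }
    return n;
}
```

With the bytes `'a', 0xa5, 0x5c, '\\', 'b'`, only the standalone backslash is counted, because 0x5c after 0xa5 is consumed as the second byte of a pair. A scanner that does not know the client encoding is Big5 would see two backslashes, which is consistent with the stuck psql prompt reported in the thread.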