Re: Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport
Date
Msg-id 20001117112902C.t-ishii@sra.co.jp
In response to Re: Re: [PATCHES] A Patch for MIC to EUC_TW code converting in mbsupport  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
> Can someone tell me where we are on this?  Tatsuo, I think you said you
> wanted to apply this fix.

I wanted to apply the fix after Chih-Chang Hsieh had tested it, but he
said he couldn't because no test data was available. However, he and I
now agree that the fix seems correct, so I am going to apply it,
probably by tomorrow.

> > [Cced to hackers list]
> > 
> > > > BTW I have found another bug with EUC_TW support. line 917 in conv.c:
> > > >
> > > >                         *p++ = c1 - LC_CNS11643_3 + 0xa3;
> > > >
> > > > this should be:
> > > >
> > > >                         *p++ = *mic++ - LC_CNS11643_3 + 0xa3;
> > > >
> > > > Otherwise, CNS 11643-1992 Plane 3 and higher won't work. Could you
> > > > test it out with CNS 11643-1992 Plane 3 or higher?
> > > 
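
To illustrate the one-line fix quoted above, here is a minimal sketch of converting a single Plane 3 (or higher) character from MULE internal code to EUC_TW. It is not the code from conv.c: the function name is hypothetical, and the constant values (the 0x9d lead byte and the 0xf6..0xfa charset IDs) are assumptions that should be checked against pg_wchar.h. The point is only that the plane byte has to be derived from the charset ID that follows the lead byte, not from the lead byte held in c1.

```c
#include <stddef.h>

/* Assumed values, modelled on PostgreSQL's multibyte headers. */
#define SS2            0x8e   /* EUC single shift 2 */
#define LCPRV2_B       0x9d   /* assumed lead byte for private 2-byte charsets */
#define LC_CNS11643_3  0xf6   /* assumed charset ID of CNS 11643-1992 Plane 3 */
#define LC_CNS11643_7  0xfa   /* assumed charset ID of CNS 11643-1992 Plane 7 */

/*
 * Hypothetical helper: convert one 4-byte MULE internal character
 * (lead byte, charset ID, two code bytes) into 4-byte EUC_TW
 * (SS2, plane byte, two code bytes).
 * Returns the number of bytes written, or 0 if the input is not a
 * Plane 3..7 character.
 */
size_t
mic_plane3plus_to_euc_tw(const unsigned char *mic, unsigned char *out)
{
    unsigned char c1 = *mic++;      /* lead byte; mic now points at the charset ID */

    if (c1 != LCPRV2_B || *mic < LC_CNS11643_3 || *mic > LC_CNS11643_7)
        return 0;

    out[0] = SS2;
    /* Use the charset ID (*mic++), not c1, to compute the plane byte. */
    out[1] = (unsigned char) (*mic++ - LC_CNS11643_3 + 0xa3);
    out[2] = *mic++;                /* first code byte */
    out[3] = *mic;                  /* second code byte */
    return 4;
}
```

Under these assumed values, the input bytes 0x9d 0xf6 0xc4 0xa1 become 0x8e 0xa3 0xc4 0xa1. Computing the plane byte from c1 instead would give 0x9d - 0xf6 + 0xa3 = 0x4a, which is not a valid plane byte.
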
> > > Thanks for your very quick reply!!
> > 
> > You are welcome.
> > 
> > > I think you are right, but I have not tested it, because the original
> > > Big5 encoding does not contain characters in CNS 11643-1992 Plane 3.
> > > But I will have a chance to test it: we are seeking support for Big5E
> > > (an extended Big5 encoding) in PostgreSQL, though most people who use
> > > PostgreSQL in Taiwan only care about Big5 encoding.
> > > 
> > > Would you like to answer some mb related questions for me? I am a newbie :P
> > > 
> > > 1.) The 2nd byte of a Big5 character can overlap with ASCII, such as
> > >     '\' (this makes it very hard for many programs to work with Big5).
> > 
> > As long as the frontend side knows that the current client-side
> > encoding is Big5, this should be no problem, at least for libpq. It
> > examines the first byte of each character. If it is greater than 0x7f,
> > it must be the start of a double-byte Hanzi character, so libpq reads
> > 2 bytes in this case, even if the second byte is '\'.
> > 
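
The lead-byte rule described above can be sketched as follows. This is an illustration only, not libpq's actual code; both function names are hypothetical. A byte greater than 0x7f is treated as the start of a two-byte character, so a 0x5c that appears as the second byte is never seen as a backslash.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte length of the character starting at s.
 * A lead byte greater than 0x7f means a two-byte character; the check
 * on s[1] avoids running past the terminator on truncated input.
 */
size_t
big5_char_len(const unsigned char *s)
{
    return (*s > 0x7f && s[1] != '\0') ? 2 : 1;
}

/*
 * Count the backslashes that are real ASCII characters, skipping any
 * 0x5c that is the second byte of a two-byte character.
 */
size_t
count_ascii_backslashes(const unsigned char *s)
{
    size_t n = 0;

    while (*s != '\0')
    {
        size_t len = big5_char_len(s);

        if (len == 1 && *s == '\\')
            n++;
        s += len;               /* step over the whole character */
    }
    return n;
}
```

For example, the Big5 byte pair 0xa5 0x5c ends in 0x5c. A byte-by-byte scan would report one backslash for it, while the scan above reports none, and it still finds a genuine backslash that follows the pair.
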
> > >     For example: if we initdb -E MULE_INTERNAL first, then
> > >     SET CLIENT_ENCODING TO 'BIG5', and
> > >     INSERT INTO some_table VALUES (..., 'the last byte of some Big5 char is backslash\',...),
> > >     the INSERT never completes: the psql prompt changes, but psql
> > >     does not execute the statement. If we initdb -E EUC_TW, it's OK.
> > >     Is this a parsing problem? What solution would you suggest?
> > 
> > Hmm. initdb -E MULE_INTERNAL should work as well. Let me dig into the
> > problem. It would be nice if you could send me the Big5 data for
> > testing by private mail.
> > 
> > BTW I would not recommend "SET CLIENT_ENCODING TO 'BIG5'" for changing
> > the encoding on the fly, since that way the frontend side has no
> > idea what the client encoding is. 7.0.x overcomes this problem by
> > introducing the new \encoding command. For 6.5 or earlier I would
> > recommend using the PGCLIENTENCODING environment variable.
> > 
> > > 2.) Is MULE_INTERNAL faster than EUC_TW as the backend encoding when
> > >      PostgreSQL processes Big5 data? (From the mb sources, it seems
> > >      that BIG5->big52mic()->mic2euc_tw()->EUC_TW needs 2 code
> > >      conversion steps, but BIG5->big52mic()->MULE_INTERNAL only
> > >      needs one.)
> > 
> > Yes. But the difference would be very small. The expensive part is a
> > table look-up in big52mic.
> > 
> > BTW 7.1 will support automatic encoding conversion between Unicode
> > (UTF-8) and Big5 (or EUC_TW). Try the snapshot if you like.
> > 
> > > 3.) Does PostgreSQL's ODBC driver support mb?
> > 
> > I don't think so. For Japanese (EUC_JP/SJIS), Kataoka has made patches
> > to enable MB support in ODBC. It should not be very difficult to
> > support EUC_TW/Big5, though I don't know for sure.
> > --
> > Tatsuo Ishii
> > 
> 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

