Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>No. Some EUCs (EUC_TW and EUC_JP) have three-byte or even four-byte
>codes. But you said your database has been configured as EUC_CN. As
>far as I know, it only uses 1- or 2-byte codes. Another thing that
>confuses me is that ' \217\210' is not valid EUC_CN data at all. \217
>(0x8f) specifies code set 3, which does not exist in EUC_CN. In this
>case, the current implementation of PostgreSQL assumes that the
>multi-byte character consists of a 3-byte code.
It could be that one of our users had their input method set to produce
EUC_TW or Big5.
>In short, the problem you have is caused by:
>1) wrong data submitted into the table
Kind of hard to control that when data is submitted by random users on
the Internet.
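That said, we could at least screen submissions before they reach the
table. A rough sketch of such a check, assuming EUC_CN data is either
ASCII or a pair of bytes that both fall in 0xA1-0xFE (the function name
is made up for illustration, and this only checks byte structure, not
whether each code point is actually assigned):

```python
def is_valid_euc_cn(data: bytes) -> bool:
    """Return True if data is structurally valid EUC_CN.

    Assumption: a character is either one ASCII byte (< 0x80) or two
    bytes that are both in the range 0xA1..0xFE.
    """
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Plain ASCII: one byte.
            i += 1
            continue
        # Lead byte must be in the two-byte range.
        if not (0xA1 <= b <= 0xFE):
            return False
        # Trail byte must exist and be in the same range.
        if i + 1 >= n or not (0xA1 <= data[i + 1] <= 0xFE):
            return False
        i += 2
    return True
```

With this check the ' \217\210' sequence is rejected, because \217
(0x8f) is not a legal lead byte under that assumption.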
>I would recommend you delete the data since it's not correct anyway.
>In the meantime I'm going to fix 2) so that it assumes the data
>consists of 2 bytes even if a wrong byte sequence is submitted
>(except ASCII, needless to say).
>Do you want the backpatch for 6.5.3?
Very much so. Thank you.
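
To check my understanding of the two length rules discussed above, here
is a small sketch. It is not the PostgreSQL code, just an illustration:
the function name, the set3_len parameter, and the trailing "ab" in the
sample data are all made up.

```python
def split_chars(data: bytes, set3_len: int) -> list:
    """Split data into characters using a lead-byte length rule.

    set3_len is the length assumed when the lead byte is 0x8f
    (code set 3): 3 models the behaviour described as current,
    2 models the proposed fix.
    """
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            length = 1          # ASCII
        elif b == 0x8F:
            length = set3_len   # code set 3 lead byte
        else:
            length = 2          # any other high-bit lead byte
        chars.append(data[i:i + length])
        i += length
    return chars


bad = b" \x8f\x88ab"            # ' \217\210' followed by hypothetical ASCII
print(split_chars(bad, 3))      # 3-byte assumption
print(split_chars(bad, 2))      # 2-byte assumption
```

With the 3-byte assumption the bad sequence swallows the following
ASCII byte; with the 2-byte assumption the ASCII that follows stays
intact.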
-Michael