Re: Invalid EUC_JP char seq bug? - Mailing list pgsql-bugs

From Tatsuo Ishii
Subject Re: Invalid EUC_JP char seq bug?
Date
Msg-id 20030702.190040.74753986.t-ishii@sra.co.jp
In response to Re: Invalid EUC_JP char seq bug?  (Jean-Christian Imbeault <jc@mega-bucks.co.jp>)
List pgsql-bugs
> >>search_words=%B7%F6%BA%7E
> >>select id from products where name like '??~'
> >>Query failed: ERROR:  Invalid EUC_JP character sequence found (0xba7e)
> >
> >
> > This is definitly a bad EUC_JP.
>
> According to a PHP developer in my bug report
> (http://bugs.php.net/bug.php?id=24309&edit=2):
>
> "URL decoded byte sequance of 'search_words=%B7%F6%BA%7E' is
> B7E6+BA7E, which is correct EUC-JP character sequence. [snip] But, I
> believe encoding detection of mbstring works fine in this case.
> B7E6+BA7E is not correct byte sequence of SJIS, UTF-8, ISO2022-JP. It is
> correct EUC-JP byte sequence."
>
> I see that he wrote B7E6 instead of the correct B7F6. I resubmitted my
> bug report to PHP and pointed this out. Hopefully the developer will see
> that this sequence is incorrect EUC-JP and that PHP failed to detect this :)

In the EUC_JP encoding there are some rules:

1) if the first byte is 0x8e then the second byte is a JIS X 0201
   character and should be greater than 0x7f

2) else if the first byte is 0x8f then the second and third bytes are
   a JIS X 0212 character and they should be greater than 0x7f

3) else if the first byte is greater than 0x7f then the first and
   second bytes are a JIS X 0208 character and the second byte should
   also be greater than 0x7f

4) else the byte is ASCII and should be equal to or less than 0x7f

Apparently:

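B7F6: this is OK. Rule #3 applies.
BA7E: this is not good, since it satisfies none of rules #1 to #4.
      The first byte 0xba is greater than 0x7f, so the second byte
      would also have to be greater than 0x7f, but 0x7e is not.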
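
For illustration, the four rules can be sketched as a small checker in
Python. This is only a sketch of the rules as stated in this mail, not
the code PostgreSQL uses; the function name first_invalid_euc_jp is made
up for the example, and it tests only the "greater than 0x7f" condition,
not the full byte ranges of each character set.

```python
from urllib.parse import unquote_to_bytes


def first_invalid_euc_jp(data: bytes):
    """Return the offset of the first sequence that breaks the rules
    above, or None if every sequence passes.

    Hypothetical helper for illustration only.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            length = 1  # rule 4: a single ASCII byte
        elif lead == 0x8E:
            length = 2  # rule 1: 0x8e + one JIS X 0201 byte
        elif lead == 0x8F:
            length = 3  # rule 2: 0x8f + two JIS X 0212 bytes
        else:
            length = 2  # rule 3: two-byte JIS X 0208 character
        seq = data[i:i + length]
        # Every byte after the lead byte must have the high bit set,
        # and the sequence must not be cut short.
        if len(seq) < length or any(b <= 0x7F for b in seq[1:]):
            return i
        i += length
    return None


# The URL-encoded value from the original report.
raw = unquote_to_bytes("%B7%F6%BA%7E")
print(first_invalid_euc_jp(raw))  # 2, the offset of 0xba 0x7e
```

Offset 2 is where BA7E starts, which matches the sequence reported in
the error message.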

> Thanks!
>
> Jean-Christian Imbeault
>
> PS I posted to HACKERS a few weeks ago about another bug (a real one :)
> in the EUC-JP translation having to do with the WAVE DASH. I'll repost
> here on the BUGS list, could you let me know the status of that BUG? Thanks!

Sorry for the delay. In EUC-JP <--> Unicode translation, WAVE DASH is
always a problem since there are several different mappings among
different vendors/standards. I think I need more time to solve this.
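
To illustrate the kind of disagreement meant here, the sketch below
decodes the JIS X 0208 wave dash with three codecs bundled with Python.
It says nothing about PostgreSQL's own conversion tables. The helper
name decoded_code_point is made up for the example. As far as I know,
the euc_jp and shift_jis codecs give U+301C (WAVE DASH), while the
Microsoft cp932 codec gives U+FF5E (FULLWIDTH TILDE) for the same
character.

```python
# The same JIS X 0208 character (row 1, cell 33) in two encodings.
EUC_JP_BYTES = b"\xa1\xc1"
SJIS_BYTES = b"\x81\x60"


def decoded_code_point(raw: bytes, codec: str) -> int:
    """Decode a single character and return its Unicode code point.

    Hypothetical helper for illustration only.
    """
    text = raw.decode(codec)
    if len(text) != 1:
        raise ValueError("expected exactly one character")
    return ord(text)


for codec, raw in [
    ("euc_jp", EUC_JP_BYTES),
    ("shift_jis", SJIS_BYTES),
    ("cp932", SJIS_BYTES),
]:
    print(f"{codec:10} {raw.hex()} -> U+{decoded_code_point(raw, codec):04X}")
```

A round trip through two tables that disagree on this point will turn
one character into the other, or fail outright.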
--
Tatsuo Ishii