Re: BUG #3638: UTF8 Character encoding does NOT work - Mailing list pgsql-bugs

From Tatsuo Ishii
Subject Re: BUG #3638: UTF8 Character encoding does NOT work
Msg-id 20070928.102329.76083990.t-ishii@sraoss.co.jp
In response to BUG #3638: UTF8 Character encoding does NOT work  ("Fil Matthews" <fil@internetmediapro.com>)
List pgsql-bugs
> Tatsuo Ishii wrote:
> > Why do you think that an UTF-8 encoded string starting with 0x92 is
> > valid?
> >
> > 0x92 can appear in the second, third or fourth octet, but should never
> > appear in the first octet.
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
> >
> >> The following bug has been logged online:
> >>
> >> Bug reference:      3638
> >> Logged by:          Fil Matthews
> >> Email address:      fil@internetmediapro.com
> >> PostgreSQL version: 8.1, 8.2
> >> Operating system:   Linux  Debian  - Windows XP
> >> Description:        UTF8 Character encoding does NOT work
> >> Details:
> >>
> >> Judging from the amount of Google page hits with the exact same problem I am
> >> surprised and mystified by this obvious flaw in Postgres Technology..
> >>
> >> Just how is one expected to work with  UTF8 character sets when all and
> >> every attempt at using even Postgres clients produces the SAME problem
> >> every time ???
> >>
> >>  "invalid byte sequence for encoding "UTF8": 0x92"
> >>
> >> In short: a Postgres UTF8 database, PGCLIENTENCODING=UTF8
> >>
> >> Table test, column text (character varying(10))
> >>
> >> In any Postgres client, i.e. psql, pgAdmin III
> >>
> >> INSERT INTO test VALUES (chr(146));
> >>
> >>
> >> Query returned successfully: 1 rows affected, 32 ms execution time.
> >>
> >> copy test to '/tmp/testfile.txt';
> >>
> >>
> >> Query returned successfully: 1 rows affected, 15 ms execution time.
> >>
> >> copy test from '/tmp/testfile.txt';
> >>
> >>
> >> Come on, are you serious? Just how does one work with completely valid
> >> data that has an ASCII 128+ value?
> >>
> >> Currently this flaw makes Postgres an unusable database technology. Or
> >> can someone please explain this and a possible workaround?
> >>
> >> Thank You
> >>
> >
> >
>
> Sorry, but I don't agree. Why can't Postgres store a legitimate 8-bit
> byte value that is below 255 and treat it as text?
> Not being able to do this makes Postgres unusable for storing
> TEXT values.
>
> I do not know ANY other database technology that doesn't allow some form
> of storing a legitimate 8 bit byte ...
>
> Even the simplest open-source database in the world (and most
> popular) can do this.
>
> The biggest and best  (Thank you Larry) can do this ...
>
> Postgres can't.
>
> In other words, you are claiming that UTF8 is actually UTF7 ...

No.

> There are 8 bits in a byte.. not 7 ..  If UTF8  can't by definition
> store 8 bits  then what standard can??

UTF-8 does not accept arbitrary 8-bit characters. The byte ranges UTF-8
accepts are precisely defined in the standard. If our implementation
is different from it, please let us know.
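
To make the byte ranges concrete, here is a minimal Python sketch. It is
not PostgreSQL code, and the helper names octet_role and is_valid_utf8 are
hypothetical, chosen only for this illustration. It classifies a byte by
the role RFC 3629 allows it to play, and uses Python's strict UTF-8 decoder
as a stand-in validator.

```python
def octet_role(b: int) -> str:
    """Classify one byte by the role it may play in UTF-8 (RFC 3629)."""
    if b <= 0x7F:
        return "single"        # ASCII: a complete one-octet character
    if 0x80 <= b <= 0xBF:
        return "continuation"  # valid only as a 2nd, 3rd or 4th octet
    if 0xC2 <= b <= 0xDF:
        return "lead2"         # first octet of a two-octet sequence
    if 0xE0 <= b <= 0xEF:
        return "lead3"         # first octet of a three-octet sequence
    if 0xF0 <= b <= 0xF4:
        return "lead4"         # first octet of a four-octet sequence
    return "invalid"           # 0xC0, 0xC1 and 0xF5..0xFF are never used


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0x92 (decimal 146) is a continuation byte, so it cannot stand alone.
lone_byte_ok = is_valid_utf8(b"\x92")
# The code point U+0092 itself is encoded in UTF-8 as two octets.
encoded = chr(146).encode("utf-8")
```

Under these definitions a lone 0x92 is rejected, while the code point
U+0092 is stored as the two octets 0xC2 0x92.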

> The technology is wrong  and it is incorrect...  If one looks at the
> output  of the copy file
> od -c       then QUITE correctly the 8 bit value  is stored as the value
> given..
>
> What then is the problem in putting this value back in the text field it
> came from ??

PostgreSQL needs to follow the standard. That's it.
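
As for a possible workaround, one option is to transcode the data before
it is presented as UTF-8. The Python sketch below only illustrates the
idea and rests on an assumption the report does not confirm: that the
byte 0x92 came from Windows-1252 text, where it is the right single
quotation mark (U+2019).

```python
raw = b"\x92"  # the single byte from the report

# Assumption: the source text was Windows-1252, where 0x92 is U+2019.
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")  # three octets in UTF-8

# Alternative assumption: the source was ISO-8859-1 (Latin-1), where 0x92
# maps to the C1 control character U+0092, encoded as two octets in UTF-8.
latin1_utf8 = raw.decode("latin-1").encode("utf-8")
```

Either result is a valid UTF-8 byte sequence. On the PostgreSQL side the
comparable approach would be to declare the actual encoding of the data as
the client encoding, so that the server performs the conversion, rather
than labelling single-byte data as UTF8.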
--
Tatsuo Ishii
SRA OSS, Inc. Japan
