> While people are working on that, they might want to add some sanity checking
> to the multibyte character decoders. Currently they fail to check for
> "illegal" character sequences (i.e. sequences with no valid multibyte mapping),
> and fail to do something reasonable (like return an error, silently drop the
> offending characters, or anything else besides just returning random garbage
> and crashing the backend).
Hmm... I thought Michael Robinson was the one who was against the idea of
rejecting "illegal" character sequences before they are put into the DB.
I like the idea, but I don't have time to do it (however, I'm not sure I
would want to do it for EUC-CN, since he dislikes the code I write).
Bruce, I would like to see the following in the TODO list. I would also
like to hear from Thomas and Peter, or whoever is interested in
implementing the NATIONAL CHARACTER stuff, whether these items are reasonable.
o Reject character sequences that are not valid in their charset (signaling an ERROR seems appropriate IMHO)
o Make PostgreSQL more multibyte-aware (for example, the TRIM function and the NAME data type)
o Treat the n in CHAR(n)/VARCHAR(n) as the number of characters, rather than the number of bytes
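As a rough illustration of the first item, here is a minimal sketch of what such validity checking could look like for EUC-JP. The function name, return convention, and structure are my own for this example, not an existing backend API; the byte ranges follow the usual EUC-JP encoding rules (ASCII, SS2 + half-width kana, SS3 + JIS X 0212, and two-byte JIS X 0208):

```c
#include <stddef.h>

/* Hypothetical sketch: check that a byte string is well-formed EUC-JP.
 * Returns the byte offset of the first invalid sequence, or -1 if the
 * whole string is valid.  An ERROR could be raised at that offset. */
static long
euc_jp_verify(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = s[i];

        if (c < 0x80)                       /* plain ASCII, 1 byte */
            i += 1;
        else if (c == 0x8E)                 /* SS2: half-width kana, 2 bytes */
        {
            if (i + 1 >= len || s[i + 1] < 0xA1 || s[i + 1] > 0xDF)
                return (long) i;
            i += 2;
        }
        else if (c == 0x8F)                 /* SS3: JIS X 0212, 3 bytes */
        {
            if (i + 2 >= len ||
                s[i + 1] < 0xA1 || s[i + 1] > 0xFE ||
                s[i + 2] < 0xA1 || s[i + 2] > 0xFE)
                return (long) i;
            i += 3;
        }
        else if (c >= 0xA1 && c <= 0xFE)    /* JIS X 0208, 2 bytes */
        {
            if (i + 1 >= len || s[i + 1] < 0xA1 || s[i + 1] > 0xFE)
                return (long) i;
            i += 2;
        }
        else
            return (long) i;                /* stray byte: reject */
    }
    return -1;
}
```

The same per-charset walk would also give the character count needed for the CHAR(n)/VARCHAR(n) item, since each iteration consumes exactly one character.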
--
Tatsuo Ishii