Multibyte still broken - Mailing list pgsql-hackers

From Michael Robinson
Subject Multibyte still broken
Msg-id 200005101408.WAA07324@netrinsics.com
List pgsql-hackers
These are excerpts from a message from Tatsuo Ishii dated January 26, on
the subject of fragile code in the multibyte routines:

---- begin ----
Defensive programming saves the system but not the user. Once
corrupted data is stored in the system, it's totally useless for the
user anyway.  What about validating data *before* inserting it into a
table?
---- end ----

---- begin ----
> >Here it is. With this patch, copy out should be happy even with the
> >wrong data. I'm not sure if it could be displayed correctly, though.
> 
> Thank you very much.  However, I think even this is too optimistic:
> 
> >!     if (*s & 0x80)
> 
> Shouldn't it be something like:
> 
>     if ((*s & 0x80) && (*(s+1) & 0x80))
> 
> Even though "\242\242\242\0" is an invalid EUC sequence, it still shouldn't be
> allowed to break the software.

Thanks for the suggestion. More robust code is always good.
---- end ----
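
For reference, the check suggested in the excerpt above can be sketched as a
small standalone routine. This is only an illustration, not the code in the
tree: the names safe_euc_mblen and safe_euc_strlen are hypothetical, and the
sketch assumes a two-byte EUC-style encoding (such as EUC_CN) stored in
NUL-terminated strings. It ignores the three-byte forms found in some other
EUC variants.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: byte length of the character starting at s.
 * A lead byte with the high bit set counts as a two-byte character
 * only if the following byte also has the high bit set, so the
 * reported length can never step over the terminating NUL.
 */
size_t
safe_euc_mblen(const unsigned char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;               /* end of string */
    if ((*s & 0x80) && (*(s + 1) & 0x80))
        return 2;               /* plausible two-byte character */
    return 1;                   /* ASCII, or a stray high-bit byte */
}

/*
 * Hypothetical helper: count characters in a NUL-terminated string,
 * treating a dangling high-bit byte as a single one-byte character
 * instead of running past the end of the buffer.
 */
size_t
safe_euc_strlen(const unsigned char *s)
{
    size_t      count = 0;
    size_t      len;

    while ((len = safe_euc_mblen(s)) > 0)
    {
        s += len;
        count++;
    }
    return count;
}
```

With the invalid sequence "\242\242\242\0", the first two bytes are consumed
as one character and the dangling third byte as another, so the scan stops at
the NUL. A check of only (*s & 0x80) would advance two bytes from the third
byte and skip the terminator.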

More robust code may always be good, but "good" apparently doesn't always go
into the tree.  Imagine my surprise, while upgrading a production server
from 6.5.3 to 7.0, when the data dumped from the old database failed to load
into the new database (well, crashed the backend, to be specific).

Apparently the "validate your own damn data" sentiment of the first excerpt
above has prevailed, because, on inspection, the MB code is just as fragile
as it was five months ago.

I was forced to perform emergency repairs to my database dump file to fool a 
non-multibyte 7.0 into accepting it.  Since EUC_CN is compatible with 
Latin-1, and since the benefits of multibyte are small compared to the 
risks, I intend to stick with unibyte Postgres henceforth.

I would, though, recommend a warning in the "INSTALL" file along the lines of:
 "WARNING: Use of improperly-encoded text with multi-byte support enabled
 WILL lead to data corruption and/or loss.  Do not enable multi-byte support
 unless you intend to fully validate your own damn data."
 
-Michael Robinson


