> More robust code may always be good, but "good" apparently doesn't always go
> into the tree. Imagine my surprise, while upgrading a production server
> from 6.5.3 to 7.0, when the data dumped from the old database failed to load
> into the new database (well, crashed the backend, to be specific).
>
> Apparently the "validate your own damn data" sentiment of the first excerpt
> above has prevailed, because, on inspection, the MB code is just as fragile
> as it was five months ago.
>
> I was forced to perform emergency repairs to my database dump file to fool a
> non-multibyte 7.0 into accepting it. Since EUC_CN is compatible with
> Latin-1, and since the benefits of multibyte are small compared to the
> risks, I intend to stick with unibyte Postgres henceforth.
>
> I would, though, recommend a warning in the "INSTALL" file along the lines of:
>
> "WARNING: Use of improperly-encoded text with multi-byte support enabled
> WILL lead to data corruption and/or loss. Do not enable multi-byte support
> unless you intend to fully validate your own damn data."
Sorry for the problem. I forgot about this issue :-<
What I'm thinking of now to fix the problem you found is to do data
validation in the text/varchar/char input functions, rather than tweaking
the MB functions. If a corrupted MB string is found, the input function
would call elog(ERROR) to abort the transaction. This will appear in 7.0.1
unless someone objects.
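
To illustrate the kind of check an input function could run before
accepting a string, here is a minimal sketch for EUC_CN only. The name
euc_cn_is_valid is hypothetical and is not an existing backend function.
It assumes the usual EUC_CN layout: bytes 0x00-0x7F stand alone, and every
other character is two bytes, each in the range 0xA1-0xFE. It returns a
flag rather than calling elog(ERROR), so it can be compiled outside the
backend.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns 1 if the buffer
 * s[0..len-1] is well-formed EUC_CN, 0 otherwise.
 *
 * Assumed encoding rules:
 *   - 0x00-0x7F          : single-byte ASCII
 *   - 0xA1-0xFE 0xA1-0xFE: two-byte character
 *   - anything else      : invalid
 */
int
euc_cn_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = s[i];

        if (c < 0x80)
        {
            /* plain ASCII byte */
            i++;
            continue;
        }

        /* lead byte must be in the two-byte range */
        if (c < 0xA1 || c > 0xFE)
            return 0;

        /* a lead byte at the very end means a truncated character */
        if (i + 1 >= len)
            return 0;

        /* trail byte must be in the same range */
        if (s[i + 1] < 0xA1 || s[i + 1] > 0xFE)
            return 0;

        i += 2;
    }

    return 1;
}
```

In the backend, an input function would presumably call elog(ERROR) when
such a check fails, so the bad string never reaches storage and the
transaction aborts as described above.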
--
Tatsuo Ishii