Re: [HACKERS] fatal copy in/out error (6.5.3) - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: [HACKERS] fatal copy in/out error (6.5.3)
Date
Msg-id 20000126102351B.t-ishii@sra.co.jp
In response to Re: [HACKERS] fatal copy in/out error (6.5.3)  (Michael Robinson <robinson@netrinsics.com>)
List pgsql-hackers
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >Yes, it's not a PostgreSQL's business but is a really big problem in
> >the real world. Maybe some HTML gurus might have good suggestions on
> >these issues (something like using a language tag?)
> 
> The only solution is defensive programming.  Even if there were a standard
> that everyone followed, if malicious people could break things by not
> following the standard, then you can be certain that somebody would do so.

Defensive programming saves the system but not the user. Once
corrupted data is stored in the system, it is useless to the
user anyway.  What about validating data *before* inserting it into a
table? You expect EUC_CN data, and in most cases it should be possible
to determine whether the data is valid with some simple checking.
Maybe I could provide a new libpq function, something like:
bool pg_validate_euc_cn(const unsigned char *euc_str);

If it returns false, then euc_str is not valid EUC_CN.
So you show a message:
"Sorry, but we only accept EUC_CN data. Please try another input
method..." or jump to other pages for EUC_TW or Big5 or whatever...

Of course, a true result does not guarantee that the string is 100%
correct EUC_CN (a false result, on the other hand, does tell you that
the string is not valid) because:

1) there is a chance that, for example, an EUC_CN string and an EUC_JP
string accidentally have the same bit patterns.

2) I do not have enough information to implement it perfectly. At this
point I could only perform minimal checking. However, it can be a good
starting point for someone who has more knowledge (on the other hand, I
could implement pg_validate_euc_jp in a much better way, since I have
precise info for EUC_JP).
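
To make the idea concrete, here is a minimal sketch of what such a
function might look like. It is only the "minimal checking" mentioned
above, not an existing libpq function. It assumes that EUC_CN text
consists of single-byte ASCII plus two-byte characters whose bytes both
fall in the range 0xA1..0xFE, and it does not check whether a given
code point is actually assigned.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch only: returns false if euc_str cannot be valid EUC_CN.
 * Assumption: valid EUC_CN is ASCII (0x00..0x7F) plus two-byte
 * characters with both bytes in 0xA1..0xFE.  A true result does not
 * prove the string really is EUC_CN.
 */
bool
pg_validate_euc_cn(const unsigned char *euc_str)
{
    const unsigned char *s = euc_str;

    if (s == NULL)
        return false;

    while (*s != '\0')
    {
        if (*s < 0x80)
        {
            /* plain ASCII, one byte */
            s++;
            continue;
        }

        /* first byte of a two-byte character must be in range */
        if (*s < 0xA1 || *s > 0xFE)
            return false;

        /*
         * second byte must exist and be in range; a terminating
         * '\0' here fails the range test, so we never step past
         * the end of the string
         */
        if (s[1] < 0xA1 || s[1] > 0xFE)
            return false;

        s += 2;
    }
    return true;
}
```

With this sketch, "\242\242\242" is rejected (the third byte has no
partner), and so is an EUC_JP string that uses the 0x8E prefix byte,
while an EUC_JP string made only of 0xA1..0xFE pairs would still pass,
which is case 1) above.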

> >Here it is. With this patch, copy out should be happy even with the
> >wrong data. I'm not sure if it could be displayed correctly, though.
> 
> Thank you very much.  However, I think even this is too optimistic:
> 
> >!     if (*s & 0x80)
> 
> Shouldn't it be something like:
> 
>     if ((*s & 0x80) && (*(s+1) & 0x80))
> 
> Even though "\242\242\242\0" is an invalid EUC sequence, it still shouldn't be
> allowed to break the software.

Thanks for the suggestion. More robust code is always good.
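
For illustration, a minimal sketch of a scan loop using the check
suggested above. This is not the actual patch; the function names are
made up for the example. A byte is treated as the start of a two-byte
character only if the following byte also has its high bit set, so a
lone trailing byte can never push the pointer past the terminating
'\0'.

```c
#include <stddef.h>

/*
 * Hypothetical helper: length in bytes of the character starting at s.
 * Must only be called with *s != '\0', so s[1] is always readable
 * (at worst it is the terminating '\0').
 */
size_t
safe_mb_len(const unsigned char *s)
{
    if ((s[0] & 0x80) && (s[1] & 0x80))
        return 2;
    return 1;
}

/*
 * Hypothetical example: count characters without ever stepping over
 * the end of the string, even for invalid input.
 */
size_t
count_chars(const unsigned char *s)
{
    size_t n = 0;

    while (*s != '\0')
    {
        s += safe_mb_len(s);
        n++;
    }
    return n;
}
```

For "\242\242\242" the loop takes the first two bytes as one character
and the last byte alone, then stops at the terminator.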
--
Tatsuo Ishii


