Re: COPY fails on 8.1 with invalid byte sequences in text - Mailing list pgsql-bugs

From Jeff Davis
Subject Re: COPY fails on 8.1 with invalid byte sequences in text
Date
Msg-id 1162585522.31124.286.camel@dogma.v10.wvs
Whole thread Raw
In response to Re: COPY fails on 8.1 with invalid byte sequences in text  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Tue, 2006-10-31 at 23:18 -0500, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > Is this not a bug?
>
> I don't actually see that it is.  The documentation is perfectly clear
> on the point:
>
>     (It is your responsibility that the byte sequences you create
>     are valid characters in the server character set encoding.)
>
> (This is in 4.1.2.1. String Constants)  If you don't want to deal with
> this, don't use octal escapes to construct multibyte characters.
>

I have thought about this some more, and I still disagree on a few
points.

First, there is no way, as a DBA, to _not_ use octal escapes, because
there is no option to turn off. I can either inspect the application,
or just trust it, but I can't control it from the database (short of
adding a CHECK to every text column). Since it interferes with the
proper use of COPY, which in turn interferes with the use of pg_dump and
applications such as Slony, I think that it is something that should be
controlled at database.

Second, you pointed out that my patch breaks BYTEA. I'd like to point
out that (as I understand it) the patch only breaks BYTEA if you rely on
the cstring escaping to pass binary data to byteain, like:

=> SELECT E'\377'::bytea;

In my opinion, that's wrong anyway, because it will fail if you do
something like:

=> SELECT E'\377\000\377';

Because the NULL in the middle terminates the cstring, which passes a 1-
byte string to byteain, resulting in a 1-byte value, instead of a 3-byte
value which it should be.

However, if you pass ASCII data to byteain that represents binary data,
like:

=> SELECT E'\\377\\000\\377'::bytea;

It seems to work just fine, and that's what PQescapeByteaConn() does.
There may be a backwards-compatibility issue, but I don't think my patch
is 100% broken (unless, of course, you found some other way that it's
broken).

This is a practical concern because some applications use escaping that
protects against SQL injection, but does not protect against invalid
byte sequences. It's not obvious to an application programmer that
protecting against SQL injection is not enough, particularly if the
programmer is trying to be "database agnostic" or doesn't test against
different encodings. Ideally, they should use PQescapeStringConn() and
not convert anything to an octal escape, but that's not always the case.

Lastly, if we retain the current behavior, I think a note should be
added to the COPY docs. The current docs imply that if you COPY it out,
you can COPY it back in. Even if I read the above documentation note
directly after reading the COPY documentation, it would not make me
think that COPY could fail for built-in types.

Regards,
    Jeff Davis

pgsql-bugs by date:

Previous
From: Dennis Bjorklund
Date:
Subject: Re: COPY fails on 8.1 with invalid byte sequences in text
Next
From: Rusty Conover
Date:
Subject: Operator Classes and ANALYZE