Jeff Davis <pgsql@j-davis.com> writes:
> The documentation says:
> "The specified null string is sent by COPY TO without adding any
> backslashes; conversely, COPY FROM matches the input against the null
> string before removing backslashes."
> -- http://www.postgresql.org/docs/devel/static/sql-copy.html
> Those seem inconsistent with the following behavior:
> postgres=# copy foo to '/tmp/a.copy' null '\0';
> COPY 2
> postgres=# copy foo from '/tmp/a.copy' null '\0';
> ERROR: invalid byte sequence for encoding "UTF8": 0x00
> CONTEXT: COPY foo, line 2: "\0"
> $ cat /tmp/a.copy
> 1
> \0
> COPY TO seems to follow the documentation, inserting the null string
> without modification into the output file. COPY FROM seems to de-escape
> the input before trying to match it against the null string, leading to
> the invalid byte sequence.
This seems to have worked as per the documentation before 8.4. I guess
it got broken in one of the efficiency-driven rewrites of the COPY
logic. Specifically, the issue seems to be that CopyReadAttributesText
tries to combine de-escaping with locating the end-of-field, so it does
all the de-escaping --- and the ensuing encoding validity check ---
before checking to see if the input matches the defined null string.
I'm pretty sure that the original code first broke the line into fields,
then checked the null marker, then did de-escaping.
Fortunately, it seems pretty easy to fix: we just have to reverse the
order of the null-marker check and the pg_verifymbstr call. However,
this is dependent on the parsing loop never having a reason to throw
error, which at the very least is something that needs a comment.
regards, tom lane