Thread: Bug in psql COPY command

Bug in psql COPY command

From
"Joshua Jackson"
Date:
============================================================================

POSTGRESQL BUG REPORT TEMPLATE

============================================================================

Your name :  Joshua Jackson
Your email address : jjackson@vortech.net


System Configuration
---------------------

Architecture (example: Intel Pentium) : Intel PIII
Operating System (example: Linux 2.0.26 ELF) : Red Hat 6.1 / Linux 2.2.14
PostgreSQL version (example: PostgreSQL-6.5.1): PostgreSQL 7.0RC1
Compiler used (example: gcc 2.8.0) : egcs-2.91.66

Please enter a FULL description of your problem:
------------------------------------------------

When using the "COPY" command to load a table in psql, if the row contains
an ASCII NULL (0x0) or the "\" character, it will ignore whatever the field
terminator for the column that this character is contained in.  The result
of which is all of the columns being shifted to the left (which results is a
failed insert of a given row).

Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
A text file containing the following (each field seperated by tabs)

test1    test2    test3\    test4    1234    test5

to be copied into a table such as:

create table test(
    f1    varchar(10),
    f2    varchar(10),
    f3    varchar(10),
    f4    varchar(10),
    f5    integer,
    f6    varchar(10)
);

will fail with the error:
ERROR:    copy:    line 1, Bad integer input format 'test5'

The same result will be achieved if test3 was followed by an ASCII 0x0

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
I assume that this has to due with C string termination and escape sequences
not being properly parsed.

Re: Bug in psql COPY command

From
Tom Lane
Date:
"Joshua Jackson" <jjackson@vortech.net> writes:
> A text file containing the following (each field seperated by tabs)
> test1    test2    test3\    test4    1234    test5
> to be copied into a table such as:
> create table test(
>     f1    varchar(10),
>     f2    varchar(10),
>     f3    varchar(10),
>     f4    varchar(10),
>     f5    integer,
>     f6    varchar(10)
> );
> will fail with the error:
> ERROR:    copy:    line 1, Bad integer input format 'test5'

This is not a bug.  Backslash is a quoting character as far as COPY
is concerned, so what you have written says that "test3<tab>test4"
is the data for f3 --- and then of course f4 gets "1234" and f5
gets "test5".

You'd need to write \\ in order to put an actual backslash into f3.

> The same result will be achieved if test3 was followed by an ASCII 0x0

Hmm.  That might be a bug --- COPY uses strchr() to decide whether a
character is a field terminator or not, and so a null will always look
like a terminator.  Net effect is that f3 gets "test3", f4 gets an
empty string (since the tab terminates it), f5 gets "test4".  I assume
you actually saw "Bad integer input format 'test4'" in this case?

I find it hard to get very excited about that, however, since none of
Postgres's input/output conversion routines are designed to handle
embedded null characters.  You certainly shouldn't expect to store
an embedded null in a varchar.  You could get it past COPY by writing
either \000 or \<null>, but the type-specific input routine isn't
going to take it.

            regards, tom lane