Thread: Bug in psql COPY command
============================================================================ POSTGRESQL BUG REPORT TEMPLATE ============================================================================ Your name : Joshua Jackson Your email address : jjackson@vortech.net System Configuration --------------------- Architecture (example: Intel Pentium) : Intel PIII Operating System (example: Linux 2.0.26 ELF) : Red Hat 6.1 / Linux 2.2.14 PostgreSQL version (example: PostgreSQL-6.5.1): PostgreSQL 7.0RC1 Compiler used (example: gcc 2.8.0) : egcs-2.91.66 Please enter a FULL description of your problem: ------------------------------------------------ When using the "COPY" command to load a table in psql, if the row contains an ASCII NULL (0x0) or the "\" character, it will ignore whatever the field terminator for the column that this character is contained in. The result of which is all of the columns being shifted to the left (which results is a failed insert of a given row). Please describe a way to repeat the problem. Please try to provide a concise reproducible example, if at all possible: ---------------------------------------------------------------------- A text file containing the following (each field seperated by tabs) test1 test2 test3\ test4 1234 test5 to be copied into a table such as: create table test( f1 varchar(10), f2 varchar(10), f3 varchar(10), f4 varchar(10), f5 integer, f6 varchar(10) ); will fail with the error: ERROR: copy: line 1, Bad integer input format 'test5' The same result will be achieved if test3 was followed by an ASCII 0x0 If you know how this problem might be fixed, list the solution below: --------------------------------------------------------------------- I assume that this has to due with C string termination and escape sequences not being properly parsed.
"Joshua Jackson" <jjackson@vortech.net> writes: > A text file containing the following (each field seperated by tabs) > test1 test2 test3\ test4 1234 test5 > to be copied into a table such as: > create table test( > f1 varchar(10), > f2 varchar(10), > f3 varchar(10), > f4 varchar(10), > f5 integer, > f6 varchar(10) > ); > will fail with the error: > ERROR: copy: line 1, Bad integer input format 'test5' This is not a bug. Backslash is a quoting character as far as COPY is concerned, so what you have written says that "test3<tab>test4" is the data for f3 --- and then of course f4 gets "1234" and f5 gets "test5". You'd need to write \\ in order to put an actual backslash into f3. > The same result will be achieved if test3 was followed by an ASCII 0x0 Hmm. That might be a bug --- COPY uses strchr() to decide whether a character is a field terminator or not, and so a null will always look like a terminator. Net effect is that f3 gets "test3", f4 gets an empty string (since the tab terminates it), f5 gets "test4". I assume you actually saw "Bad integer input format 'test4'" in this case? I find it hard to get very excited about that, however, since none of Postgres's input/output conversion routines are designed to handle embedded null characters. You certainly shouldn't expect to store an embedded null in a varchar. You could get it past COPY by writing either \000 or \<null>, but the type-specific input routine isn't going to take it. regards, tom lane