Thread: BUG #7611: \copy (and COPY?) incorrectly parses nul character for windows-1252
From: sams.james+postgres@gmail.com
The following bug has been logged on the website:

Bug reference:      7611
Logged by:          James
Email address:      sams.james+postgres@gmail.com
PostgreSQL version: 9.1.6
Operating system:   Ubuntu Linux 12.04
Description:

I have a file with several nul characters in it. The file itself appears to be encoded as windows-1252, though I am not 100% certain of that. I do know that other software (e.g. Python) can decode the data as windows-1252 without issue. Postgres's \copy, however, chokes on the nul byte:

ERROR: unterminated CSV quoted field
CONTEXT: COPY promo_nonactive_load_fake, line 239900

Note that the error is wrong: the field is quoted, but Postgres seems to jump forward in the file when it encounters the nul bytes. Further, the line number is wrong. It is the length of the file (in lines), not the line on which the error occurs, which is several hundred lines before this.

Deleting the nul byte characters allowed copy to proceed normally. I experienced similar issues with psycopg2 and copy_expert using COPY FROM STDIN and this file.
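Because deleting the nul bytes was enough to let the load succeed, one workaround is to filter them out before handing the file to \copy. The following is a minimal C sketch, assuming the nul bytes carry no information worth keeping; `strip_nul_bytes` is a hypothetical helper, not part of PostgreSQL or psql. It works byte by byte, so windows-1252 high-bit characters pass through unchanged.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy every byte from `in` to `out` except NUL (0x00).
 * Returns the number of NUL bytes dropped. All other bytes, including
 * windows-1252 characters in the 0x80-0xFF range, are copied unchanged. */
static size_t strip_nul_bytes(FILE *in, FILE *out)
{
    size_t dropped = 0;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        if (c == '\0') {
            dropped++;          /* skip the NUL byte entirely */
            continue;
        }
        if (putc(c, out) == EOF)
            break;              /* write error; caller can check ferror(out) */
    }
    return dropped;
}
```

Called as `strip_nul_bytes(stdin, stdout)` from a small program, this turns a raw export into a file that can be loaded the same way as before. Dropping the bytes silently is a design choice: if the nul bytes might be meaningful, counting them (the return value) at least tells you how many were removed.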
Re: BUG #7611: \copy (and COPY?) incorrectly parses nul character for windows-1252
From: Tom Lane
sams.james+postgres@gmail.com writes:
> I have a file with several nul characters in it. The file itself appears to
> be encoded as windows-1252, though I am not 100% certain of that. I do know
> that other software (e.g. Python) can decode the data as windows-1252
> without issue. Postgres's \copy, however, chokes on the nul byte:

> ERROR: unterminated CSV quoted field
> CONTEXT: COPY promo_nonactive_load_fake, line 239900

Postgres doesn't support nul characters in data, so the best you could hope for here is an error message anyway.

It looks to me like the immediate cause of this is that \copy reads the file with fgets() which will effectively ignore the rest of the line after a nul byte. But there are probably more issues downstream.

			regards, tom lane