Re: BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times. - Mailing list pgsql-bugs

From: Heikki Linnakangas
Subject: Re: BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times.
Msg-id: 519E03F3.70306@vmware.com
In response to: BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times.  (lalbin@fhcrc.org)
List: pgsql-bugs
On 21.05.2013 18:20, lalbin@fhcrc.org wrote:
> The following bug has been logged on the website:
>
> Bug reference:      8173
> Logged by:          Lloyd Albin
> Email address:      lalbin@fhcrc.org
> PostgreSQL version: 9.2.4
> Operating system:   SUSE Linux (64-bit)
> Description:
>
> During testing for our 9.2 upgrade, we found that the error messages we were
> expecting did not match what the server actually reported. Looking over the
> release notes from our current version, 9.0.12, through the 9.2.4 we are
> testing, I believe I have tracked the issue down to "Improve COPY
> performance by adding tuples to the heap in batches". When I looked at the
> patch code in
> http://www.postgresql.org/message-id/4E708759.40206@enterprisedb.com I found
> that you are inserting 1000 rows at a time. The problem is that on failure,
> the error reports either line 1000 or the last line read, whichever comes
> first. This can be confusing, as the demo code below shows.
>
> CREATE TABLE public.table1 (
>    key INTEGER,
>    PRIMARY KEY(key)
> );
>
> Create a csv file with only one column of data, numbered from 1 to 1008.
>
> Make two copies of the file and name them csv_test.csv and csv_test2.csv.
>
> Edit csv_test.csv and change the entry 1000 to 500.
>
> Edit csv_test2.csv and change the entry 900 to 500.
>
> On 9.0.12 Server
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 900: "500"
>
> Both times the context gave us the correct information.
>
> Now try the same thing on 9.2.4 Server
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 1000: "1000"
>
> As you can see, the second test reported the last line of the set of tuples
> being recorded, not the line that actually failed.
>
> Make a copy of csv_test2.csv and name it csv_test3.csv.
> Edit csv_test3.csv and remove all entries after 994.
>
> COPY public.table1 FROM 'csv_test3.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 995: ""
>
> If you are writing fewer than 1000 lines, it will report the line after
> the last line, with a value of "".

Hmm, yeah, it's quite self-evident what's happening: the server reports
the last line that was *read*, not the line where the error happened.
With the buffered multi-insert, the error is only raised when a batch of
1000 rows is flushed, and by then COPY has already read ahead to the
batch boundary (or to the end of the file, hence the line 995 case).
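
For reference, the test files can also be generated straight from psql,
without any hand-editing. A minimal sketch, assuming superuser and that
the server can write to /tmp (the file paths are just examples):

COPY (SELECT CASE WHEN i = 1000 THEN 500 ELSE i END
      FROM generate_series(1, 1008) AS i) TO '/tmp/csv_test.csv';

COPY (SELECT CASE WHEN i = 900 THEN 500 ELSE i END
      FROM generate_series(1, 1008) AS i) TO '/tmp/csv_test2.csv';

-- csv_test3.csv is csv_test2.csv truncated after entry 994
COPY (SELECT CASE WHEN i = 900 THEN 500 ELSE i END
      FROM generate_series(1, 994) AS i) TO '/tmp/csv_test3.csv';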

Committed a fix for this. Unfortunately, only the most recently read
line is kept buffered in its raw text form, so after the fix you'll get
the correct line number, but not the contents of that line:

postgres=# copy foo from '/tmp/foo';
ERROR:  duplicate key value violates unique constraint "foo_pkey"
DETAIL:  Key (id)=(4500) already exists.
CONTEXT:  COPY foo, line 4500
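
Since only the line number is reported now, the contents of the failing
line have to be looked up in the input file itself. One way to do that
from the psql prompt (assuming the file is on the local machine and
shell access is available; sed is just one option):

postgres=# \! sed -n '4500p' /tmp/foo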

Thanks for the report!

- Heikki
