Re: BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times. - Mailing list pgsql-bugs
From: Heikki Linnakangas
Subject: Re: BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times.
Date:
Msg-id: 519E03F3.70306@vmware.com
In response to: BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times. (lalbin@fhcrc.org)
List: pgsql-bugs
On 21.05.2013 18:20, lalbin@fhcrc.org wrote:
> The following bug has been logged on the website:
>
> Bug reference:      8173
> Logged by:          Lloyd Albin
> Email address:      lalbin@fhcrc.org
> PostgreSQL version: 9.2.4
> Operating system:   SUSE Linux (64-bit)
> Description:
>
> During testing for our 9.2 upgrade, we found that the error messages we
> were expecting did not match what was given by the program. In looking
> over the revision notes from our current version of 9.0.12 through the
> 9.2.4 that we are testing, I believe that I have tracked down the issue
> to "Improve COPY performance by adding tuples to the heap in batches".
> When I looked at the patch code in
> http://www.postgresql.org/message-id/4E708759.40206@enterprisedb.com
> I found that you are inserting 1000 rows at a time. The problem is that
> on failure, you return either row 1000 or the last line, whichever comes
> first. This can be confusing, as you will see in the demo code below.
>
> CREATE TABLE public.table1 (
>   key INTEGER,
>   PRIMARY KEY(key)
> );
>
> Create a csv file with only one column of data, numbered from 1 to 1008.
>
> Make two copies of the file and name them csv_test.csv and csv_test2.csv.
>
> Edit csv_test.csv and change the entry 1000 to 500.
>
> Edit csv_test2.csv and change the entry 900 to 500.
>
> On 9.0.12 Server
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 900: "500"
>
> Both times the context gave us the correct information.
>
> Now try the same thing on 9.2.4 Server
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 1000: "1000"
>
> As you can see, the second test returned the last line of the set of
> tuples being recorded, not the line that actually failed.
>
> Make a copy of csv_test2.csv and name it csv_test3.csv.
> Edit csv_test3.csv and remove all entries after 994.
>
> COPY public.table1 FROM 'csv_test3.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 995: ""
>
> If you are writing fewer than 1000 lines, it will return the line after
> the last line, with a value of "".

Hmm, yeah, it's quite self-evident what's happening: the server reports
the last line that was *read*, not the line where the error happened.

Committed a fix for this. Unfortunately we only keep the last line read
buffered in text format, so after this you'll only get the line number,
not the content of that line:

postgres=# copy foo from '/tmp/foo';
ERROR:  duplicate key value violates unique constraint "foo_pkey"
DETAIL:  Key (id)=(4500) already exists.
CONTEXT:  COPY foo, line 4500

Thanks for the report!

- Heikki
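[Editor's note: the three test files described in the report can be generated with a short script rather than by hand-editing. A minimal sketch in Python, using the filenames from the report; the `write_csv` helper is illustrative, not part of the report:]

```python
# Generate the three single-column CSV test files from the bug report.
# Each file contains one integer per line.

def write_csv(path, rows):
    """Write one value per line to path (hypothetical helper)."""
    with open(path, "w") as f:
        for n in rows:
            f.write(f"{n}\n")

rows = list(range(1, 1009))  # values 1 .. 1008, one per line

# csv_test.csv: entry on line 1000 changed to 500 (duplicate key at line 1000)
a = rows.copy()
a[999] = 500
write_csv("csv_test.csv", a)

# csv_test2.csv: entry on line 900 changed to 500 (duplicate key at line 900)
b = rows.copy()
b[899] = 500
write_csv("csv_test2.csv", b)

# csv_test3.csv: same as csv_test2.csv, truncated after line 994
write_csv("csv_test3.csv", b[:994])
```

Loading each file with `COPY public.table1 FROM ...` should then reproduce the three error contexts quoted above.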