BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times.
From
lalbin@fhcrc.org
Date:
The following bug has been logged on the website:

Bug reference: 8173
Logged by: Lloyd Albin
Email address: lalbin@fhcrc.org
PostgreSQL version: 9.2.4
Operating system: SUSE Linux (64-bit)
Description:

During testing for our 9.2 upgrade, we found that the error messages we were expecting did not match what the program gave us. Looking over the release notes from our current version, 9.0.12, through 9.2.4, which we are testing, I believe I have tracked the issue down to "Improve COPY performance by adding tuples to the heap in batches". When I looked at the patch code at http://www.postgresql.org/message-id/4E708759.40206@enterprisedb.com I found that you are inserting 1000 rows at a time. The problem is that on failure, the error reports either line 1000 or the last line read, whichever comes first. This can be confusing, as the demo below shows.

CREATE TABLE public.table1 (
  key INTEGER,
  PRIMARY KEY(key)
);

Create a csv file with only one column of data, numbered from 1 to 1008.

Make two copies of the file and name them csv_test.csv and csv_test2.csv.

Edit csv_test.csv and change the entry 1000 to 500.

Edit csv_test2.csv and change the entry 900 to 500.

On a 9.0.12 server:

COPY public.table1 FROM 'csv_test.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 1000: "500"

COPY public.table1 FROM 'csv_test2.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 900: "500"

Both times the context gave us the correct information. Now try the same thing on a 9.2.4 server:

COPY public.table1 FROM 'csv_test.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 1000: "500"

COPY public.table1 FROM 'csv_test2.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 1000: "1000"

As you can see, the second test reported the last line of the batch of tuples being written, not the line that actually failed.

Make a copy of csv_test2.csv and name it csv_test3.csv. Edit csv_test3.csv and remove all entries after 994.

COPY public.table1 FROM 'csv_test3.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 995: ""

If you are writing fewer than 1000 lines, it reports the line after the last line, with a value of "".

Lloyd Albin
Statistical Center for HIV/AIDS Research and Prevention (SCHARP)
Vaccine and Infectious Disease Division (VIDD)
Fred Hutchinson Cancer Research Center (FHCRC)
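[Editor's note: the test files described above can be generated with a short script. This Python sketch is not part of the original report; the helper name make_csv is illustrative, and only the filenames and line edits come from the report.]

```python
# Generate the three test files from the bug report:
#   csv_test.csv  : lines 1..1008, with line 1000 changed to 500
#   csv_test2.csv : lines 1..1008, with line 900 changed to 500
#   csv_test3.csv : same edit as csv_test2.csv, truncated after line 994

def make_csv(path, last, replace_line=None, replace_value=None):
    """Write one integer per line, optionally overwriting one line."""
    with open(path, "w") as f:
        for n in range(1, last + 1):
            value = replace_value if n == replace_line else n
            f.write(f"{value}\n")

make_csv("csv_test.csv", 1008, replace_line=1000, replace_value=500)
make_csv("csv_test2.csv", 1008, replace_line=900, replace_value=500)
make_csv("csv_test3.csv", 994, replace_line=900, replace_value=500)
```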
Re: BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times.
From
Heikki Linnakangas
Date:
On 21.05.2013 18:20, lalbin@fhcrc.org wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8173
> Logged by: Lloyd Albin
> Email address: lalbin@fhcrc.org
> PostgreSQL version: 9.2.4
> Operating system: SUSE Linux (64-bit)
> Description:
>
> During testing for our 9.2 upgrade, we found that the error messages we
> were expecting did not match what the program gave us. Looking over the
> release notes from our current version, 9.0.12, through 9.2.4, which we
> are testing, I believe I have tracked the issue down to "Improve COPY
> performance by adding tuples to the heap in batches". When I looked at
> the patch code at
> http://www.postgresql.org/message-id/4E708759.40206@enterprisedb.com
> I found that you are inserting 1000 rows at a time. The problem is that
> on failure, the error reports either line 1000 or the last line read,
> whichever comes first. This can be confusing, as the demo below shows.
>
> CREATE TABLE public.table1 (
>   key INTEGER,
>   PRIMARY KEY(key)
> );
>
> Create a csv file with only one column of data, numbered from 1 to 1008.
>
> Make two copies of the file and name them csv_test.csv and csv_test2.csv.
>
> Edit csv_test.csv and change the entry 1000 to 500.
>
> Edit csv_test2.csv and change the entry 900 to 500.
>
> On a 9.0.12 server:
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 900: "500"
>
> Both times the context gave us the correct information. Now try the same
> thing on a 9.2.4 server:
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 1000: "1000"
>
> As you can see, the second test reported the last line of the batch of
> tuples being written, not the line that actually failed.
>
> Make a copy of csv_test2.csv and name it csv_test3.csv.
> Edit csv_test3.csv and remove all entries after 994.
>
> COPY public.table1 FROM 'csv_test3.csv';
>
> ERROR:  duplicate key value violates unique constraint "table1_pkey"
> DETAIL:  Key (key)=(500) already exists.
> CONTEXT:  COPY table1, line 995: ""
>
> If you are writing fewer than 1000 lines, it reports the line after the
> last line, with a value of "".

Hmm, yeah, it's quite self-evident what's happening: the server reports the last line that was *read*, not the line where the error happened.

Committed a fix for this. Unfortunately, we only keep the last line read buffered in text form, so after this fix you will only get the line number, not the contents of that line:

postgres=# copy foo from '/tmp/foo';
ERROR:  duplicate key value violates unique constraint "foo_pkey"
DETAIL:  Key (id)=(4500) already exists.
CONTEXT:  COPY foo, line 4500

Thanks for the report!

- Heikki
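[Editor's note: the mechanism Heikki describes can be illustrated with a toy model. This Python sketch is not the actual copy.c code; BATCH_SIZE, copy_lines, and flush are invented names. It models how errors surface at batch-flush time, when the reader has already advanced to the end of the batch, and how remembering each buffered tuple's own line number fixes the report.]

```python
# Toy model of the COPY batching bug. Lines are read and buffered in
# batches of 1000; the duplicate-key check (here, a set lookup standing
# in for the index insert) happens only when the batch is flushed.

BATCH_SIZE = 1000

def flush(buffered, seen, cur_lineno, fixed):
    """Insert buffered tuples; return the reported line number on error."""
    for lineno, value in buffered:
        if value in seen:
            # Buggy behavior: report the last line *read*, which by flush
            # time is the end of the batch. Fixed behavior: report the
            # line number saved alongside the failing tuple.
            return lineno if fixed else cur_lineno
        seen.add(value)
    return None

def copy_lines(lines, fixed=True):
    """Simulate COPY; return the reported error line, or None on success."""
    seen = set()
    buffered = []      # list of (lineno, value) pairs
    cur_lineno = 0     # line most recently read
    for value in lines:
        cur_lineno += 1
        buffered.append((cur_lineno, value))
        if len(buffered) == BATCH_SIZE:
            err = flush(buffered, seen, cur_lineno, fixed)
            if err is not None:
                return err
            buffered = []
    return flush(buffered, seen, cur_lineno, fixed)

data = list(range(1, 1009))
data[899] = 500                        # duplicate value at line 900
print(copy_lines(data, fixed=False))   # prints 1000 (pre-fix: wrong line)
print(copy_lines(data, fixed=True))    # prints 900 (post-fix: right line)
```

The real fix is analogous: attribute the error to the line number carried with the failing tuple rather than to the reader's current position.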