BUG #8173: Inserting heap tuples in bulk in COPY patch returns wrong line on failure 999 out of 1000 times. - Mailing list pgsql-bugs

From: lalbin@fhcrc.org
Subject: BUG #8173: Inserting heap tuples in bulk in COPY patch returns wrong line on failure 999 out of 1000 times.
Date:
Msg-id: E1UeuvK-00022u-87@wrigleys.postgresql.org
Responses: Re: BUG #8173: Inserting heap tuples in bulk in COPY patch returns wrong line on failure 999 out of 1000 times. (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      8173
Logged by:          Lloyd Albin
Email address:      lalbin@fhcrc.org
PostgreSQL version: 9.2.4
Operating system:   SUSE Linux (64-bit)
Description:


During testing for our 9.2 upgrade, we found that the error messages we were
expecting did not match what the server actually produced. Reviewing the
release notes from our current version, 9.0.12, through 9.2.4, which we are
testing, I believe I have tracked the issue down to "Improve COPY
performance by adding tuples to the heap in batches". When I looked at the
patch code in
http://www.postgresql.org/message-id/4E708759.40206@enterprisedb.com I found
that rows are inserted 1000 at a time. The problem is that on failure, the
reported line is either the last line of the current 1000-row batch or the
last line of the file, whichever comes first, rather than the line that
actually failed. This can be confusing, as the demo below shows.

CREATE TABLE public.table1 (
  key INTEGER,
  PRIMARY KEY(key)
);

Create a CSV file with a single column of data, numbered from 1 to 1008.

Make two copies of the file and name them csv_test.csv and csv_test2.csv.

Edit csv_test.csv and change the entry 1000 to 500.

Edit csv_test2.csv and change the entry 900 to 500.
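
As a sketch, equivalent files can also be generated from psql with
generate_series (the /tmp paths are only examples; a server-side
COPY ... TO needs an absolute path and superuser rights):

-- csv_test.csv: values 1 to 1008, with entry 1000 changed to 500
COPY (SELECT CASE WHEN i = 1000 THEN 500 ELSE i END
      FROM generate_series(1, 1008) AS i)
TO '/tmp/csv_test.csv';

-- csv_test2.csv: values 1 to 1008, with entry 900 changed to 500
COPY (SELECT CASE WHEN i = 900 THEN 500 ELSE i END
      FROM generate_series(1, 1008) AS i)
TO '/tmp/csv_test2.csv';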

On the 9.0.12 server:

COPY public.table1 FROM 'csv_test.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 1000: "500"

COPY public.table1 FROM 'csv_test2.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 900: "500"

Both times the context gave us the correct information.

Now try the same thing on the 9.2.4 server:

COPY public.table1 FROM 'csv_test.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 1000: "500"

COPY public.table1 FROM 'csv_test2.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 1000: "1000"

As you can see, the second test reported the last line of the batch of
tuples being written (line 1000), not the line that actually failed (line 900).

Make a copy of csv_test2.csv and name it csv_test3.csv.
Edit csv_test3.csv and remove all entries after 994.
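
Equivalently, a sketch that generates csv_test3.csv directly
(illustrative path, same caveats as above):

-- csv_test3.csv: values 1 to 994, with entry 900 changed to 500
COPY (SELECT CASE WHEN i = 900 THEN 500 ELSE i END
      FROM generate_series(1, 994) AS i)
TO '/tmp/csv_test3.csv';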

COPY public.table1 FROM 'csv_test3.csv';

ERROR:  duplicate key value violates unique constraint "table1_pkey"
DETAIL:  Key (key)=(500) already exists.
CONTEXT:  COPY table1, line 995: ""

If the failing batch contains fewer than 1000 lines, the reported line is
the one after the last line of the file, with an empty value ("").

Lloyd Albin
Statistical Center for HIV/AIDS Research and Prevention (SCHARP)
Vaccine and Infectious Disease Division (VIDD)
Fred Hutchinson Cancer Research Center (FHCRC)
