Re: COPY enhancements - Mailing list pgsql-hackers

From Rod Taylor
Subject Re: COPY enhancements
Date
Msg-id 751261b20910080912md3176e8ga671c768b4fee055@mail.gmail.com
Whole thread Raw
In response to Re: COPY enhancements  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: COPY enhancements  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-hackers

Yeah.  I think it's going to be hard to make this work without having
standalone transactions.  One idea would be to start a subtransaction,
insert tuples until one fails, then rollback the subtransaction and
start a new one, and continue on until the error limit is reached.

I've found performance is reasonable, for data with low numbers of errors (say 1 per 100,000 records or less) doing the following:

SAVEPOINT bulk;
Insert 1000 records using COPY.

If there is an error, rollback to bulk, and step through each line individually within its own "individual" subtransaction. All good lines are kept and bad lines are logged; client side control makes logging trivial.

The next set of 1000 records is done in bulk again.

1000 records per savepoint seems to be a good point for my data without too much time lost to overhead or too many records to retry due to a failing record. Of course, it is controlled by the client side rather than server side so reporting back broken records is trivial.


It may be possible to boost performance by:

1) Having copy remember which specific line caused the error. So it can replace lines 1 through 487 in a subtransaction since it knows those are successful. Run 488 in its on subtransaction. Run 489 through ... in a new subtransaction.
2) Increasing the number of records per subtransaction if data is clean. It wouldn't take long until you were inserting millions of records per subtransaction for a large data set. This should make the subtransaction overhead minimal. Small imports would still run slower but very large imports of clean data should be essentially the same speed in the end.

pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: COPY enhancements
Next
From: Tom Lane
Date:
Subject: Re: COPY enhancements