It seems that COPY is currently able to report the first error line and the error type (extra or missing columns,
type parse error, etc.). Thus, an approach similar to the one Stas wrote should work and, being optimised for a
small number of error rows, should not affect COPY performance in that case.
I will be glad to receive any critical remarks and suggestions.
I have the following questions about your proposal.
1. Suppose we have to insert N records.
2. We create a subtransaction with these N records.
3. An error is raised on the k-th line.
4. Then we can safely insert all lines from the 1st to the (k - 1)-th.
5. Report, save to an errors table, or silently drop the k-th line.
6. Next, try to insert lines from (k + 1) to N in another subtransaction.
7. Repeat until the end of the file.
Do you assume that we start a new subtransaction in step 4, since the subtransaction we started in step 2 has been rolled back?
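To check my reading of steps 2-6, here is a rough sketch of the loop I imagine, in backend-style C. CopyOneLine(), ReportOrSaveBadLine(), the Line type, lines[] and bad[] are hypothetical placeholders (not existing backend APIs), and memory-context bookkeeping is omitted; this is only meant to show where the subtransaction boundaries would fall.

#include "postgres.h"
#include "access/xact.h"

/* Sketch only: process one batch, skipping lines that raise errors. */
static void
CopyBatchSkippingErrors(CopyState cstate, Line *lines, int nlines)
{
    bool       *bad = palloc0(nlines * sizeof(bool));
    int         start = 0;
    volatile int i = 0;

    while (start < nlines)
    {
        BeginInternalSubTransaction(NULL);      /* step 2 / step 6 */

        PG_TRY();
        {
            for (i = start; i < nlines; i++)
            {
                if (bad[i])
                    continue;                   /* step 5: drop lines already known bad */
                CopyOneLine(cstate, lines[i]);  /* hypothetical per-line insert */
            }
            ReleaseCurrentSubTransaction();     /* the whole remainder went in */
            start = nlines;
        }
        PG_CATCH();
        {
            /*
             * Step 3: the i-th ("k-th") line failed.  Rolling back the
             * subtransaction also undoes lines start..i-1, so they have to
             * be inserted again in a new subtransaction on the next pass --
             * hence my question about step 4 above.
             */
            RollbackAndReleaseCurrentSubTransaction();
            FlushErrorState();
            ReportOrSaveBadLine(cstate, lines[i]);  /* step 5 */
            bad[i] = true;
        }
        PG_END_TRY();
    }
}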
I am planning to use background worker processes for parallel COPY execution. Each process will receive an equal-sized piece of the input file. Since the file is split by size, not by lines, each worker will start its import from the first newline in its chunk, so that it does not start on a broken line.
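Just to illustrate the chunk-alignment part (not the actual worker code), something like the following should be enough, assuming the usual convention that a worker keeps reading past its chunk end until it finishes the last line it started. The function name and the use of stdio here are purely illustrative.

#include <stdio.h>

/*
 * Position 'fp' at the first complete line at or after 'chunk_start'.
 * The real version would operate on the worker's own file descriptor
 * and shared state, not a FILE *.
 */
static long
align_to_next_line(FILE *fp, long chunk_start)
{
    int     c;

    if (chunk_start == 0)
        return 0;               /* the first worker keeps the very first line */

    /*
     * Step back one byte and skip to the end of the line the split point
     * landed in; that (possibly partial) line belongs to the previous
     * worker, which reads past its own chunk end to finish it.
     */
    fseek(fp, chunk_start - 1, SEEK_SET);
    while ((c = fgetc(fp)) != EOF && c != '\n')
        ;

    return ftell(fp);
}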
I think the situation where the backend reads the file directly during COPY is not typical. The more common case is the psql \copy command; in that case "COPY ... FROM stdin;" is actually executed while psql streams the data.
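In libpq terms that streaming case looks roughly like the sketch below (the table name, file handling and error handling are invented for illustration); psql's \copy does essentially the same thing under the hood.

#include <stdio.h>
#include <libpq-fe.h>

/* Stream a local file into an open connection via COPY ... FROM STDIN. */
static int
stream_local_file(PGconn *conn, const char *path)
{
    PGresult   *res;
    FILE       *fp;
    char        buf[8192];
    size_t      n;

    res = PQexec(conn, "COPY my_table FROM STDIN WITH (FORMAT csv)");
    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        PQclear(res);
        return -1;
    }
    PQclear(res);

    if ((fp = fopen(path, "r")) == NULL)
    {
        PQputCopyEnd(conn, "could not open input file");
        return -1;
    }

    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        PQputCopyData(conn, buf, (int) n);  /* the server just sees a stdin stream */
    fclose(fp);

    PQputCopyEnd(conn, NULL);               /* end of data */
    res = PQgetResult(conn);                /* COPY's final result, errors included */
    PQclear(res);
    return 0;
}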