On Mon, Jul 18, 2016 at 2:16 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> I assume the first problem could be solved by implementing your own Reader that "massages" each line while it reads the source file.
> But I don't think there is an efficient way to "catch" the bad rows through the Copy API.
I've done exactly this, albeit a long time ago. As long as you don't mind redoing the entire COPY without the bad rows, it works just fine. If you get lots of failures in any given batch, you'll be better off doing individual INSERTs in large transactions and catching the failures there, or writing a more sophisticated reader that catches the bad rows and prevents them from ever being attempted.
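A minimal sketch of that "more sophisticated reader" idea: a java.io.Reader that only passes through lines satisfying a validation predicate, so bad rows never reach COPY. With the pgjdbc driver you could hand an instance of this to CopyManager.copyIn (org.postgresql.copy). The column-count check in the usage comment is just a placeholder for whatever validation your data needs; rejected lines could be written to a log rather than silently dropped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Predicate;

// A Reader that yields only the lines passing a validation predicate,
// so invalid rows are filtered out before COPY ever sees them.
class FilteringLineReader extends Reader {
    private final BufferedReader source;
    private final Predicate<String> isValid;
    private String pending = "";
    private int pos = 0;

    FilteringLineReader(Reader source, Predicate<String> isValid) {
        this.source = new BufferedReader(source);
        this.isValid = isValid;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (pos >= pending.length()) {
            String line;
            do {
                line = source.readLine();
                if (line == null) return -1;      // end of input
            } while (!isValid.test(line));        // skip bad rows here (or log them)
            pending = line + "\n";
            pos = 0;
        }
        int n = Math.min(len, pending.length() - pos);
        pending.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}
```

Usage with pgjdbc would look roughly like this (table name and validation rule are hypothetical):

```java
// CopyManager cm = conn.unwrap(org.postgresql.PGConnection.class).getCopyAPI();
// cm.copyIn("COPY my_table FROM STDIN WITH (FORMAT csv)",
//           new FilteringLineReader(fileReader,
//                   line -> line.split(",", -1).length == 3));
```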
But you might be interested in http://pgloader.io/, a tool similar to SQL*Loader; it supports more formats than COPY does. I have not worked with it, however.
Is Kettle still around (the Java-based ETL tool)? It always had its difficulties, in my experience, but for basic bulk inserts with redirection of bad rows to a log, it works very well. It is not the fastest to execute relative to a custom COPY reader, but if you aren't already dealing with overloaded infrastructure and don't need to absolutely maximize efficiency, it will get you up and running really quickly (once you are familiar with the tool and over the learning curve, of course), and it isn't database-specific, so it is much more portable. I found it really easy to write Java/Groovy/Jython code which could instantiate the Kettle tools and run them in the context of other tools/scripts, making it quite modular. It was all a long time ago or I'd provide more detail. Sorry. And none of that should discredit the pgloader.io suggestion; I just have no experience of it.