Re: COPY enhancements - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: COPY enhancements |
Date | |
Msg-id | 1254911874.26302.213.camel@ebony.2ndQuadrant Whole thread Raw |
In response to | Re: COPY enhancements (Greg Smith <gsmith@gregsmith.com>) |
Responses |
Re: COPY enhancements
|
List | pgsql-hackers |
On Wed, 2009-10-07 at 03:17 -0400, Greg Smith wrote: > On Mon, 5 Oct 2009, Josh Berkus wrote: > > Also, presumbly, if you abort a COPY because of errors, you > > probably want to keep the errors around for later analysis. No? > > Absolutely, that's the whole point of logging to a file in the first > place. Yes, essential. (Not read patch, so some later comments may misunderstand where we're at) > What needs to happen here is that when one is aborted, you need to > make sure that fact is logged, and with enough information (the pid?) to > tie it to the COPY that failed. Then someone can crawl the logs to figure > out what happened and what data did and didn't get loaded. Ideally you'd > want to have that as database table information instead, to allow > automated recovery and re-commit in the cases where the error wasn't > inherent in the data but instead some other type of database failure. Adding something to the logs is the wrong place for that IMHO. If we do write a log message it should be a NOTICE not a LOG. If the user specifies an error file then it should be straightforward for them to look directly in that error file afterwards to see if there are rejects. It's typical to only create the error file if rejects exist, to allow a simple if-file-exists test after the COPY. I don't recommend having the server automatically generate an error file name because that will encourage conflicts between multiple users on production systems. If the user wants an errorfile they should specify it, that way they will know the name. It will be best to have the ability to have a specific rejection reason for each row rejected. That way we will be able to tell the difference between uniqueness violation errors, invalid date format on col7, value fails check constraint on col22 etc.. An implicit "I got an error on this record" isn't enough information and can lead to chaos, since you may be getting more than one error on a record and we need to be able to fix them one at a time. If we can't tell what the error message is then we might think our first valid fix actually failed and start looking for other solutions. There are security implications of writing stuff to an error file, so will the error file option still be available when we do \copy or COPY FROM STDIN, and if so how will it work? With what restrictions? -- Simon Riggs www.2ndQuadrant.com
pgsql-hackers by date: