Re: COPY enhancements - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: COPY enhancements
Date
Msg-id 1254911874.26302.213.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Re: COPY enhancements  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: COPY enhancements  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-hackers
On Wed, 2009-10-07 at 03:17 -0400, Greg Smith wrote:
> On Mon, 5 Oct 2009, Josh Berkus wrote:
> 
>  Also, presumbly, if you abort a COPY because of errors, you
> > probably want to keep the errors around for later analysis.  No?
> 
> Absolutely, that's the whole point of logging to a file in the first 
> place. 

Yes, essential.

(Not read patch, so some later comments may misunderstand where we're
at)

>  What needs to happen here is that when one is aborted, you need to 
> make sure that fact is logged, and with enough information (the pid?) to 
> tie it to the COPY that failed.  Then someone can crawl the logs to figure 
> out what happened and what data did and didn't get loaded.  Ideally you'd 
> want to have that as database table information instead, to allow 
> automated recovery and re-commit in the cases where the error wasn't 
> inherent in the data but instead some other type of database failure.

Adding something to the logs is the wrong place for that IMHO. If we do
write a log message it should be a NOTICE not a LOG.

If the user specifies an error file then it should be straightforward
for them to look directly in that error file afterwards to see if there
are rejects. It's typical to only create the error file if rejects
exist, to allow a simple if-file-exists test after the COPY. 

I don't recommend having the server automatically generate an error file
name because that will encourage conflicts between multiple users on
production systems. If the user wants an errorfile they should specify
it, that way they will know the name.

It will be best to have the ability to have a specific rejection reason
for each row rejected. That way we will be able to tell the difference
between uniqueness violation errors, invalid date format on col7, value
fails check constraint on col22 etc.. 

An implicit "I got an error on this record" isn't enough information and
can lead to chaos, since you may be getting more than one error on a
record and we need to be able to fix them one at a time. If we can't
tell what the error message is then we might think our first valid fix
actually failed and start looking for other solutions.

There are security implications of writing stuff to an error file, so
will the error file option still be available when we do \copy or COPY
FROM STDIN, and if so how will it work? With what restrictions?

-- Simon Riggs           www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Review of "SQLDA support for ECPG"
Next
From: Robert Haas
Date:
Subject: Re: COPY enhancements