On Thu, 2009-10-08 at 18:23 -0400, Bruce Momjian wrote:
> Dimitri Fontaine wrote:
> > Simon Riggs <simon@2ndQuadrant.com> writes:
> > > It would be best to have the ability to record a specific rejection
> > > reason for each rejected row. That way we will be able to tell the
> > > difference between a uniqueness violation, an invalid date format on
> > > col7, a value failing the check constraint on col22, etc.
> >
> > In case it helps, what pgloader does is log into two files, named
> > after the table (not scalable to a server-side solution):
> > table.rej     --- lines it could not load, straight from the source file
> > table.rej.log --- errors as given by the server, plus a pgloader comment
> >
> > The pgloader comment is needed to associate each log line with the
> > source file line: since pgloader works by dichotomy, the server always
> > reports the error as being on line 1.
> >
> > The idea of having two error files could be kept, though: the aim is to
> > be able to fix the setup and then COPY the table.rej file again when it
> > turns out the errors are not in the file content itself. Another option
> > is to load the rejected rows into another table, with all columns as
> > text or bytea, and then clean the data from a procedure.
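
To make that last option concrete, the text-or-bytea staging route would
look roughly like this (raw_orders, orders and the file path are invented
names, purely for illustration):

    -- staging table: same columns as the target, but everything as text,
    -- so no row can be rejected for a datatype or constraint error
    CREATE TABLE raw_orders (
        order_id    text,
        customer_id text,
        ordered_on  text
    );

    COPY raw_orders FROM '/path/to/orders.rej';

    -- then clean and convert inside the database, e.g. from a procedure
    INSERT INTO orders
    SELECT order_id::integer, customer_id::integer, ordered_on::date
      FROM raw_orders;
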
>
> What would be _cool_ would be to add the ability to have comments in the
> COPY files, like \#, and then the copy data lines and errors could be
> adjacent. (Because of the way we control COPY escaping, adding \# would
> not be a problem. We have \N for null, for example.)
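
For reference, a text-format COPY file already uses \N for NULL, as in the
first and third lines below; the \# line is only a mock-up of what such an
inline error comment might look like (hypothetical syntax, made-up message):

    1	Smith	\N	2009-10-08
    \# rejected: null value in column "region" violates not-null constraint
    2	Jones	EU	2009-10-09
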
That was my idea also, until I heard Dimitri's two-file approach.
Having a pristine data file and a matching error file means you can
potentially just resubmit the error file. Often you need to do things
like trap RI errors and then resubmit those rows at a later time, once
the master rows have entered the system.
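
Concretely, once the missing master rows are in, resubmitting is just
another COPY of the error file, with no editing of the original data
file (the table name and path below are only illustrative):

    COPY orders FROM '/path/to/orders.rej';

    -- or, from a client session without server-side file access:
    -- \copy orders from 'orders.rej'
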
--
Simon Riggs           www.2ndQuadrant.com