Re: Practical error logging for very large COPY - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Practical error logging for very large COPY
Msg-id 1132646828.4959.507.camel@localhost.localdomain
In response to Re: Practical error logging for very large COPY statements  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Practical error logging for very large COPY  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, 2005-11-21 at 19:38 -0500, Andrew Dunstan wrote:
> 
> Tom Lane wrote:
> 
> >Simon Riggs <simon@2ndquadrant.com> writes:
> >  
> >
> >>What I'd like to do is add an ERRORTABLE clause to COPY. The main
> >>problem is how we detect a duplicate row violation, yet prevent it from
> >>aborting the transaction.
> >>    
> >If this only solves the problem of duplicate keys, and not any other
> >kind of COPY error, it's not going to be much of an advance.
> >  

> Yeah, and I see errors from bad data as often as from violating 
> constraints. Maybe the best way if we do something like this would be to 
> have the error table contain a single text, or maybe bytea, field which 
> contained the raw offending input line.

I have committed the sin of omission again.

Handling duplicate row violations is the big challenge, but it is not the
only function planned. Formatting errors occur much more frequently, so
yes, we'd want to log all of those too. And yes, it would be done in the
way you suggest.

Here's a fuller, but still brief sketch:

COPY ... FROM ... [ERRORTABLES format1 [uniqueness1] [ERRORLIMIT percent]]

where Format1 and Uniqueness1 would be created anew by this command (it
would raise an error if they already exist)

Format1 would hold formatting errors, so it would be a blob table with
cols (line number, col number, error number, fullrowstring)
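To make the shape of Format1 concrete, here is a minimal Python sketch of a loader that sets bad lines aside instead of aborting. It is only an illustration, not COPY's actual code path: the error numbers, the FormatError class, the load_lines function and its parameters are all hypothetical names made up for this example, and the only checks it performs are column count and integer parsing.

```python
from dataclasses import dataclass

# Hypothetical error codes; the proposal does not define any numbering.
ERR_COLUMN_COUNT = 1
ERR_BAD_INTEGER = 2


@dataclass
class FormatError:
    """One row of the sketched Format1 table."""
    line_number: int
    col_number: int    # 1-based; 0 when the error applies to the whole row
    error_number: int
    full_row: str      # the raw offending input line, kept verbatim


def load_lines(lines, n_cols, int_cols=(), delimiter="\t"):
    """Split each line and return (good_rows, format_errors).

    int_cols lists the 1-based columns that must parse as integers.
    A bad line is recorded and skipped; it never stops the load.
    """
    good, errors = [], []
    for line_number, raw in enumerate(lines, start=1):
        fields = raw.rstrip("\n").split(delimiter)
        if len(fields) != n_cols:
            errors.append(FormatError(line_number, 0, ERR_COLUMN_COUNT, raw))
            continue
        bad_col = None
        for col in int_cols:
            try:
                int(fields[col - 1])
            except ValueError:
                bad_col = col
                break
        if bad_col is not None:
            errors.append(FormatError(line_number, bad_col, ERR_BAD_INTEGER, raw))
        else:
            good.append(fields)
    return good, errors
```

Only the first failing column of a line is recorded here; whether to log one row per bad column instead is a design choice the sketch above does not settle.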

Uniqueness1 would have the same definition as the target table, but with
no indexes. This table would be optional; omitting it would indicate that
no uniqueness violation checks need to be carried out. If it is specified
and yet no unique indexes exist, then Uniqueness1 would be ignored (and
not created).
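The intended behaviour of Uniqueness1 can be sketched in Python as follows. This is a toy model under stated assumptions, not how the backend would detect violations: split_duplicates, key_cols and existing_keys are hypothetical names, and an in-memory set stands in for the unique index.

```python
def split_duplicates(rows, key_cols, existing_keys=()):
    """Return (accepted, rejected) without ever raising on a duplicate.

    key_cols      -- positions of the columns covered by the unique index
    existing_keys -- keys already present in the target table
    Rejected rows keep the same shape as table rows, mirroring the idea
    that Uniqueness1 has the same definition as the target table.
    """
    if not key_cols:
        # No unique index: nothing to check, so nothing is diverted.
        return list(rows), []

    seen = set(existing_keys)
    accepted, rejected = [], []
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key in seen:
            rejected.append(row)   # would be written to Uniqueness1
        else:
            seen.add(key)
            accepted.append(row)   # would be written to the target table
    return accepted, rejected
```

For example, loading (1, "a"), (2, "b"), (1, "c") with key_cols=(0,) accepts the first two rows and diverts the third.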

ERRORLIMIT percent would abort the COPY if more than percent errors were
found. The check would only apply after the first 1000 records (that
threshold could also be made a stated parameter if required).
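As a rough illustration of the ERRORLIMIT rule, the check could look like the Python sketch below. The function name and parameters are hypothetical, and it assumes the percentage is measured against all rows read so far and that the check starts once 1000 rows have been read; the proposal does not pin down either detail.

```python
def should_abort(rows_seen, errors_seen, limit_percent, min_rows=1000):
    """Return True when the load should be aborted.

    No abort is possible until min_rows rows have been read, so a few
    bad lines at the top of the file cannot kill the load on their own.
    """
    if rows_seen < min_rows:
        return False
    # Integer-friendly form of: errors_seen / rows_seen > limit_percent / 100
    return errors_seen * 100 > limit_percent * rows_seen
```

With limit_percent=5, a load that has read 1000 rows survives 50 errors but aborts on the 51st.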

Without the ERRORTABLES clause, COPY would work exactly as it does now.

How does that sound?

Best Regards, Simon Riggs


