Re: Bulkloading using COPY - ignore duplicates? - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Bulkloading using COPY - ignore duplicates?
Msg-id 200201022109.g02L9aW27520@candle.pha.pa.us
In response to Re: Bulkloading using COPY - ignore duplicates?  (Lee Kindness <lkindness@csl.co.uk>)
Responses Re: Bulkloading using COPY - ignore duplicates?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Lee Kindness wrote:
> Tom Lane writes:
>  > Lee Kindness <lkindness@csl.co.uk> writes:
>  > > In an ideal world 'COPY FROM' would only be used with data output by
>  > > 'COPY TO' and it would be nice and sanitised. However in some fields
>  > > this often is not a possibility due to performance constraints!
>  > Of course, the more bells and whistles we add to COPY, the slower it
>  > will get, which rather defeats the purpose no?
> 
> Indeed, but as I've mentioned in this thread in the past, the code
> path for COPY FROM already does a check against the unique index (if
> there is one) but bombs out rather than handling it...
> 
> It wouldn't add any execution time if there were no duplicates in the
> input!

I know many purists object to letting COPY discard invalid input rows,
but we seem to get lots of requests for this feature, and there are few
workarounds except pre-processing the flat file.  Of course, if users
load with INSERT, they get errors they can simply ignore.  I don't see
how allowing errors in COPY is any less legitimate, except that COPY is
one command while multiple INSERTs are separate commands.
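The pre-processing workaround mentioned above can be as simple as dropping repeated keys from the flat file before running COPY. A minimal Python sketch, assuming COPY text format (tab-delimited, no `\.` end-of-data line) with the unique key in the first column; the function name `drop_duplicate_keys` and the column position are hypothetical. It keeps the first occurrence of each key and reports the line numbers it dropped:

```python
def drop_duplicate_keys(lines, key_column=0):
    """Return (kept_lines, dropped_line_numbers) for COPY text-format input.

    Hypothetical helper: assumes tab-delimited rows with the unique key
    in `key_column`; the first occurrence of each key wins.
    """
    seen = set()
    kept = []
    dropped = []
    for lineno, line in enumerate(lines, start=1):
        key = line.rstrip("\n").split("\t")[key_column]
        if key in seen:
            # Duplicate key: this row would make COPY fail on the unique index.
            dropped.append(lineno)
            continue
        seen.add(key)
        kept.append(line)
    return kept, dropped


rows = ["1\talice\n", "2\tbob\n", "1\talice again\n", "3\tcarol\n"]
kept, dropped = drop_duplicate_keys(rows)
print(dropped)  # [3]
```

This only catches duplicates within the file itself, not rows whose keys already exist in the target table, which is one reason handling it inside COPY is attractive.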

Seems we need to allow such a capability, if only crudely.  I don't
think we can create a discard file, because that doesn't work for remote
COPY, where the input comes from the client rather than a server-side file.

I think we can allow something like:
COPY FROM '/tmp/x' WITH ERRORS 2

meaning we will allow at most two errors and will report the error line
numbers to the user.  I think this syntax clearly indicates that errors
are being accepted in the input.  An alternate syntax would allow an
unlimited number of errors:
COPY FROM '/tmp/x' WITH ERRORS

The errors can be unique-index (duplicate key) violations, or even CHECK
constraint errors.
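To make the proposed semantics concrete, here is a rough Python simulation of what `WITH ERRORS n` might mean: rows that fail a unique check or a CHECK-style predicate are skipped and their line numbers reported, and the load aborts once more than `n` rows have failed, with `max_errors=None` standing in for the unlimited form. This is a hypothetical model only; `copy_with_errors`, `TooManyErrors`, and the in-memory "index" are illustrative names, not PostgreSQL code.

```python
class TooManyErrors(Exception):
    """Raised when more rows fail than the WITH ERRORS limit allows."""


def copy_with_errors(rows, existing_keys, check, max_errors=None):
    """Simulate COPY ... WITH ERRORS max_errors (None = unlimited).

    rows: list of (key, value) tuples, one per input line.
    existing_keys: keys already present in the unique index.
    check: predicate standing in for a CHECK constraint.
    Returns (loaded_rows, error_line_numbers). When TooManyErrors is
    raised nothing is returned, mimicking the whole COPY rolling back.
    """
    index = set(existing_keys)  # stand-in for the unique index
    loaded = []
    errors = []
    for lineno, (key, value) in enumerate(rows, start=1):
        # The unique probe happens for every row anyway; only the
        # failure path changes from "abort" to "count and skip".
        if key in index or not check(value):
            errors.append(lineno)
            if max_errors is not None and len(errors) > max_errors:
                raise TooManyErrors(
                    f"error limit {max_errors} exceeded at line {lineno}")
            continue
        index.add(key)
        loaded.append((key, value))
    return loaded, errors


def is_positive(value):
    """Hypothetical CHECK constraint: value > 0."""
    return value > 0


data = [(1, 10), (2, -5), (1, 20), (3, 30)]
loaded, errs = copy_with_errors(data, existing_keys=[], check=is_positive,
                                max_errors=2)
print(loaded, errs)  # [(1, 10), (3, 30)] [2, 3]
```

As Lee points out, the unique-index check is already done for each row, so the only extra work here is bookkeeping on the failure path. Whether an exceeded limit should discard everything or keep the rows loaded so far is a design choice this sketch settles arbitrarily (everything is discarded).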

Unless I hear complaints, I will add it to TODO.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

