Bulkloading using COPY - ignore duplicates? - Mailing list pgsql-hackers

From Lee Kindness
Subject Bulkloading using COPY - ignore duplicates?
Date
Msg-id 15382.11982.324375.978316@elsick.csl.co.uk
In response to Re: Bulkloading using COPY - ignore duplicates?  (Lee Kindness <lkindness@csl.co.uk>)
Responses Re: Bulkloading using COPY - ignore duplicates?  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Gents,

I started quite a long thread about this back in September. To
summarise, I was proposing that COPY FROM would not abort the
transaction when it encountered data which would cause a uniqueness
violation on the table index(es).
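
To recap, at the moment a single duplicate key aborts the whole load;
something along these lines (table and file names are just for
illustration):

  CREATE TABLE positions (shot integer PRIMARY KEY, x float8, y float8);
  COPY positions FROM '/tmp/positions.dat';
  -- one duplicate 'shot' value in the file and the COPY errors out,
  -- the transaction is aborted and none of the rows are loaded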

Generally I think this was seen as a 'Good Thing'(TM) for a number of
reasons:
1. Performance enhancements when doing bulk inserts - pre- or
post-processing the data to remove duplicates is very time
consuming. Likewise the best tool should always be used for the job at
hand, and for searching/removing things that's a database (see the
workaround sketched after this list for what I mean by pre-processing).

2. Feature parity with other database systems. For example, Oracle's
SQL*Loader has a feature to not insert duplicates and instead move
them to another file for later investigation.
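
For reference, the kind of pre-processing I mean in point 1 currently
looks something like this (again, the names are illustrative):

  -- load into an unconstrained scratch table, then filter into the real one
  CREATE TEMP TABLE positions_load AS SELECT * FROM positions LIMIT 0;
  COPY positions_load FROM '/tmp/positions.dat';
  INSERT INTO positions
    SELECT DISTINCT ON (shot) *
    FROM positions_load l
    WHERE NOT EXISTS (SELECT 1 FROM positions p WHERE p.shot = l.shot);
  DROP TABLE positions_load;

That is an extra pass over the data, and the scratch table can get very
large for the sort of bulk loads we are doing.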

Naturally the default behaviour would be the current one of assuming
valid data. Also, the duplicate check would be optional and so would add
nothing to the current code path for COPY FROM - it would not take any
longer.
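
As for how it would be requested, I have no strong feelings about the
syntax. Purely as an illustration (this is not what my attempted patch
used), it might look something like:

  COPY positions FROM '/tmp/positions.dat' WITH IGNORE DUPLICATES;

with the duplicate rows either silently dropped or diverted somewhere
for later inspection, as in the SQL*Loader case.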

I attempted to add this functionality to PostgreSQL myself but got as
far as an updated parser and a COPY FROM which resulted in a database
recovery!

So (here's the question finally) is it worthwhile adding this
enhancement to the TODO list?

Thanks, Lee.

--
Lee Kindness, Senior Software Engineer, Concept Systems Limited.
http://services.csl.co.uk/ http://www.csl.co.uk/ +44 131 557 5595
 

