Lee Kindness writes:
> 1. Performance enhancements when doing bulk inserts - pre- or
> post-processing the data to remove duplicates is very time
> consuming. Likewise, the best tool should always be used for the job
> at hand, and for searching/removing things, that tool is a database.
Arguably, a better tool for this is sort(1). For instance, if you have a
typical COPY input file with tab-separated fields and the primary key is
in columns 1 and 2, you can remove duplicates with
    sort -k 1,2 -u INFILE > OUTFILE
To get a record of what duplicates were removed, use diff.
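For example, something like this should work (an untested sketch, reusing
the same INFILE/OUTFILE names as above): sort on the same keys but without
-u, and diff the result against the deduplicated output.

    # Sort on the same keys without removing duplicates, then compare
    # against the deduplicated output; lines marked '<' are the
    # duplicates that "sort -u" dropped.
    sort -k 1,2 INFILE | diff - OUTFILE

Since both inputs to diff are sorted the same way, the only differences
reported are the removed duplicate lines.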
--
Peter Eisentraut peter_e@gmx.net