Re: Bulkloading using COPY - ignore duplicates? - Mailing list pgsql-hackers

From: Peter Eisentraut
Subject: Re: Bulkloading using COPY - ignore duplicates?
Msg-id: Pine.LNX.4.30.0112131714310.647-100000@peter.localdomain
In response to: Bulkloading using COPY - ignore duplicates? (Lee Kindness <lkindness@csl.co.uk>)
List: pgsql-hackers
Lee Kindness writes:

>  1. Performance enhancements when doing bulk inserts - pre- or
> post-processing the data to remove duplicates is very time
> consuming. Likewise, the best tool should always be used for the job
> at hand, and for searching/removing things that's a database.

Arguably, a better tool for this is sort(1).  For instance, if you have a
typical copy input file with tab-separated fields and the primary key is
in columns 1 and 2, you can remove duplicates with

sort -k 1,2 -u INFILE > OUTFILE
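
As a quick demonstration with made-up data (the three lines below are hypothetical; \t is the tab delimiter):

printf '1\ta\tfoo\n1\ta\tbar\n2\tb\tbaz\n' | sort -k 1,2 -u

This emits one line for the duplicated key (1, a) plus the (2, b) line. Which of the duplicate lines survives is implementation-defined; GNU sort keeps the first of each equal run.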

To get a record of what duplicates were removed, use diff.
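
One way to do that, assuming the same INFILE and OUTFILE as above: re-sort the input on the same keys without -u and diff that against the de-duplicated output; the removed duplicates appear as "<" lines.

sort -k 1,2 INFILE | diff - OUTFILE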

-- 
Peter Eisentraut   peter_e@gmx.net


