Home > mailing lists

Re: Bulkloading using COPY - ignore duplicates? - Mailing list pgsql-hackers

From	Hannu Krosing
Subject	Re: Bulkloading using COPY - ignore duplicates?
Date	December 13, 2001 12:08:30
Msg-id	3C18B445.3C7774D9@tm.ee Whole thread Raw
In response to	Bulkloading using COPY - ignore duplicates? (Lee Kindness <lkindness@csl.co.uk>)
Responses	Re: Bulkloading using COPY - ignore duplicates? (Lee Kindness <lkindness@csl.co.uk>)
List	pgsql-hackers

Tree view

Lee Kindness wrote:
> 
> Patrick Welche writes:
>  > On Mon, Oct 01, 2001 at 03:17:43PM +0100, Lee Kindness wrote:
>  > > Please, don't get me wrong - I don't want to come across arrogant. I'm
>  > > simply trying to improve the 'COPY FROM' command in a situation where
>  > > speed is a critical issue and the data is dirty... And that must be a
>  > > relatively common scenario.
>  > Isn't that when you do your bulk copy into into a holding table, then
>  > clean it up, and then insert into your live system?
> 
> That's what I'm currently doing as a workaround - a SELECT DISTINCT
> from a temporary table into the real table with the unique index on
> it. However this takes absolute ages - say 5 seconds for the copy
> (which is the ballpark figure I aiming toward and can achieve with
> Ingres) plus another 30ish seconds for the SELECT DISTINCT.
> 
> The majority of database systems out there handle this situation in
> one manner or another (MySQL ignores or replaces; Ingres ignores;
> Oracle ignores or logs; others...). Indeed PostgreSQL currently checks
> for duplicates in the COPY code but throws an elog(ERROR) rather than
> ignoring the row, or passing the error back up the call chain.

I guess postgresql will be able to do it once savepoints get
implemented.

> My use of PostgreSQL is very time critical, and sadly this issue alone
> may force an evaluation of Oracle's performance in this respect!

Can't you clean the duplicates _outside_ postgresql, say

cat dumpfile | sort | uniq | psql db -c 'copy mytable from stdin'

with your version of uniq.

or perhaps

psql db -c 'copy mytable to stdout' >> dumpfile
sort dumpfile | uniq | psql db -c 'copy mytable from stdin'

if you already have something in mytable.

------------
Hannu

pgsql-hackers by date:

From: Turbo Fredriksson
Date: 13 December 2001, 11:49:43
Subject: Database upgrades

From: Vince Vielhaber
Date: 13 December 2001, 12:08:32
Subject: Re: Third call for platform testing

Re: Bulkloading using COPY - ignore duplicates? - Mailing list pgsql-hackers

Previous

Next