Tom Lane writes:
> Lee Kindness <lkindness@csl.co.uk> writes:
> > Would this seem a reasonable thing to do? Does anyone rely on COPY
> > FROM causing an ERROR on duplicate input?
> Yes. This change will not be acceptable unless it's made an optional
> (and not default, IMHO, though perhaps that's negotiable) feature of
> COPY.
I see where you're coming from, but seriously what's the use/point of
COPY aborting and doing a rollback if one duplicate key is found? I
think it's quite reasonable to presume the input to COPY has had as
little processing done on it as possible. I could loop through the
input file before sending it to COPY, but that's just wasting cycles
and effort - Postgres has btree lookup built in; I don't want to roll
my own before giving Postgres my input file!
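To illustrate, the kind of redundant client-side pre-pass I'd rather avoid would look something like this (a minimal Python sketch; treating the first tab-separated field as the unique key is my assumption about the input format):

```python
# Sketch of the client-side pre-pass being argued against: drop rows
# whose first tab-separated field (assumed to be the unique key) has
# already been seen, before handing the file to COPY FROM.
def dedupe_copy_input(lines):
    seen = set()
    out = []
    for line in lines:
        key = line.split("\t", 1)[0]
        if key in seen:
            continue  # duplicate key: skip the row
        seen.add(key)
        out.append(line)
    return out
```

This duplicates, in the client, exactly the uniqueness check the backend's btree already performs on insert.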
> The implementation might be rather messy too. I don't much care
> for the notion of a routine as low-level as _bt_check_unique knowing
> that the context is or is not COPY. We might have to do some
> restructuring.
Well, in reality it wouldn't be "you're getting run from COPY" but
rather "notice on duplicate, instead of error & exit". There is a
telling comment in nbtinsert.c just before _bt_check_unique() is
called:
	/*
	 * If we're not allowing duplicates, make sure the key isn't already
	 * in the index.  XXX this belongs somewhere else, likely
	 */
So perhaps dupes should be searched for before _bt_doinsert is called,
or somewhere more appropriate?
> > Would:
> >  WITH ON_DUPLICATE = CONTINUE|TERMINATE (or similar)
> > need to be added to the COPY command (I hope not)?
> It occurs to me that skip-the-insert might be a useful option for
> INSERTs that detect a unique-key conflict, not only for COPY. (Cf.
> the regular discussions we see on whether to do INSERT first or
> UPDATE first when the key might already exist.) Maybe a SET variable
> that applies to all forms of insertion would be appropriate.
That makes quite a bit of sense.
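As a toy model of the semantics under discussion (the names here are mine, not actual Postgres settings or internals), a skip-vs-error switch on unique conflict would behave like:

```python
# Toy model of the proposed behaviour: a unique "index" (a dict) plus
# a switch between error-on-duplicate (today's behaviour) and
# skip-on-duplicate (the proposed option). Names are illustrative only.
class DuplicateKeyError(Exception):
    pass

def insert(index, key, row, on_duplicate="error"):
    if key in index:
        if on_duplicate == "skip":
            return False  # notice-and-continue: row silently dropped
        raise DuplicateKeyError(key)  # current COPY/INSERT behaviour
    index[key] = row
    return True
```

With the switch set to "skip", a bulk load proceeds past duplicates instead of aborting the whole transaction on the first one.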
--
Lee Kindness, Senior Software Engineer
Concept Systems Limited.