Re: Bulkloading using COPY - ignore duplicates? - Mailing list pgsql-hackers

From Lee Kindness
Subject Re: Bulkloading using COPY - ignore duplicates?
Date
Msg-id 15288.30097.74590.206271@elsick.csl.co.uk
Whole thread Raw
In response to Re: Bulkloading using COPY - ignore duplicates?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bulkloading using COPY - ignore duplicates?
List pgsql-hackers
Tom Lane writes:> Lee Kindness <lkindness@csl.co.uk> writes:> > Would this seem a reasonable thing to do? Does anyone
relyon COPY> > FROM causing an ERROR on duplicate input?> Yes.  This change will not be acceptable unless it's made an
optional>(and not default, IMHO, though perhaps that's negotiable) feature of> COPY.
 

I see where you're coming from, but seriously what's the use/point of
COPY aborting and doing a rollback if one duplicate key is found? I
think it's quite reasonable to presume the input to COPY has had as
little processing done on it as possible. I could loop through the
input file before sending it to COPY but that's just wasting cycles
and effort - Postgres has btree lookup built in, I don't want to roll
my own before giving Postgres my input file!
> The implementation might be rather messy too.  I don't much care> for the notion of a routine as low-level as
bt_check_uniqueknowing> that the context is or is not COPY.  We might have to do some> restructuring.
 

Well in reality it wouldn't be "you're getting run from copy" but
rather "notice on duplicate, rather than error & exit". There is a
telling comment in nbtinsert.c just before _bt_check_unique() is
called:
/* * If we're not allowing duplicates, make sure the key isn't already * in the index.  XXX this belongs somewhere
else,likely */
 

So perhaps dupes should be searched for before _bt_doinsert is called,
or somewhere more appropriate?
> > Would:> >  WITH ON_DUPLICATE = CONTINUE|TERMINATE (or similar)> > need to be added to the COPY command (I hope
not)?>It occurs to me that skip-the-insert might be a useful option for> INSERTs that detect a unique-key conflict, not
onlyfor COPY.  (Cf.> the regular discussions we see on whether to do INSERT first or> UPDATE first when the key might
alreadyexist.)  Maybe a SET variable> that applies to all forms of insertion would be appropriate.
 

That makes quite a bit of sense.

-- Lee Kindness, Senior Software EngineerConcept Systems Limited.


pgsql-hackers by date:

Previous
From: Justin Clift
Date:
Subject: Re: Spinlock performance improvement proposal
Next
From: Tom Lane
Date:
Subject: Re: Bulkloading using COPY - ignore duplicates?