Re: Bulkloading using COPY - ignore duplicates? - Mailing list pgsql-hackers

From Lee Kindness
Subject Re: Bulkloading using COPY - ignore duplicates?
Date
Msg-id 15391.5578.336203.295826@elsick.csl.co.uk
In response to Re: Bulkloading using COPY - ignore duplicates?  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: Bulkloading using COPY - ignore duplicates?
List pgsql-hackers
Peter Eisentraut writes:
> Lee Kindness writes:
> > Consider SELECT DISTINCT - which is the 'duplicate' and which one is
> > the good one?
> It's not the same thing.  SELECT DISTINCT only eliminates rows that are
> completely the same, not only equal in their unique constraints.
> Maybe you're thinking of SELECT DISTINCT ON ().  Observe the big warning
> that the results of that statement are random unless ORDER BY is used.  --
> But that's not the same thing either.  We've never claimed that the COPY
> input has an ordering assumption.  In fact you're asking for a bit more
> than an ordering assumption, you're saying that the earlier data is better
> than the later data.  I think in a random use case that is more likely
> *not* to be the case because the data at the end is newer.
 

You're right - I meant 'SELECT DISTINCT ON ()'. However I'm only
using it as an example of where the database chooses (albeit
randomly) which data to discard. While I've said in this thread that
'COPY FROM IGNORE DUPLICATES' would ignore later duplicates, I'm not
really that concerned about which ones it ignores: first, later, random,
... I agree that if it were of concern then the data should be
pre-processed.
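
For what it's worth, that kind of pre-processing can be sketched in a few
lines of Python. This is only an illustration under assumptions that are not
from this thread: the input is tab-separated as in COPY's default text
format, the helper name dedupe_rows is hypothetical, and the unique key is
given as a list of column indexes. The keep parameter makes the first/last
choice explicit rather than leaving it to the loader.

```python
def dedupe_rows(lines, key_columns, keep="first"):
    """Return the input lines with duplicate keys removed.

    lines       -- iterable of tab-separated text lines (assumed input format)
    key_columns -- indexes of the columns forming the unique key (assumed)
    keep        -- "first" keeps the earliest row per key, "last" the latest
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")

    chosen = {}  # key tuple -> line; dicts preserve insertion order
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        key = tuple(fields[i] for i in key_columns)
        if keep == "last":
            # Drop any earlier row so the later one takes its own position.
            chosen.pop(key, None)
            chosen[key] = line
        elif key not in chosen:
            chosen[key] = line
    return list(chosen.values())
```

With the rows "1\talpha", "2\tbeta", "1\tgamma" and key column 0,
keep="first" retains the alpha and beta rows, while keep="last" retains
the beta and gamma rows. The surviving lines could then be fed to COPY.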
> Btw., here's another concern about this proposed feature:  If I do
> a client-side COPY, how will you send the "ignored" rows back to
> the client?
 

Again a number of different ideas have been mixed up in the
discussion. Oracle's logging option was only given as an example of
how other database systems deal with this situation. If logging isn't
explicitly requested then it's reasonable to discard the extra
information.

What really would be nice in the SQL-world is a standardised COPY
statement...

Best regards, Lee Kindness.

