Re: Removing duplicate records from a bulk upload (rationale behind selecting a method) - Mailing list pgsql-general

From Scott Marlowe
Subject Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)
Msg-id CAOR=d=1jF7t1LKnAknrpSnXr_jF-MvVv6M0mT3paWdRob+5z_A@mail.gmail.com
In response to Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)  (Andy Colson <andy@squeakycode.net>)
Responses Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)
List pgsql-general
If you're de-duping a whole table, there's no need to create indexes, as
the query has to hit every row anyway. The fastest way I've found is:

select a,b,c into newtable from oldtable group by a,b,c;

One pass, done.
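
For what it's worth, here is a rough sketch of how that one-pass approach might look end to end. The table and column names are the placeholders from the example above, CREATE TABLE ... AS is used as the functionally similar spelling of SELECT ... INTO, and the row-count check and rename step are my own assumptions about a typical workflow, not something stated in this thread:

```sql
-- Sketch only: oldtable, newtable and columns a, b, c are placeholder names.
BEGIN;

-- Build the de-duplicated copy in a single pass over oldtable.
-- CREATE TABLE ... AS is functionally similar to SELECT ... INTO.
CREATE TABLE newtable AS
SELECT a, b, c
FROM oldtable
GROUP BY a, b, c;

-- Assumed sanity check: compare row counts before swapping anything.
SELECT (SELECT count(*) FROM oldtable) AS rows_before,
       (SELECT count(*) FROM newtable) AS rows_after;

-- Assumed swap step. Indexes, constraints, defaults and grants are not
-- copied by CREATE TABLE AS and would have to be recreated on the new table.
ALTER TABLE oldtable RENAME TO oldtable_with_dups;
ALTER TABLE newtable RENAME TO oldtable;

COMMIT;
```

Keeping the original around under another name until the counts look right makes it cheap to back out.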

If you want to de-dupe on only some of the columns rather than the
whole row, you can use:

select distinct on (col1, col2) * into newtable from oldtable;
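
One caveat with DISTINCT ON: without an ORDER BY, which row survives from each (col1, col2) group is unpredictable. A minimal sketch of pinning that down, assuming a hypothetical loaded_at timestamp column (not mentioned in this thread) that says which duplicate is newest:

```sql
-- Sketch only: loaded_at is a hypothetical column used to pick the survivor.
-- The DISTINCT ON expressions must match the leftmost ORDER BY expressions.
CREATE TABLE newtable AS
SELECT DISTINCT ON (col1, col2) *
FROM oldtable
ORDER BY col1, col2, loaded_at DESC;  -- keeps the newest row per (col1, col2)
```

The sort adds work on top of the single pass, so it is only worth it if you care which duplicate is kept.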

