Home > mailing lists

Re: SQL for removing duplicates? - Mailing list pgsql-novice

From	Brad Nicholson
Subject	Re: SQL for removing duplicates?
Date	June 13, 2006 15:26:23
Msg-id	448F033C.1020708@ca.afilias.info Whole thread Raw
In response to	SQL for removing duplicates? (<kynn@panix.com>)
List	pgsql-novice

Tree view

kynn@panix.com wrote:
> Hi.  I'm stumped.  I have a large table (about 8.5M records), let's
> call it t, whose columns include x and y.  I want to remove records
> from this table so that any pair of values for these two fields appear
> only once.  (This will get rid of about 15% of the records in t.)
>
> One simple solution would be something like
>
>   CREATE TABLE tmp AS SELECT DISTINCT ON ( x, y ) * FROM t;
>   DROP TABLE t;
>   ALTER TABLE tmp RENAME TO t;
>
> This works, but it uses a lot of space.  I would prefer to simply cull
> the unwanted records from t, but I just can't figure out the SQL for
> it.  Any help with it would be *much* appreciated.


If your table is created with OIDs, this should work.  If not add a
unique column to the table and use that in place of oid.

DELETE FROM t where oid IN (select t2.oid from t t2 EXCEPT SELECT
max(t3.oid) from t t3 group by t3.x, t3.y);

Also note, the query plan for this is going to be very ugly, it might
very well be cheaper to use the solution that you initially mentioned.






--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.

pgsql-novice by date:

From:
Date: 13 June 2006, 13:12:15
Subject: SQL for removing duplicates?

From:
Date: 13 June 2006, 19:22:47
Subject: Re: SQL for removing duplicates?

Re: SQL for removing duplicates? - Mailing list pgsql-novice

Previous

Next