Home > mailing lists

Re: How to idenity duplicate rows - Mailing list pgsql-general

From	Jan Cruz
Subject	Re: How to idenity duplicate rows
Date	March 19, 2006 22:08:02
Msg-id	493da2780603191808q4759ee9dm95f6a5fb712992f3@mail.gmail.com Whole thread Raw
In response to	Re: How to idenity duplicate rows (Peter Eisentraut <peter_e@gmx.net>)
List	pgsql-general

Tree view

As for deleting all but one row in a duplicated group, you're going to
have to get at them by the oid or ctid columns perhaps.

The other idea is to run CREATE TABLE newtable AS SELECT DISTINCT * FROM
oldtable;.

I believe getting oid and/or ctid is not possible since it would not display/get duplicate records
in a "HAVING CLAUSE" since their oid/ctid wouldn't be the same.

And creating a newtable and use SELECT DISTINCT isn't an acceptable idea to the audit team when you have millions of records in a table in production server when you only need to remove let say 70 records from that table.

I tried another approach where I queried and insert the duplicate record (35 records) into
a new/temporary table. Then I created a stored procedure something like this:
--START

DECLARE
foo table;

BEGIN
       FOR foo IN
         SELECT * FROM newtable
       LOOP
          DELETE FROM oldtable
          where oldtable.field1 = foo.field1
           and   oldtable.field2 = foo.field2
           and   oldtable.field3 = foo.field3
           and   oldtable.field4 = foo.field4
           ...';
       END LOOP;
END;

-- END

Problem with this approach I got 35 duplicate records (count = 2)
from the new table and delete only 20 records from the oldtable.
If I am not mistaken it should delete 70 records.

I wonder

pgsql-general by date:

From: Peter Eisentraut
Date: 19 March 2006, 18:52:07
Subject: Re: How to idenity duplicate rows

From: Berend Tober
Date: 19 March 2006, 22:16:45
Subject: Re: How to idenity duplicate rows

Re: How to idenity duplicate rows - Mailing list pgsql-general

Previous

Next