Re: Removing duplicate entries - Mailing list pgsql-novice

From Michael Fuhr
Subject Re: Removing duplicate entries
Date
Msg-id 20060112044830.GA90783@winnie.fuhr.org
Whole thread Raw
In response to Re: Removing duplicate entries  ("Scott Ford" <Scott.Ford@bullfrogpower.com>)
List pgsql-novice
On Wed, Jan 11, 2006 at 04:21:45PM -0500, Scott Ford wrote:
> customer
>     customer_id (pk)
>     ...
>
> documentation
>     documentation_id (pk)
>     customer_id (fk)
>     document_type_id (fk)
>     accepted
>
> document_types
>     document_type_id (pk)
>     document_name
>     ...
>
> What I want to do is remove duplicate entries for the same
> customer_id/document_type_id for only a certain document_type_id.

Will a modified version of my earlier query work?  For example,
suppose I start with this data:

SELECT * FROM doc ORDER BY cust_id, doc_type_id, doc_id;
 doc_id | cust_id | doc_type_id
--------+---------+-------------
      1 |       1 |           1
      2 |       1 |           1
      3 |       1 |           2
      4 |       1 |           2
      5 |       1 |           2
      6 |       1 |           3
      7 |       1 |           3
      8 |       2 |           1
      9 |       2 |           2
     10 |       2 |           3
(10 rows)

Then I run this delete to remove duplicates for doc_type_id 2:

DELETE FROM doc
WHERE doc_type_id = 2 AND doc_id NOT IN (
  SELECT min(doc_id)
  FROM doc
  WHERE doc_type_id = 2 -- not necessary but probably more efficient
  GROUP BY cust_id, doc_type_id
);

I end up with this:

SELECT * FROM doc ORDER BY cust_id, doc_type_id, doc_id;
 doc_id | cust_id | doc_type_id
--------+---------+-------------
      1 |       1 |           1
      2 |       1 |           1
      3 |       1 |           2
      6 |       1 |           3
      7 |       1 |           3
      8 |       2 |           1
      9 |       2 |           2
     10 |       2 |           3
(8 rows)

cust_id 1's duplicates for doc_type_id 2 have been removed (doc_id
4 and 5) but cust_id 1's duplicates for doc_type_id 1 and 3 remain.
cust_id 2 had no duplicates; all of its records remain.

Is that what you're looking for?  If not then please post some
sample data and describe exactly which records you want to delete.
It would be helpful if you post the example as CREATE TABLE and
INSERT statements that people can load into their own database.

--
Michael Fuhr

pgsql-novice by date:

Previous
From: Tom Lane
Date:
Subject: Re: Sequential Scans On Complex Query With UNION - see why this fails
Next
From: Brendan Duddridge
Date:
Subject: Re: Comparing databases