Re: Removing duplicate entries - Mailing list pgsql-novice

From Michael Fuhr
Subject Re: Removing duplicate entries
Date
Msg-id 20060112165933.GA496@winnie.fuhr.org
Whole thread Raw
In response to Removing duplicate entries  ("Scott Ford" <Scott.Ford@bullfrogpower.com>)
List pgsql-novice
[Please copy the mailing list on replies.]

On Thu, Jan 12, 2006 at 10:24:10AM -0500, Scott Ford wrote:
> That is exactly the result that I'm looking for.  But won't that
> actually remove the documents with doc_type_id where there is only one
> for a given customer too?

No, it shouldn't (see below).

> Maybe I need to read up on my GROUP BY clause.
>
> I thought that:
>
> SELECT min(doc_id)
>   FROM doc
>   WHERE doc_type_id = 2 -- not necessary but probably more efficient
>   GROUP BY cust_id, doc_type_id
>
> would select one instance of doc_type_id = 2 from each customer if there
> was one, and would select the one with the lowest doc_id if there was
> more then one.

Right.  This subquery generates the list of doc_id's *not* to delete
(you could use max instead of min; the point is to choose exactly
one out of the set).  The WHERE clause in the subquery isn't necessary
because the DELETE already restricts the affected records to those
with doc_type_id 2; however, adding the restriction to the subquery
generates a smaller list of doc_id's to check against, which might
make the delete faster.  In other words, the delete statement says
"delete all records with doc_type_id 2 except for these", where
"these" is the list generated by the subquery.  "These" can contain
but doesn't need to contain doc_id's for documents with a doc_type_id
other than 2 because we've already said they're not eligible for
deletion.

> So then wouldn't the row:
>
> doc_id | cust_id | doc_type_id
> --------+---------+-------------
>       9 |       2 |           2
>
> get removed too?

No, as the example I posted shows (the output came from sample data
and actually running the queries shown).

> I'll test this out some test data in the meant time.

Please do: it's important that you understand how the delete works
and be satisfied that it *does* work.  As I mentioned before, I
might be misunderstanding your requirements or making unwarranted
assumptions about your data; ultimately it's up to you to determine
whether the delete works or not.

--
Michael Fuhr

pgsql-novice by date:

Previous
From: Guido Barosio
Date:
Subject: Re: A question about pages. Still not clear
Next
From: "Scott Ford"
Date:
Subject: Re: Removing duplicate entries