Thread: Delete duplicates

Delete duplicates

From
"Rudi Starcevic"
Date:

Hi,

I have a table with duplicates and trouble with my SQL.
I'd like to keep a single record and remove older duplicates.
For example below of the 6 recods I'd like to keep records
4 and 6.

TABLE: aapid |     keyword
----+----------------- 1 | LEAGUE PANTHERS 2 | LEAGUE PANTHERS 3 | LEAGUE PANTHERS 4 | LEAGUE PANTHERS 5 | LEAGUE
BRONCOS6 | LEAGUE BRONCOS
 

Here is my SQL so far, it will select records 1 to 5 instead
of 1,2,3 and 5 only.

Any help greatly appreciated. I think I need a Group By somewhere in there.

select a1.id
from aap a1
where id < ( SELECT max(id) FROM aap AS a2 )
AND EXISTS
(
SELECT *
FROM aap AS a2
WHERE a1.keyword = a2.keyword
)

Regards
Rudi.


Re: Delete duplicates

From
Ian Barwick
Date:
On Sunday 22 June 2003 11:15, Rudi Starcevic wrote:
> Hi,
>
> I have a table with duplicates and trouble with my SQL.

(...)

> select a1.id
> from aap a1
> where id < ( SELECT max(id) FROM aap AS a2 )
> AND EXISTS
> (
> SELECT *
> FROM aap AS a2
> WHERE a1.keyword = a2.keyword
> )

How about (untested):

SELECT a1.id FROM aap a1WHERE id = (SELECT MAX(id) FROM aap a2             WHERE a2.keyword = a1.keyword)


Ian Barwick
barwick@gmx.net




Re: Delete duplicates

From
Paul Thomas
Date:
On 22/06/2003 10:15 Rudi Starcevic wrote:
> 
> 
> Hi,
> 
> I have a table with duplicates and trouble with my SQL.
> I'd like to keep a single record and remove older duplicates.
> For example below of the 6 recods I'd like to keep records
> 4 and 6.
> 
> TABLE: aap
>  id |     keyword
> ----+-----------------
>   1 | LEAGUE PANTHERS
>   2 | LEAGUE PANTHERS
>   3 | LEAGUE PANTHERS
>   4 | LEAGUE PANTHERS
>   5 | LEAGUE BRONCOS
>   6 | LEAGUE BRONCOS
> 
> Here is my SQL so far, it will select records 1 to 5 instead
> of 1,2,3 and 5 only.
> 
> Any help greatly appreciated. I think I need a Group By somewhere in
> there.
> 
> select a1.id
> from aap a1
> where id < ( SELECT max(id) FROM aap AS a2 )
> AND EXISTS
> (
> SELECT *
> FROM aap AS a2
> WHERE a1.keyword = a2.keyword
> )

I just tries this with 7.3.3:

select max(id), keyword from aap where keyword in (select distinct keyword 
from aap) group by keyword;
 max |     keyword
-----------------------   6 | LEAGUE BRONCOS   4 | LEAGUE PANTHERS
(2 rows)

HTH

-- 
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller 
Business |
| Computer Consultants         | 
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+


Re: Delete duplicates

From
Germán Gutiérrez
Date:
Hi,

you need find duplicates and then you remove them

delete  from aap where id not in ( select max(id) from aap b where
aap.keyword = b.keyword );

Germán
Sorry about my english

-----Mensaje original-----
De: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] En nombre de Rudi Starcevic
Enviado el: Domingo, 22 de Junio de 2003 5:15
Para: pgsql-sql@postgresql.org
Asunto: [SQL] Delete duplicates



Hi,

I have a table with duplicates and trouble with my SQL.
I'd like to keep a single record and remove older duplicates.
For example below of the 6 recods I'd like to keep records
4 and 6.

TABLE: aapid |     keyword
----+----------------- 1 | LEAGUE PANTHERS 2 | LEAGUE PANTHERS 3 | LEAGUE PANTHERS 4 | LEAGUE PANTHERS 5 | LEAGUE
BRONCOS6 | LEAGUE BRONCOS 

Here is my SQL so far, it will select records 1 to 5 instead
of 1,2,3 and 5 only.

Any help greatly appreciated. I think I need a Group By somewhere in
there.

select a1.id
from aap a1
where id < ( SELECT max(id) FROM aap AS a2 )
AND EXISTS
(
SELECT *
FROM aap AS a2
WHERE a1.keyword = a2.keyword
)

Regards
Rudi.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: Delete duplicates

From
"Franco Bruno Borghesi"
Date:
try this

DELETE FROM aap WHERE  id NOT IN (  SELECT max(id)  FROM aap  GROUP BY keyword
);

>
>
> Hi,
>
> I have a table with duplicates and trouble with my SQL.
> I'd like to keep a single record and remove older duplicates.
> For example below of the 6 recods I'd like to keep records
> 4 and 6.
>
> TABLE: aap
> id |     keyword
> ----+-----------------
>  1 | LEAGUE PANTHERS
>  2 | LEAGUE PANTHERS
>  3 | LEAGUE PANTHERS
>  4 | LEAGUE PANTHERS
>  5 | LEAGUE BRONCOS
>  6 | LEAGUE BRONCOS
>
> Here is my SQL so far, it will select records 1 to 5 instead
> of 1,2,3 and 5 only.
>
> Any help greatly appreciated. I think I need a Group By somewhere in
> there.
>
> select a1.id
> from aap a1
> where id < ( SELECT max(id) FROM aap AS a2 )
> AND EXISTS
> (
> SELECT *
> FROM aap AS a2
> WHERE a1.keyword = a2.keyword
> )
>
> Regards
> Rudi.
>
> ---------------------------(end of
> broadcast)--------------------------- TIP 1: subscribe and unsubscribe
> commands go to majordomo@postgresql.org





Re: Delete duplicates

From
"Denis Arh"
Date:
How to delete "real" duplicates?

id | somthing
-----------------------
1 | aaa
1 | aaa
2 | bbb
2 | bbb

(an accident with backup recovery...)



Regards,
Denis Arh

----- Original Message ----- 
From: "Franco Bruno Borghesi" <franco@akyasociados.com.ar>
To: <rudi@oasis.net.au>
Cc: <pgsql-sql@postgresql.org>
Sent: Sunday, June 22, 2003 11:17 PM
Subject: Re: [SQL] Delete duplicates


> try this
> 
> DELETE FROM aap WHERE  id NOT IN (
>    SELECT max(id)
>    FROM aap
>    GROUP BY keyword
> );
> 
> >
> >
> > Hi,
> >
> > I have a table with duplicates and trouble with my SQL.
> > I'd like to keep a single record and remove older duplicates.
> > For example below of the 6 recods I'd like to keep records
> > 4 and 6.
> >
> > TABLE: aap
> > id |     keyword
> > ----+-----------------
> >  1 | LEAGUE PANTHERS
> >  2 | LEAGUE PANTHERS
> >  3 | LEAGUE PANTHERS
> >  4 | LEAGUE PANTHERS
> >  5 | LEAGUE BRONCOS
> >  6 | LEAGUE BRONCOS
> >
> > Here is my SQL so far, it will select records 1 to 5 instead
> > of 1,2,3 and 5 only.
> >
> > Any help greatly appreciated. I think I need a Group By somewhere in
> > there.
> >
> > select a1.id
> > from aap a1
> > where id < ( SELECT max(id) FROM aap AS a2 )
> > AND EXISTS
> > (
> > SELECT *
> > FROM aap AS a2
> > WHERE a1.keyword = a2.keyword
> > )
> >
> > Regards
> > Rudi.
> >
> > ---------------------------(end of
> > broadcast)--------------------------- TIP 1: subscribe and unsubscribe
> > commands go to majordomo@postgresql.org
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
> 
> 


Re: Delete duplicates

From
Sean Chittenden
Date:
> How to delete "real" duplicates?
> 
> id | somthing
> -----------------------
> 1 | aaa
> 1 | aaa
> 2 | bbb
> 2 | bbb
> 
> (an accident with backup recovery...)

I'm not 100% on some of the syntax off the top of my head, but:

BEGIN;
ALTER TABLE orig_table RENAME TO backup_table;
CREATE TABLE orig_table AS SELECT id,something FROM backup_table GROUP BY id, something;
-- Create any indexes on orig_table that need to be recreated
DROP TABLE orig_table;
COMMIT;

This isn't for the faint of heart: be sure to do this inside of a
transaction or on a backup db until you're 100% good to go.  -sc
-- 
Sean Chittenden


Re: Delete duplicates

From
Tom Lane
Date:
"Denis Arh" <denis@exonium.net> writes:
> How to delete "real" duplicates?

Use the OID or CTID system columns.
        regards, tom lane


Re: Delete duplicates

From
Rudi Starcevic
Date:
Hi,

Would this be OK or a little crude (untested) :

INSERT INTO new_table ( id, something )
SELECT
DISTINCT ON (id)
id,
something
FROM old_table
ORDER BY id

Or something similar but create a new table ?

Cheers
Rudi.




Denis Arh wrote:

>How to delete "real" duplicates?
>
>id | somthing
>-----------------------
>1 | aaa
>1 | aaa
>2 | bbb
>2 | bbb
>
>(an accident with backup recovery...)
>
>
>
>Regards,
>Denis Arh
>  
>



Re: Delete duplicates

From
Dani Oderbolz
Date:
Denis Arh wrote:

>How to delete "real" duplicates?
>
>id | somthing
>-----------------------
>1 | aaa
>1 | aaa
>2 | bbb
>2 | bbb
>
>(an accident with backup recovery...)
>
In these cases, its certainly the best to rebuild your table
using a

CREATE TABLE new AS
SELECT col1,col1..
FROM old
GROUPY BY col1,col2...

If you have got many duplicates, recreating a new table
semms better to me than deleteing.
(Another advantage is that you still have the "bad" table as a record).

Cheers,
Dani