Thread: Deleting one of 2 identical records

Deleting one of 2 identical records

From

"Gauthier, Dave"

Date:

06 September 2011, 14:39:34

Hi:

If I have a table that has 2 records which are identical with regard to all their column values, is there a way to delete one of them, leaving one remaining? Is there some unique record_id key of some sort I can use for somethign like this?

Thanks in Advance!

Re: Deleting one of 2 identical records

From

Thom Brown

Date:

06 September 2011, 14:45:54

On 6 September 2011 18:39, Gauthier, Dave <dave.gauthier@intel.com> wrote:

Hi:

If I have a table that has 2 records which are identical with regard to all their column values, is there a way to delete one of them, leaving one remaining? Is there some unique record_id key of some sort I can use for somethign like this?

Thanks in Advance!

Yes, identify them by their ctid value.

So get the ctids by running:

SELECT ctid, *

FROM my_table

WHERE <clause to identify duplicate rows>

You will see entries which look like "(7296,11)".

You can then delete the row by referencing it in the DELETE statement. For example:

DELETE FROM my_table

WHERE ctid = '(7296,11)';

It's a shame we don't have a LIMIT on the DELETE clause (looks at hackers).

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Deleting one of 2 identical records

From

Andy Colson

Date:

06 September 2011, 14:52:34

On 9/6/2011 12:39 PM, Gauthier, Dave wrote:
> Hi:
>
> If I have a table that has 2 records which are identical with regard to
> all their column values, is there a way to delete one of them, leaving
> one remaining? Is there some unique record_id key of some sort I can use
> for somethign like this?
>
> Thanks in Advance!
>

Not easily that I know of.  I have two thoughts:

1)
create table junk (like orig);
insert into junk select distinct from orig;
delete from orig where exists(select from junk);
insert into orig select * from junk;

2)
alter table orig add uid integer;
create sequence bob;
update orig set uid = nextval('bob');
drop sequence bob;
-- magic to delet using uid

Ah, Thom just answered.  I like his better, but I'll post this just for
completeness...

-Andy

Re: Deleting one of 2 identical records

From

Andy Colson

Date:

06 September 2011, 14:55:13

On 9/6/2011 12:44 PM, Thom Brown wrote:
> On 6 September 2011 18:39, Gauthier, Dave <dave.gauthier@intel.com
> <mailto:dave.gauthier@intel.com>> wrote:
>
>     Hi:____
>
>     __ __
>
>     If I have a table that has 2 records which are identical with regard
>     to all their column values, is there a way to delete one of them,
>     leaving one remaining?  Is there some unique record_id key of some
>     sort I can use for somethign like this?____
>
>     __ __
>
>     Thanks in Advance!____
>
>
> Yes, identify them by their ctid value.
>
> So get the ctids by running:
>
> SELECT ctid, *
> FROM my_table
> WHERE <clause to identify duplicate rows>
>
> You will see entries which look like "(7296,11)".
>
> You can then delete the row by referencing it in the DELETE statement.
>   For example:
>
> DELETE FROM my_table
> WHERE ctid = '(7296,11)';
>
> It's a shame we don't have a LIMIT on the DELETE clause (looks at hackers).
>
> --
> Thom Brown
> Twitter: @darkixion
> IRC (freenode): dark_ixion
> Registered Linux user: #516935
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

I wonder.. using the new writeable cte's, could you:

with x (
   -- id = 5 has two identical rows, but limit 1
   select * from orig where id = 5 limit 1;
)
delete from x;

-Andy

Re: Deleting one of 2 identical records

From

Andreas Kretschmer

Date:

06 September 2011, 14:57:45

Gauthier, Dave <dave.gauthier@intel.com> wrote:

> Hi:
>
>
>
> If I have a table that has 2 records which are identical with regard to all
> their column values, is there a way to delete one of them, leaving one
> remaining?  Is there some unique record_id key of some sort I can use for
> somethign like this?

Yes, use the CTID-column.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: Deleting one of 2 identical records

From

"Gauthier, Dave"

Date:

06 September 2011, 15:00:35

The identification and deleting of the records using ctids seems to have worked fine.
Thanks !

-----Original Message-----
From: Andy Colson [mailto:andy@squeakycode.net]
Sent: Tuesday, September 06, 2011 1:55 PM
To: Thom Brown
Cc: Gauthier, Dave; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Deleting one of 2 identical records

On 9/6/2011 12:44 PM, Thom Brown wrote:
> On 6 September 2011 18:39, Gauthier, Dave <dave.gauthier@intel.com
> <mailto:dave.gauthier@intel.com>> wrote:
>
>     Hi:____
>
>     __ __
>
>     If I have a table that has 2 records which are identical with regard
>     to all their column values, is there a way to delete one of them,
>     leaving one remaining?  Is there some unique record_id key of some
>     sort I can use for somethign like this?____
>
>     __ __
>
>     Thanks in Advance!____
>
>
> Yes, identify them by their ctid value.
>
> So get the ctids by running:
>
> SELECT ctid, *
> FROM my_table
> WHERE <clause to identify duplicate rows>
>
> You will see entries which look like "(7296,11)".
>
> You can then delete the row by referencing it in the DELETE statement.
>   For example:
>
> DELETE FROM my_table
> WHERE ctid = '(7296,11)';
>
> It's a shame we don't have a LIMIT on the DELETE clause (looks at hackers).
>
> --
> Thom Brown
> Twitter: @darkixion
> IRC (freenode): dark_ixion
> Registered Linux user: #516935
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

I wonder.. using the new writeable cte's, could you:

with x (
   -- id = 5 has two identical rows, but limit 1
   select * from orig where id = 5 limit 1;
)
delete from x;

-Andy

Re: Deleting one of 2 identical records

From

"David Johnston"

Date:

06 September 2011, 19:01:58

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Andy Colson
Sent: Tuesday, September 06, 2011 1:55 PM
To: Thom Brown
Cc: Gauthier, Dave; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Deleting one of 2 identical records

I wonder.. using the new writeable cte's, could you:

with x (
   -- id = 5 has two identical rows, but limit 1
   select * from orig where id = 5 limit 1;
)
delete from x;

-Andy

----------------------------------------------------------

By my understanding it is a writeable CTE because the statement inside the
CTE can now be INSERT/UPDATE/DELETE (in addition to the SELECT - read only -
statement).

A CTE is, in some ways, like an immediately materialized view. Any reference
to it does not affect the source tables; thus your example likely would not
work.  It isn't a RULE where "x" is simply an alias for "orig".

The real problem is not the language but the table design.  The idea of
"true duplicates" is generally problematic but when it does occur it is
advisable to introduce some kind of artificial key/sequence to allow for
direct selection of the row without resorting to internals.  Then, it is
simply to use the full power of the SELECT statement (with Window functions
and CTEs) to identify the rows that are to be deleted and feed the PK from
those rows into the DELETE's WHERE clause using a sub-query.

David J.

Re: Deleting one of 2 identical records

From

Thom Brown

Date:

06 September 2011, 19:52:08

On 6 September 2011 19:00, Gauthier, Dave <dave.gauthier@intel.com> wrote:

The identification and deleting of the records using ctids seems to have worked fine.
Thanks !

Alternative you could do something like this:

WITH keep AS (

SELECT

my_table AS duplicate_row,

min(ctid) AS keep,

count(*)

FROM my_table

GROUP BY my_table

HAVING count(*) > 1

)

DELETE FROM my_table

USING keep

WHERE

my_table = keep.duplicate_row

AND

my_table.ctid != keep

RETURNING my_table.ctid, my_table.*;

This would delete all duplicate rows from the table and just keep whichever row appears first in the table before its duplicates.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Deleting one of 2 identical records

From

salah jubeh

Date:

06 September 2011, 20:55:56

Hello Thom.

what is the meaning of

select table_name from table_name

Also is this a common behavior of all Databases i.e. oracle , Microsoft ,...etc . i.e is this is the standard behavior

I think this is a good way to find duplicates in general, I will write a routine to compare all the columns by excluding the primary key which is serial

Thanks in advance

Regards

From: Thom Brown <thom@linux.com>
To: "Gauthier, Dave" <dave.gauthier@intel.com>
Cc: Andy Colson <andy@squeakycode.net>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Wednesday, September 7, 2011 12:50 AM
Subject: Re: [GENERAL] Deleting one of 2 identical records

On 6 September 2011 19:00, Gauthier, Dave <dave.gauthier@intel.com> wrote:

The identification and deleting of the records using ctids seems to have worked fine.
Thanks !

Alternative you could do something like this:

WITH keep AS (

SELECT

my_table AS duplicate_row,

min(ctid) AS keep,

count(*)

FROM my_table

GROUP BY my_table

HAVING count(*) > 1

)

DELETE FROM my_table

USING keep

WHERE

my_table = keep.duplicate_row

AND

my_table.ctid != keep

RETURNING my_table.ctid, my_table.*;

This would delete all duplicate rows from the table and just keep whichever row appears first in the table before its duplicates.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Deleting one of 2 identical records

From

Thom Brown

Date:

06 September 2011, 21:07:09

On 7 September 2011 00:55, salah jubeh <s_jubeh@yahoo.com> wrote:

Hello Thom.

what is the meaning of

select table_name from table_name

Also is this a common behavior of all Databases i.e. oracle , Microsoft ,...etc . i.e is this is the standard behavior

I think this is a good way to find duplicates in general, I will write a routine to compare all the columns by excluding the primary key which is serial

Thanks in advance

Well I just put table_name as a placeholder for the actual name of the table you wish to remove duplicates from.

No, you can't run this query on other databases, particularly because ctids are specific to PostgreSQL. Other databases will use different methods to this one, if they have any at all. With Oracle you'd probably use ROWNUM somehow, and SQL Server will likely use some awful multi-query technique involving creating temporary tables, copying distinct rows from the duplicate set to another table, deleting it from the original and copying back. Can't say for sure though since I haven't used it in quite a while.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company