Thread: Deleting one of 2 identical records
Hi:
If I have a table that has 2 records which are identical with regard to all their column values, is there a way to delete one of them, leaving one remaining? Is there some unique record_id key of some sort I can use for somethign like this?
Thanks in Advance!
On 6 September 2011 18:39, Gauthier, Dave <dave.gauthier@intel.com> wrote:
Hi:
If I have a table that has 2 records which are identical with regard to all their column values, is there a way to delete one of them, leaving one remaining? Is there some unique record_id key of some sort I can use for somethign like this?
Thanks in Advance!
Yes, identify them by their ctid value.
So get the ctids by running:
SELECT ctid, *
FROM my_table
WHERE <clause to identify duplicate rows>
You will see entries which look like "(7296,11)".
You can then delete the row by referencing it in the DELETE statement. For example:
DELETE FROM my_table
WHERE ctid = '(7296,11)';
It's a shame we don't have a LIMIT on the DELETE clause (looks at hackers).
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 9/6/2011 12:39 PM, Gauthier, Dave wrote: > Hi: > > If I have a table that has 2 records which are identical with regard to > all their column values, is there a way to delete one of them, leaving > one remaining? Is there some unique record_id key of some sort I can use > for somethign like this? > > Thanks in Advance! > Not easily that I know of. I have two thoughts: 1) create table junk (like orig); insert into junk select distinct from orig; delete from orig where exists(select from junk); insert into orig select * from junk; 2) alter table orig add uid integer; create sequence bob; update orig set uid = nextval('bob'); drop sequence bob; -- magic to delet using uid Ah, Thom just answered. I like his better, but I'll post this just for completeness... -Andy
On 9/6/2011 12:44 PM, Thom Brown wrote: > On 6 September 2011 18:39, Gauthier, Dave <dave.gauthier@intel.com > <mailto:dave.gauthier@intel.com>> wrote: > > Hi:____ > > __ __ > > If I have a table that has 2 records which are identical with regard > to all their column values, is there a way to delete one of them, > leaving one remaining? Is there some unique record_id key of some > sort I can use for somethign like this?____ > > __ __ > > Thanks in Advance!____ > > > Yes, identify them by their ctid value. > > So get the ctids by running: > > SELECT ctid, * > FROM my_table > WHERE <clause to identify duplicate rows> > > You will see entries which look like "(7296,11)". > > You can then delete the row by referencing it in the DELETE statement. > For example: > > DELETE FROM my_table > WHERE ctid = '(7296,11)'; > > It's a shame we don't have a LIMIT on the DELETE clause (looks at hackers). > > -- > Thom Brown > Twitter: @darkixion > IRC (freenode): dark_ixion > Registered Linux user: #516935 > > EnterpriseDB UK: http://www.enterprisedb.com > The Enterprise PostgreSQL Company I wonder.. using the new writeable cte's, could you: with x ( -- id = 5 has two identical rows, but limit 1 select * from orig where id = 5 limit 1; ) delete from x; -Andy
Gauthier, Dave <dave.gauthier@intel.com> wrote: > Hi: > > > > If I have a table that has 2 records which are identical with regard to all > their column values, is there a way to delete one of them, leaving one > remaining? Is there some unique record_id key of some sort I can use for > somethign like this? Yes, use the CTID-column. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
The identification and deleting of the records using ctids seems to have worked fine. Thanks ! -----Original Message----- From: Andy Colson [mailto:andy@squeakycode.net] Sent: Tuesday, September 06, 2011 1:55 PM To: Thom Brown Cc: Gauthier, Dave; pgsql-general@postgresql.org Subject: Re: [GENERAL] Deleting one of 2 identical records On 9/6/2011 12:44 PM, Thom Brown wrote: > On 6 September 2011 18:39, Gauthier, Dave <dave.gauthier@intel.com > <mailto:dave.gauthier@intel.com>> wrote: > > Hi:____ > > __ __ > > If I have a table that has 2 records which are identical with regard > to all their column values, is there a way to delete one of them, > leaving one remaining? Is there some unique record_id key of some > sort I can use for somethign like this?____ > > __ __ > > Thanks in Advance!____ > > > Yes, identify them by their ctid value. > > So get the ctids by running: > > SELECT ctid, * > FROM my_table > WHERE <clause to identify duplicate rows> > > You will see entries which look like "(7296,11)". > > You can then delete the row by referencing it in the DELETE statement. > For example: > > DELETE FROM my_table > WHERE ctid = '(7296,11)'; > > It's a shame we don't have a LIMIT on the DELETE clause (looks at hackers). > > -- > Thom Brown > Twitter: @darkixion > IRC (freenode): dark_ixion > Registered Linux user: #516935 > > EnterpriseDB UK: http://www.enterprisedb.com > The Enterprise PostgreSQL Company I wonder.. using the new writeable cte's, could you: with x ( -- id = 5 has two identical rows, but limit 1 select * from orig where id = 5 limit 1; ) delete from x; -Andy
-----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Andy Colson Sent: Tuesday, September 06, 2011 1:55 PM To: Thom Brown Cc: Gauthier, Dave; pgsql-general@postgresql.org Subject: Re: [GENERAL] Deleting one of 2 identical records I wonder.. using the new writeable cte's, could you: with x ( -- id = 5 has two identical rows, but limit 1 select * from orig where id = 5 limit 1; ) delete from x; -Andy ---------------------------------------------------------- By my understanding it is a writeable CTE because the statement inside the CTE can now be INSERT/UPDATE/DELETE (in addition to the SELECT - read only - statement). A CTE is, in some ways, like an immediately materialized view. Any reference to it does not affect the source tables; thus your example likely would not work. It isn't a RULE where "x" is simply an alias for "orig". The real problem is not the language but the table design. The idea of "true duplicates" is generally problematic but when it does occur it is advisable to introduce some kind of artificial key/sequence to allow for direct selection of the row without resorting to internals. Then, it is simply to use the full power of the SELECT statement (with Window functions and CTEs) to identify the rows that are to be deleted and feed the PK from those rows into the DELETE's WHERE clause using a sub-query. David J.
On 6 September 2011 19:00, Gauthier, Dave <dave.gauthier@intel.com> wrote:
The identification and deleting of the records using ctids seems to have worked fine.
Thanks !
Alternative you could do something like this:
WITH keep AS (
SELECT
my_table AS duplicate_row,
min(ctid) AS keep,
count(*)
FROM my_table
GROUP BY my_table
HAVING count(*) > 1
)
DELETE FROM my_table
USING keep
WHERE
my_table = keep.duplicate_row
AND
my_table.ctid != keep
RETURNING my_table.ctid, my_table.*;
This would delete all duplicate rows from the table and just keep whichever row appears first in the table before its duplicates.
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello Thom.
what is the meaning of
select table_name from table_name
Also is this a common behavior of all Databases i.e. oracle , Microsoft ,...etc . i.e is this is the standard behavior
I think this is a good way to find duplicates in general, I will write a routine to compare all the columns by excluding the primary key which is serial
Thanks in advance
Regards
From: Thom Brown <thom@linux.com>
To: "Gauthier, Dave" <dave.gauthier@intel.com>
Cc: Andy Colson <andy@squeakycode.net>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Wednesday, September 7, 2011 12:50 AM
Subject: Re: [GENERAL] Deleting one of 2 identical records
On 6 September 2011 19:00, Gauthier, Dave <dave.gauthier@intel.com> wrote:
The identification and deleting of the records using ctids seems to have worked fine.
Thanks !
Alternative you could do something like this:
WITH keep AS (
SELECT
my_table AS duplicate_row,
min(ctid) AS keep,
count(*)
FROM my_table
GROUP BY my_table
HAVING count(*) > 1
)
DELETE FROM my_table
USING keep
WHERE
my_table = keep.duplicate_row
AND
my_table.ctid != keep
RETURNING my_table.ctid, my_table.*;
This would delete all duplicate rows from the table and just keep whichever row appears first in the table before its duplicates.
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 7 September 2011 00:55, salah jubeh <s_jubeh@yahoo.com> wrote:
-- Hello Thom.what is the meaning ofselect table_name from table_nameAlso is this a common behavior of all Databases i.e. oracle , Microsoft ,...etc . i.e is this is the standard behaviorI think this is a good way to find duplicates in general, I will write a routine to compare all the columns by excluding the primary key which is serialThanks in advance
Well I just put table_name as a placeholder for the actual name of the table you wish to remove duplicates from.
No, you can't run this query on other databases, particularly because ctids are specific to PostgreSQL. Other databases will use different methods to this one, if they have any at all. With Oracle you'd probably use ROWNUM somehow, and SQL Server will likely use some awful multi-query technique involving creating temporary tables, copying distinct rows from the duplicate set to another table, deleting it from the original and copying back. Can't say for sure though since I haven't used it in quite a while.
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company