Thread: Deleting all but one row of a list of non-uniques
Hi all I have a table, for simplicity's sake containing one field, called unid. for example, select unid, oid from table gives me something like this: unid | oid ---------+--------- XNO24ORK | 40276607 XNPGJDPU | 40276673 * XNPGJDPU | 40276674 * XNXAAVQ2 | 40277583 ZAZAFAA4 | 40276600 ZAZV5UG4 | 40276446 ZD66A1LL | 40277162 ZDXZ27RS | 40277454 ZEKRT3GM | 40277739 * ZEKRT3GM | 40277740 * ZEKRT3GM | 40277741 * (I've marked the rows with duplicated unids with the * to the right) I'd like to delete rows in such a way that one (and any one) row for each unid remains, and all other duplicates are removed. Does anyone have any ideas that may help me here please? slecting distinct is not helpful as in reality there are other fields which contain data like timestamps that will differ but I don't mind which one remains, but I'd need to work with them on selects to the table afterwards. I've tried: delete from table where oid in (select p1.oid from table p1, table p2 where p1.oid != p2.oid and p1.unid=p2.unid); which only works in a few cases - I suspect where there are only two rows with the same unid. Is it even possible? Alternatively, can I get the \copy command to ignore attempts to insert duplicated rows into a UNIQUE column instead of aborting the copy? Not sure if any of the options that can be supplied to the table at creation time for unique will help here. Thanks. Ciao Zak -- ======================================================================== http://www.carfolio.com/ Searchable database of 10 000+ car specs ========================================================================
On Wed, Jun 23, 2004 at 02:17:50AM +0200, Zak McGregor wrote: > Hi all > > > I have a table, for simplicity's sake containing one field, called unid. <snip> > (I've marked the rows with duplicated unids with the * to the right) > > I'd like to delete rows in such a way that one (and any one) row for each unid > remains, and all other duplicates are removed. Does anyone have any ideas that > may help me here please? The usual scheme I use is something like: (not perfect SQL) delete from table where exists (select from table where this.id < that.id). Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Try this. Create a temp table with a list of the duplicate unid's eg create temp table duplicates as select min(oid) as oid_val, unid from <table> group by unid having count(*) > 1; Then isolate the unwanted rows update <table> set unid = null <this could be any value you choose> from duplicates where <table>.unid = duplicates.unid and oid_val <> <table>.oid Then delete them delete from <table> where unid is null Thanks Andrew -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Zak McGregor Sent: Wednesday, 23 June 2004 10:18 AM To: pgsql-general@postgresql.org Subject: [GENERAL] Deleting all but one row of a list of non-uniques Hi all I have a table, for simplicity's sake containing one field, called unid. for example, select unid, oid from table gives me something like this: unid | oid ---------+--------- XNO24ORK | 40276607 XNPGJDPU | 40276673 * XNPGJDPU | 40276674 * XNXAAVQ2 | 40277583 ZAZAFAA4 | 40276600 ZAZV5UG4 | 40276446 ZD66A1LL | 40277162 ZDXZ27RS | 40277454 ZEKRT3GM | 40277739 * ZEKRT3GM | 40277740 * ZEKRT3GM | 40277741 * (I've marked the rows with duplicated unids with the * to the right) I'd like to delete rows in such a way that one (and any one) row for each unid remains, and all other duplicates are removed. Does anyone have any ideas that may help me here please? slecting distinct is not helpful as in reality there are other fields which contain data like timestamps that will differ but I don't mind which one remains, but I'd need to work with them on selects to the table afterwards. I've tried: delete from table where oid in (select p1.oid from table p1, table p2 where p1.oid != p2.oid and p1.unid=p2.unid); which only works in a few cases - I suspect where there are only two rows with the same unid. Is it even possible? Alternatively, can I get the \copy command to ignore attempts to insert duplicated rows into a UNIQUE column instead of aborting the copy? Not sure if any of the options that can be supplied to the table at creation time for unique will help here. Thanks. Ciao Zak -- ======================================================================== http://www.carfolio.com/ Searchable database of 10 000+ car specs ======================================================================== ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match