Thread: Removing duplicate keys and updating deleted entry key in other table
hello group: I have a table that has 19 duplicte entries: table - sequence: seq_id | seq_refseq | seq_gname --------+--------------+----------------- 33014 | NM_025196 | GRPEL1 33015 | NM_007186 | CEP2 33016 | NM_018148 | LINS1 33017 | NM_199166 | ALAS1 ..... I suspected there are some duplicate entries: >select seq_refseq,count(*) as score from sequence group by seq_refseq having count(*) > 1; seq_refseq | score --------------+------- NM_033421 | 2 NM_018290 | 2 ..................... NM_005311 | 2 (19 rows) > select * from sequence where seq_refseq ='NM_033421'; seq_id | seq_refseq | seq_gname --------+------------+----------- 43535 | NM_033421 | C20orf161 43554 | NM_033421 | C20orf161 (2 rows) Now that 19 records are duplicated, I want to delete the duplicated records from other table comb: Table comb: >select * from comb; cid | gid | seq_id --------+-------+-------- 85830 | 5116 | 33014 85831 | 22191 | 33014 85832 | 22186 | 33014 ....................... for some i checked to see if any records holds the duplicated key. i found none for 3 or 4 cases. select * from comb where comb.seq_id = 43539; cid | gid | seq_id -----+-----+-------- (0 rows) my question is how do I delete the duplicate row and make sure I update the comb table after I delete the duplicate key. I never did this before and is a complex problem for me to code. could any one please help me. thanks sri __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Re: Removing duplicate keys and updating deleted entry key in other table
From
Richard Broersma Jr
Date:
> I have a table that has 19 duplicte entries: > table - sequence: > seq_id | seq_refseq | seq_gname > --------+--------------+----------------- > 33014 | NM_025196 | GRPEL1 > 33015 | NM_007186 | CEP2 > 33016 | NM_018148 | LINS1 > 33017 | NM_199166 | ALAS1 > ..... > I suspected there are some duplicate entries: > >select seq_refseq,count(*) as score from sequence > group by seq_refseq having count(*) > 1; > seq_refseq | score > --------------+------- > NM_033421 | 2 > NM_018290 | 2 > ..................... > NM_005311 | 2 > (19 rows) > > select * from sequence where > seq_refseq ='NM_033421'; > seq_id | seq_refseq | seq_gname > --------+------------+----------- > 43535 | NM_033421 | C20orf161 > 43554 | NM_033421 | C20orf161 > (2 rows) > Now that 19 records are duplicated, I want to delete > the duplicated records from other table comb: > Table comb: > >select * from comb; > cid | gid | seq_id > --------+-------+-------- > 85830 | 5116 | 33014 > 85831 | 22191 | 33014 > 85832 | 22186 | 33014 > ....................... > for some i checked to see if any records holds the > duplicated key. i found none for 3 or 4 cases. > select * from comb where comb.seq_id = 43539; > cid | gid | seq_id > -----+-----+-------- > (0 rows) > my question is how do I delete the duplicate row and > make sure I update the comb table after I delete the > duplicate key. > I never did this before and is a complex problem for > me to code. I tried to come up with the sql on my own, but then I remembered a similar email that already had a good answer. http://archives.postgresql.org/pgsql-novice/2006-06/msg00092.php Regards, Richard Broersma Jr.
hi, thanks for pointing out the email. I have been trying to open the link and seems that postgresql archive site is down. would you please send me the content of the link that you sent earlier. I appreciate your help. thanks sri --- Richard Broersma Jr <rabroersma@yahoo.com> wrote: > > I have a table that has 19 duplicte entries: > > table - sequence: > > seq_id | seq_refseq | seq_gname > > --------+--------------+----------------- > > 33014 | NM_025196 | GRPEL1 > > 33015 | NM_007186 | CEP2 > > 33016 | NM_018148 | LINS1 > > 33017 | NM_199166 | ALAS1 > > ..... > > I suspected there are some duplicate entries: > > >select seq_refseq,count(*) as score from sequence > > group by seq_refseq having count(*) > 1; > > seq_refseq | score > > --------------+------- > > NM_033421 | 2 > > NM_018290 | 2 > > ..................... > > NM_005311 | 2 > > (19 rows) > > > select * from sequence where > > seq_refseq ='NM_033421'; > > seq_id | seq_refseq | seq_gname > > --------+------------+----------- > > 43535 | NM_033421 | C20orf161 > > 43554 | NM_033421 | C20orf161 > > (2 rows) > > Now that 19 records are duplicated, I want to > delete > > the duplicated records from other table comb: > > Table comb: > > >select * from comb; > > cid | gid | seq_id > > --------+-------+-------- > > 85830 | 5116 | 33014 > > 85831 | 22191 | 33014 > > 85832 | 22186 | 33014 > > ....................... > > for some i checked to see if any records holds the > > duplicated key. i found none for 3 or 4 cases. > > select * from comb where comb.seq_id = 43539; > > cid | gid | seq_id > > -----+-----+-------- > > (0 rows) > > my question is how do I delete the duplicate row > and > > make sure I update the comb table after I delete > the > > duplicate key. > > I never did this before and is a complex problem > for > > me to code. > > I tried to come up with the sql on my own, but then > I remembered a similar email that already had > a good answer. > http://archives.postgresql.org/pgsql-novice/2006-06/msg00092.php > > > Regards, > > Richard Broersma Jr. > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
> I have been trying to open the link and seems that > postgresql archive site is down. > http://archives.postgresql.org/pgsql-novice/2006-06/msg00092.php Sorry, the link didn't work. It worked yesterday when I found it. I will also forward this to the general list maybe someone there could take a look at it. Regards, Richard Broersma Jr.
fixed now: * From: <kynn ( at ) panix ( dot ) com> * To: pgsql-novice ( at ) postgresql ( dot ) org * Subject: SQL for removing duplicates? * Date: Tue, 13 Jun 2006 12:11:41 -0400 (EDT) Hi. I'm stumped. I have a large table (about 8.5M records), let's call it t, whose columns include x and y. I want to remove records from this table so that any pair of values for these two fields appear only once. (This will get rid of about 15% of the records in t.) One simple solution would be something like CREATE TABLE tmp AS SELECT DISTINCT ON ( x, y ) * FROM t; DROP TABLE t; ALTER TABLE tmp RENAME TO t; This works, but it uses a lot of space. I would prefer to simply cull the unwanted records from t, but I just can't figure out the SQL for it. Any help with it would be *much* appreciated. Thanks! On Wed, Sep 06, 2006 at 08:00:54AM -0700, Richard Broersma Jr wrote: > > I have been trying to open the link and seems that > > postgresql archive site is down. > > http://archives.postgresql.org/pgsql-novice/2006-06/msg00092.php > > Sorry, the link didn't work. It worked yesterday when I found it. > > I will also forward this to the general list maybe someone there could take a look at it. > > Regards, > > Richard Broersma Jr. > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match --