Thread: Removing duplicate keys and updating deleted entry key in other table

Removing duplicate keys and updating deleted entry key in other table

From

Srinivas Iyyer

Date:

05 September 2006, 19:44:24

hello group:

I have a table that has 19 duplicte entries:

table - sequence:
 seq_id |  seq_refseq  |    seq_gname
--------+--------------+-----------------
  33014 | NM_025196    | GRPEL1
  33015 | NM_007186    | CEP2
  33016 | NM_018148    | LINS1
  33017 | NM_199166    | ALAS1
.....

I suspected there are some duplicate entries:


>select seq_refseq,count(*) as score from sequence
group by seq_refseq having count(*) > 1;
  seq_refseq  | score
--------------+-------
 NM_033421    |     2
 NM_018290    |     2
 .....................
 NM_005311    |     2
(19 rows)

> select * from sequence where
  seq_refseq ='NM_033421';

 seq_id | seq_refseq | seq_gname
--------+------------+-----------
  43535 | NM_033421  | C20orf161
  43554 | NM_033421  | C20orf161
(2 rows)

Now that 19 records are duplicated, I want to delete
the duplicated records from other table comb:

Table comb:

>select * from comb;
  cid   |  gid  | seq_id
--------+-------+--------
  85830 |  5116 |  33014
  85831 | 22191 |  33014
  85832 | 22186 |  33014
 .......................



for some i checked to see if any records holds the
duplicated key. i found none for 3 or 4 cases.

select * from comb where comb.seq_id = 43539;
 cid | gid | seq_id
-----+-----+--------
(0 rows)


my question is how do I delete the duplicate row and
make sure I update the comb table after I delete the
duplicate key.


I never did this before and is a complex problem for
me to code.


could any one please help me.

thanks
sri

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: Removing duplicate keys and updating deleted entry key in other table

From

Richard Broersma Jr

Date:

05 September 2006, 21:23:29

> I have a table that has 19 duplicte entries:
> table - sequence:
>  seq_id |  seq_refseq  |    seq_gname
> --------+--------------+-----------------
>   33014 | NM_025196    | GRPEL1
>   33015 | NM_007186    | CEP2
>   33016 | NM_018148    | LINS1
>   33017 | NM_199166    | ALAS1
> .....
> I suspected there are some duplicate entries:
> >select seq_refseq,count(*) as score from sequence
> group by seq_refseq having count(*) > 1;
>   seq_refseq  | score
> --------------+-------
>  NM_033421    |     2
>  NM_018290    |     2
>  .....................
>  NM_005311    |     2
> (19 rows)
> > select * from sequence where
>   seq_refseq ='NM_033421';
>  seq_id | seq_refseq | seq_gname
> --------+------------+-----------
>   43535 | NM_033421  | C20orf161
>   43554 | NM_033421  | C20orf161
> (2 rows)
> Now that 19 records are duplicated, I want to delete
> the duplicated records from other table comb:
> Table comb:
> >select * from comb;
>   cid   |  gid  | seq_id
> --------+-------+--------
>   85830 |  5116 |  33014
>   85831 | 22191 |  33014
>   85832 | 22186 |  33014
>  .......................
> for some i checked to see if any records holds the
> duplicated key. i found none for 3 or 4 cases.
> select * from comb where comb.seq_id = 43539;
>  cid | gid | seq_id
> -----+-----+--------
> (0 rows)
> my question is how do I delete the duplicate row and
> make sure I update the comb table after I delete the
> duplicate key.
> I never did this before and is a complex problem for
> me to code.

I tried to come up with the sql on my own, but then I remembered a similar email that already had
a good answer.
http://archives.postgresql.org/pgsql-novice/2006-06/msg00092.php

Regards,

Richard Broersma Jr.

Re: Removing duplicate keys and updating deleted entry key in other table

From

Srinivas Iyyer

Date:

06 September 2006, 13:42:40

hi,
 thanks for pointing out the email.
I have been trying to open the link and seems that
postgresql archive site is down.

would you please send me the content of the link that
you sent earlier. I appreciate your help.

thanks
sri

--- Richard Broersma Jr <rabroersma@yahoo.com> wrote:

> > I have a table that has 19 duplicte entries:
> > table - sequence:
> >  seq_id |  seq_refseq  |    seq_gname
> > --------+--------------+-----------------
> >   33014 | NM_025196    | GRPEL1
> >   33015 | NM_007186    | CEP2
> >   33016 | NM_018148    | LINS1
> >   33017 | NM_199166    | ALAS1
> > .....
> > I suspected there are some duplicate entries:
> > >select seq_refseq,count(*) as score from sequence
> > group by seq_refseq having count(*) > 1;
> >   seq_refseq  | score
> > --------------+-------
> >  NM_033421    |     2
> >  NM_018290    |     2
> >  .....................
> >  NM_005311    |     2
> > (19 rows)
> > > select * from sequence where
> >   seq_refseq ='NM_033421';
> >  seq_id | seq_refseq | seq_gname
> > --------+------------+-----------
> >   43535 | NM_033421  | C20orf161
> >   43554 | NM_033421  | C20orf161
> > (2 rows)
> > Now that 19 records are duplicated, I want to
> delete
> > the duplicated records from other table comb:
> > Table comb:
> > >select * from comb;
> >   cid   |  gid  | seq_id
> > --------+-------+--------
> >   85830 |  5116 |  33014
> >   85831 | 22191 |  33014
> >   85832 | 22186 |  33014
> >  .......................
> > for some i checked to see if any records holds the
> > duplicated key. i found none for 3 or 4 cases.
> > select * from comb where comb.seq_id = 43539;
> >  cid | gid | seq_id
> > -----+-----+--------
> > (0 rows)
> > my question is how do I delete the duplicate row
> and
> > make sure I update the comb table after I delete
> the
> > duplicate key.
> > I never did this before and is a complex problem
> for
> > me to code.
>
> I tried to come up with the sql on my own, but then
> I remembered a similar email that already had
> a good answer.
>
http://archives.postgresql.org/pgsql-novice/2006-06/msg00092.php
>
>
> Regards,
>
> Richard Broersma Jr.
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

archives is not working was (Re: Removing duplicate keys)

From

Richard Broersma Jr

Date:

06 September 2006, 13:48:00

> I have been trying to open the link and seems that
> postgresql archive site is down.
> http://archives.postgresql.org/pgsql-novice/2006-06/msg00092.php

Sorry, the link didn't work.  It worked yesterday when I found it.

I will also forward this to the general list maybe someone there could take a look at it.

Regards,

Richard Broersma Jr.

Re: archives is not working was (Re: Removing duplicate keys)

From

Ray Stell

Date:

07 September 2006, 09:33:04

fixed now:

    * From: <kynn ( at ) panix ( dot ) com>
    * To: pgsql-novice ( at ) postgresql ( dot ) org
    * Subject: SQL for removing duplicates?
    * Date: Tue, 13 Jun 2006 12:11:41 -0400 (EDT)

Hi.  I'm stumped.  I have a large table (about 8.5M records), let's
call it t, whose columns include x and y.  I want to remove records
from this table so that any pair of values for these two fields appear
only once.  (This will get rid of about 15% of the records in t.)

One simple solution would be something like

  CREATE TABLE tmp AS SELECT DISTINCT ON ( x, y ) * FROM t;
  DROP TABLE t;
  ALTER TABLE tmp RENAME TO t;

This works, but it uses a lot of space.  I would prefer to simply cull
the unwanted records from t, but I just can't figure out the SQL for
it.  Any help with it would be *much* appreciated.

Thanks!

On Wed, Sep 06, 2006 at 08:00:54AM -0700, Richard Broersma Jr wrote:
> > I have been trying to open the link and seems that
> > postgresql archive site is down.
> > http://archives.postgresql.org/pgsql-novice/2006-06/msg00092.php
>
> Sorry, the link didn't work.  It worked yesterday when I found it.
>
> I will also forward this to the general list maybe someone there could take a look at it.
>
> Regards,
>
> Richard Broersma Jr.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

--