Thread: remote duplicate rows

remote duplicate rows

From
"Junkone"
Date:
hI
i have a bad situation that i did not have primary key. so i have a
table like this
colname1                colname2
1                                 apple
1                                 apple
2                                  orange
2                                   orange

It is a very large table. how do i remove the duplctes quickly annd
without much change.

Regards

Seede


Re: remote duplicate rows

From
"A. Kretschmer"
Date:
am  Wed, dem 13.09.2006, um 15:46:58 -0700 mailte Junkone folgendes:
> hI
> i have a bad situation that i did not have primary key. so i have a
> table like this
> colname1                colname2
> 1                                 apple
> 1                                 apple
> 2                                  orange
> 2                                   orange
>
> It is a very large table. how do i remove the duplctes quickly annd
> without much change.

begin;
alter table foo rename to tmp;
create table foo as select distinct * from tmp;
commit;

You should create a primary key now to avoid duplicated entries...


HTH, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47215,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: remote duplicate rows

From
John Sidney-Woollett
Date:
If you have a primary key value (or OID?) then you can delete the
duplicates in situ using something like (untested)

-- should work if never more than 1 duplicate row for colname1, colname2

delete from table where pk_value in (
select min(pk_value)
from table
group by colname1, colname2
having count(*) > 1
)

-- if you can have multiple duplicate rows for colname1, colname2
-- then you need something like

delete from table where pk_value not in (
select min(pk_value)
from table
group by colname1, colname2
having count(*) = 1
)

Hope that helps.

John

A. Kretschmer wrote:
> am  Wed, dem 13.09.2006, um 15:46:58 -0700 mailte Junkone folgendes:
>> hI
>> i have a bad situation that i did not have primary key. so i have a
>> table like this
>> colname1                colname2
>> 1                                 apple
>> 1                                 apple
>> 2                                  orange
>> 2                                   orange
>>
>> It is a very large table. how do i remove the duplctes quickly annd
>> without much change.
>
> begin;
> alter table foo rename to tmp;
> create table foo as select distinct * from tmp;
> commit;
>
> You should create a primary key now to avoid duplicated entries...
>
>
> HTH, Andreas

Re: remote duplicate rows

From
"Andrews, Chris"
Date:
Dunno about quickly, but I usually do something like this (before slapping myself in the face for getting into that
state):

CREATE TABLE tn_backup AS SELECT DISTINCT * FROM tn;
TRUNCATE TABLE tn;
INSERT INTO tn VALUES SELECT * from tn_backup;

(Where "tn" is the table name)

May not be the best way, but keeps indexes and stuff on the original table if you don't want to set them all up again.
Melazy?
 

That said, if you've got foriegn keys pointing at it, the truncate ain't going to work.

Or if you have your data exported as a tab or csv, the use sort | uniq on it and
shove it back in...



-----Original Message-----
From: Junkone [mailto:junkone1@gmail.com]
Sent: 13 September 2006 23:47
To: pgsql-general@postgresql.org
Subject: [GENERAL] remote duplicate rows


hI
i have a bad situation that i did not have primary key. so i have a
table like this
colname1                colname2
1                                 apple
1                                 apple
2                                  orange
2                                   orange

It is a very large table. how do i remove the duplctes quickly annd
without much change.

Regards

Seede


-----------------------------------------
The information contained in this email is confidential and is
intended for the recipient only. If you have received it in error,
please notify us immediately by reply email and then delete it from
your system. Please do not copy it or use it for any purposes, or
disclose its contents to any other person or store or copy this
information in any medium. The views contained in this email are
those of the author and not necessarily those of Lorien plc.


 

Thank you for your co-operation.


Re: remote duplicate rows

From
Berend Tober
Date:
A. Kretschmer wrote:

>am  Wed, dem 13.09.2006, um 15:46:58 -0700 mailte Junkone folgendes:
>
>
>>hI
>>i have a bad situation that i did not have primary key. so i have a
>>table like this
>>colname1                colname2
>>1                                 apple
>>1                                 apple
>>2                                  orange
>>2                                   orange
>>
>>It is a very large table. how do i remove the duplctes quickly annd
>>without much change.
>>
>>
>
>begin;
>alter table foo rename to tmp;
>create table foo as select distinct * from tmp;
>commit;
>
>
A couple potential problems here. First, you forgot to drop table tmp. But maybe that is good thing because although
theOP hasn't told us anything else useful about the situation, and he has clearly contrived a simplistic facsimile of
hisreal problem, to be useful the table most likely either has foreign key references, and/or is the primary key for
othertable foreign keys. You're suggestion will break whatever application this data base supports because all the
foreignkeys will point to table tmp rather than foo afterwards. Similarly, there is the problem of any indexes on the
tablethat would be lost. But I suppose one can make the point that your suggestion is a great solution, given the
contrivedexample and insufficient problem understanding presented by the OP -- I really think he needs more help than
herealizes. 

Regards,
Berend Tober



Re: remote duplicate rows

From
"Junkone"
Date:
Thanks for all of your help. I backed up the table and used the PgAdmin
tool to create Insert statements. It did it in two sets. I reran the
first set and it solved the problem.

Seede

"Andrews, Chris" wrote:
> Dunno about quickly, but I usually do something like this (before slapping myself in the face for getting into that
state):
>
> CREATE TABLE tn_backup AS SELECT DISTINCT * FROM tn;
> TRUNCATE TABLE tn;
> INSERT INTO tn VALUES SELECT * from tn_backup;
>
> (Where "tn" is the table name)
>
> May not be the best way, but keeps indexes and stuff on the original table if you don't want to set them all up
again.Me lazy? 
>
> That said, if you've got foriegn keys pointing at it, the truncate ain't going to work.
>
> Or if you have your data exported as a tab or csv, the use sort | uniq on it and
> shove it back in...
>
>
>
> -----Original Message-----
> From: Junkone [mailto:junkone1@gmail.com]
> Sent: 13 September 2006 23:47
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] remote duplicate rows
>
>
> hI
> i have a bad situation that i did not have primary key. so i have a
> table like this
> colname1                colname2
> 1                                 apple
> 1                                 apple
> 2                                  orange
> 2                                   orange
>
> It is a very large table. how do i remove the duplctes quickly annd
> without much change.
>
> Regards
>
> Seede
>
>
> -----------------------------------------
> The information contained in this email is confidential and is
> intended for the recipient only. If you have received it in error,
> please notify us immediately by reply email and then delete it from
> your system. Please do not copy it or use it for any purposes, or
> disclose its contents to any other person or store or copy this
> information in any medium. The views contained in this email are
> those of the author and not necessarily those of Lorien plc.
>
>
>
>
> Thank you for your co-operation.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly


Re: remote duplicate rows

From
Sim Zacks
Date:
You forgot to mention that all the functions/views that utilized that table also now point to the
original table with the changed name, because it doesn't store the table name, it stores the table oid.

Berend Tober wrote:
> A. Kretschmer wrote:
>
>> am  Wed, dem 13.09.2006, um 15:46:58 -0700 mailte Junkone folgendes:
>>
>>
>>> hI
>>> i have a bad situation that i did not have primary key. so i have a
>>> table like this
>>> colname1                colname2
>>> 1                                 apple
>>> 1                                 apple
>>> 2                                  orange
>>> 2                                   orange
>>>
>>> It is a very large table. how do i remove the duplctes quickly annd
>>> without much change.
>>>
>>
>> begin;
>> alter table foo rename to tmp;
>> create table foo as select distinct * from tmp;
>> commit;
>>
>>
> A couple potential problems here. First, you forgot to drop table tmp.
> But maybe that is good thing because although the OP hasn't told us
> anything else useful about the situation, and he has clearly contrived a
> simplistic facsimile of his real problem, to be useful the table most
> likely either has foreign key references, and/or is the primary key for
> other table foreign keys. You're suggestion will break whatever
> application this data base supports because all the foreign keys will
> point to table tmp rather than foo afterwards. Similarly, there is the
> problem of any indexes on the table that would be lost. But I suppose
> one can make the point that your suggestion is a great solution, given
> the contrived example and insufficient problem understanding presented
> by the OP -- I really think he needs more help than he realizes.
>
> Regards,
> Berend Tober
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>