Thread: Duplicate Row Removal

Duplicate Row Removal

From

"Peter Atkins"

Date:

04 November 2005, 23:01:47

All,

I have a duplicate row problem and to make matters worse some tables
don't have a PK or any unique identifier. See below:

uid | cleanval | timestamp  | received
-----+-------+------------+----------
  38 |     5 | 1125081799 |        1
  38 |    14 | 1122683252 |        0
  38 |     5 | 1125081799 |        1
  38 |    14 | 1122683252 |        0


I was hoping to have a system oid for each row but it looks like that's
not the case. Anyone have any thoughts on how to remove dups? I have
about 40 tables of various sizes.

Cheers,
Peter

Re: Duplicate Row Removal

From

Tom Lane

Date:

05 November 2005, 02:24:27

"Peter Atkins" <patkins@directpartners.com> writes:
> I was hoping to have a system oid for each row but it looks like that's
> not the case. Anyone have any thoughts on how to remove dups?

ctid always works ...

            regards, tom lane

Re: Duplicate Row Removal

From

"Dean Gibson (DB Administrator)"

Date:

05 November 2005, 03:03:34

CREATE TABLE new_name AS SELECT DISTINCT * FROM old_name;

DROP TABLE old_name;

ALTER TABLE new_name RENAME TO old_name;

On 2005-11-04 17:15, Peter Atkins wrote:
> All,
>
> I have a duplicate row problem and to make matters worse some tables don't have a PK or any unique identifier.
>
> Anyone have any thoughts on how to remove dups?

Re: Duplicate Row Removal

From

Berend Tober

Date:

05 November 2005, 15:24:49

Dean Gibson (DB Administrator) wrote:

> CREATE TABLE new_name AS SELECT DISTINCT * FROM old_name;
>
> DROP TABLE old_name;
>
> ALTER TABLE new_name RENAME TO old_name;

The problem with this technique is that it doesn't account for indexes,
foreign key references, and other dependencies.

Another approach is to temporarily add an integer column, populate it
with sequential values, and then use that new column to uniquely
identify the rows that are otherwise duplicates. Then you can use
aggregation to identify and delete the rows you don't need, followed by
dropping the temporary extra column. HTH.

-- BMT

>
> On 2005-11-04 17:15, Peter Atkins wrote:
>
>> All,
>>
>> I have a duplicate row problem and to make matters worse some tables
>> don't have a PK or any unique identifier.
>> Anyone have any thoughts on how to remove dups?
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>