Re: Duplicate data despite unique constraint - Mailing list pgsql-general

From: Tom Lane
Subject: Re: Duplicate data despite unique constraint
Msg-id: 23007.1473604324@sss.pgh.pa.us
In response to: Re: Duplicate data despite unique constraint (Daniel Caldeweyher <dcalde@gmail.com>)
List: pgsql-general
Daniel Caldeweyher <dcalde@gmail.com> writes:
> I had a similar issue once and was able to recover from it. If this affects
> only some rows and you are able to identify them, this is fixable:

> -- force table scan to skip using corrupt index
> set enable_seqscan = on;
> set enable_indexscan = off;
> set enable_bitmapscan = off;
> select email, count(*)
> from users
> group by email
> having count(*) > 1;

> Then, if the rows are simply just duplicates and have no other changes, add
> a new serial column (or to prevent blocking, add a bigint column and update
> with sequential values), then using the emails from above, delete the ones
> with the higher/lower sequence number. Ensure you are still skipping
> indexes.

FWIW, that's kind of the hard way; you can just use the "ctid" system
column as a guaranteed-unique row identifier, instead of having to add
and remove a column.  The latter is very expensive, and if there is any
actual data corruption in the heap it could make things worse.

select ctid, * from users where email = 'duplicated value';
-- choose which row(s) to nuke, then
delete from users where ctid = 'chosen value';
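
As in the quoted snippet, the SELECT still needs to skip the damaged index,
or it may miss some of the duplicates.  One way to keep that contained is to
scope the planner settings to a single transaction with SET LOCAL; a minimal
sketch (table, column, and placeholder values as above):

begin;
-- keep the planner off the corrupt index for this transaction only
set local enable_indexscan = off;
set local enable_bitmapscan = off;
select ctid, * from users where email = 'duplicated value';
-- inspect the output, then remove the unwanted copy by its ctid
delete from users where ctid = 'chosen value';
commit;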

You definitely want to reindex after the data cleanup, since presumably
it's corruption of a unique index that got you into this mess in the
first place.  But as long as it's only the index and not the table that's
damaged, recovery is pretty straightforward.
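
The rebuild itself would look roughly like this (users_email_key is only a
guess at the index name; check \d users for the real one):

-- rebuild just the corrupt unique index
reindex index users_email_key;
-- or rebuild every index on the table in one go
reindex table users;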

            regards, tom lane

