Thread: Primary key duplicates

Primary key duplicates

From
Zeki Mokhtarzada
Date:
I have a very strange bug with postgres 7.4.3.  I have a table with about
15 million rows and recently, duplicate rows have started appearing.

For simplicity, let's say my table is as follows:

files
-------
fileid   integer   -- Primary key generated by a sequence
userid   integer   -- id of user that owns the file
filename character varying(255) -- name of file

Indexes:  "files_pkey" primary key, bree (fileid)
          "files_userid" hash (userid)


When I do:

 select fileid, userid from files where userid = 1898598 order by fileid;

I get:

 fileid  | userid
---------+---------
 3787433 | 1898598
 3787433 | 1898598
 3787563 | 1898598
 9554275 | 1898598


Notice that 3787433 is duplicated.  How could this have happened if that
column is flagged as the primary key.  Even more interesting:

select oid, fileid, userid from files where userid = 1898598 order by
fileid;


   oid    | fileid  | userid
----------+---------+---------
  1573737 | 3787433 | 1898598
  1573737 | 3787433 | 1898598
  1573820 | 3787563 | 1898598
 18612041 | 9554275 | 1898598

The rows have the same OID!  So my question is how do I delete the
duplicate row.  If I execute

 select fileid, userid from files where fileid = 1573737;

I get:

 fileid  | userid
---------+---------
 1573737 | 1008628


Similarly, if I try to delete both of the rows, only one of them gets
deleted, then when I select by userid, I get the other remaining one
listed.  But if I select by fileid I get no rows returned.

I suspect a corrupt index is at fault here.  If that's the case, a reindex
will take quite some time and will lock the table causing a long period of
downtime.  Is that my only option?  Any other ideas?

-Zeki

Re: Primary key duplicates

From
Tom Lane
Date:
Zeki Mokhtarzada <zeki@freewebz.com> writes:
> Notice that 3787433 is duplicated.  How could this have happened if that
> column is flagged as the primary key.  Even more interesting:

> select oid, fileid, userid from files where userid = 1898598 order by
> fileid;
>    oid    | fileid  | userid
> ----------+---------+---------
>   1573737 | 3787433 | 1898598
>   1573737 | 3787433 | 1898598
>   1573820 | 3787563 | 1898598
>  18612041 | 9554275 | 1898598

It would be interesting to see the ctid, xmin, cmin, and xmax fields
as well.

> I suspect a corrupt index is at fault here.

Possibly, but the duplicate OID entries suggest that a row was updated
and then for some reason the old version appears to still be valid.
(Or an attempted update was rolled back, but the new version got marked
valid anyway.)  Have you had any crashes or other strange behavior lately?

            regards, tom lane

Re: Primary key duplicates

From
Tom Lane
Date:
Zeki Mokhtarzada <zeki@freewebz.com> writes:
> The system is running on a Dell PowerEdge 2650 running RedHat 8.  We had a
> kernel halt about two weeks ago that was caused by one of our disk mirrors
> failing.  It could be that these problems were caused at that point in
> time and are just being noticed now.

Not sure.  xmin=2 (FrozenXid) implies that those rows have been like
that for a good long time (~ 1 billion transactions).

> But you never stated how to delete the duplicate rows.  Any suggestions?

You can select 'em by ctid, viz WHERE ctid = '(53101,30)'

            regards, tom lane

Re: Primary key duplicates

From
Tom Lane
Date:
Zeki Mokhtarzada <zeki@freewebz.com> writes:
> It appears that I can select the rows, but not delete or update them.

That's *very* odd, because it works for me.  Is the same true of both
those duplicate rows, or did you only try the one with nonzero xmax?

I think there must be something we don't know about the states of those
rows.  Could you grab pg_filedump from http://sources.redhat.com/rhdb/
(note: right at the moment that server seems to be down, but I suppose
it'll be up soon) and dump out page 53101 of this table?  Usually
I find "-i -f" to be the most informative formatting option.

            regards, tom lane

Re: Primary key duplicates

From
Zeki Mokhtarzada
Date:
On Fri, 13 Aug 2004, Tom Lane wrote:

> It would be interesting to see the ctid, xmin, cmin, and xmax fields
> as well.

     ctid     |    xmin    |    cmin    |    xmax    |   oid    | fileid  | userid
--------------+------------+------------+------------+----------+---------+---------
   (53101,30) |          2 | 1478674026 | 1478674026 |  1573737 | 3787433 | 1898598
   (53101,39) |          2 |          0 |          0 |  1573737 | 3787433 | 1898598

> Possibly, but the duplicate OID entries suggest that a row was updated
> and then for some reason the old version appears to still be valid.
> (Or an attempted update was rolled back, but the new version got marked

I've run the query with set enable_indexscan =off so that rules out an
index problem.


> valid anyway.)  Have you had any crashes or other strange behavior lately?

The system is running on a Dell PowerEdge 2650 running RedHat 8.  We had a
kernel halt about two weeks ago that was caused by one of our disk mirrors
failing.  It could be that these problems were caused at that point in
time and are just being noticed now.

It looks like you've seen this problem before:

http://archives.postgresql.org/pgsql-bugs/2003-12/msg00174.php

But you never stated how to delete the duplicate rows.  Any suggestions?
Also, where can I find documentation on the purpose and values of the
ctid, oid, xmin, xmax, cmin, cmax columns?

Thanks!

-Zeki

Re: Primary key duplicates

From
Zeki Mokhtarzada
Date:
It appears that I can select the rows, but not delete or update them.

# select ctid from files where ctid = '(53101,30)';
    ctid
------------
 (53101,30)
(1 row)

# delete from files where ctid = '(53101,30)';
DELETE 0
# update files set fileid = 1000 where ctid = '(53101,30)';
UPDATE 0
#

-Zeki

On Fri, 13 Aug 2004, Tom Lane wrote:

> Zeki Mokhtarzada <zeki@freewebz.com> writes:
> > The system is running on a Dell PowerEdge 2650 running RedHat 8.  We had a
> > kernel halt about two weeks ago that was caused by one of our disk mirrors
> > failing.  It could be that these problems were caused at that point in
> > time and are just being noticed now.
>
> Not sure.  xmin=2 (FrozenXid) implies that those rows have been like
> that for a good long time (~ 1 billion transactions).
>
> > But you never stated how to delete the duplicate rows.  Any suggestions?
>
> You can select 'em by ctid, viz WHERE ctid = '(53101,30)'
>
>             regards, tom lane
>