I have a very strange bug with postgres 7.4.3. I have a table with about
15 million rows and recently, duplicate rows have started appearing.
For simplicity, let's say my table is as follows:
files
-------
fileid integer -- Primary key generated by a sequence
userid integer -- id of user that owns the file
filename character varying(255) -- name of file
Indexes: "files_pkey" primary key, bree (fileid)
"files_userid" hash (userid)
When I do:
select fileid, userid from files where userid = 1898598 order by fileid;
I get:
fileid | userid
---------+---------
3787433 | 1898598
3787433 | 1898598
3787563 | 1898598
9554275 | 1898598
Notice that 3787433 is duplicated. How could this have happened if that
column is flagged as the primary key. Even more interesting:
select oid, fileid, userid from files where userid = 1898598 order by
fileid;
oid | fileid | userid
----------+---------+---------
1573737 | 3787433 | 1898598
1573737 | 3787433 | 1898598
1573820 | 3787563 | 1898598
18612041 | 9554275 | 1898598
The rows have the same OID! So my question is how do I delete the
duplicate row. If I execute
select fileid, userid from files where fileid = 1573737;
I get:
fileid | userid
---------+---------
1573737 | 1008628
Similarly, if I try to delete both of the rows, only one of them gets
deleted, then when I select by userid, I get the other remaining one
listed. But if I select by fileid I get no rows returned.
I suspect a corrupt index is at fault here. If that's the case, a reindex
will take quite some time and will lock the table causing a long period of
downtime. Is that my only option? Any other ideas?
-Zeki