Re: Urgent: Key constraints behaving weirdly - Mailing list pgsql-bugs

From Russell Garrett
Subject Re: Urgent: Key constraints behaving weirdly
Date
Msg-id MKEGJINFADFODDNOKEJCIEKGENAA.rg@tcslon.com
Whole thread Raw
In response to Re: Urgent: Key constraints behaving weirdly  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Tom Lane wrote:
> "Russell Garrett" <rg@tcslon.com> writes:
>> last=> reindex index profile_pkey;
>> ERROR:  could not create unique index
>> DETAIL:  Table contains duplicated values.
>
> Okay, it looks like the table actually contains duplicate rows but the
> index has managed to lose track of that fact.  You can see the
> duplicates in the GROUP BY query (which isn't using the index) but
> not when you do "select * from profile where id = 1017057", because
> that query will use the index.

Ah, it went so quick I was sure it was using the index :).

> If you did "set enable_indexscan = off" then the "select * from
> profile where id = 1017057" query would probably show you two rows.
> I'd be interested to see the results of
>
> select ctid, oid, xmin, xmax, cmin, cmax from profile where id =
> 1017057;
> with enable_indexscan off.

last=> set enable_indexscan = off;
SET
last=> select ctid, oid, xmin, xmax, cmin, cmax from profile where id =
1017057;
   ctid    |   oid    |   xmin   |   xmax   |   cmin   | cmax
-----------+----------+----------+----------+----------+------
  (482,27) | 65196187 | 21095783 | 21953819 | 21953819 |    0
 (3095,56) | 64140531 | 20037571 | 20037571 |      545 |  555
(2 rows)

>> Does REINDEX use the current index to check for duplicates? :)
>
> No, it's building a new index from scratch, and so it notices the
> duplicates.

Yeah, I see now.

> What you've got here is definitely a case of index corruption that has
> led to logical corruption of the table (ie, duplicate rows).  To get
> back to a valid state you will need to delete whichever of the
> duplicates seems to be out-of-date, and then do a REINDEX.  But I
> think it is important first to try to determine what caused the
> corruption (software error or hardware?).  Again, if you can take a
> physical dump of the data directory, that would provide an
> opportunity to study the problem later after you get the production
> machine back on its feet.

OK, I've kept a copy of the db and then fixed the problem.

We had an incidence of table corruption a few weeks back, but we just
ignored it (wishful thinking, maybe). Postgres is compiled normally, with
gcc3. OS is Red Hat 9.

Hardware is a Dell Poweredge 2.4Ghz Xeon (can't remember the model number),
DB is using XFS on a MegaRaid U320 controller, running the MegaRaid 1.18j
drivers. Naturally, the drivers don't report any errors. I doubt it's memory
corruption as the system has been rock-solid. So I'm guessing it's probably
MegaRaid, or XFS, or the kernel I suppose (although I'm using the latest).

Problem is, it's quite hard to reproduce the corruption, since it seems to
be fairly rare.

Maybe this should be moved to pgsql-general now?

Russ

pgsql-bugs by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: BUG #1019: src/pl/tcl/pltcl.c
Next
From: Tom Lane
Date:
Subject: Re: [GENERAL] CASE in where statement. BUG ??