Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE - Mailing list pgsql-hackers

From Travis Cross
Subject Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE
Date
Msg-id 4485B422.1050604@crosswirecorp.com
Whole thread Raw
In response to Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-hackers
Tom Lane wrote:
> Travis Cross <travis@crosswirecorp.com> writes:
>> I'm noticing that a handful (4-16) of rows with duplicate columns
>> (uid,token) are sneaking into the table every day despite the
>> primary key constraint.
> 
> Corrupt index, looks like ... you might try reindexing the index.

I probably should have mentioned that I have indeed done a REINDEX 
on the table a couple of times in the past, suspecting that issue, 
and having seen it resolve similar issues on this list.  Upon your 
suggestion, I'm running one right now, and I will probably dump and 
reload the entire database after hours, unless anyone thinks that 
would be a bad idea (or unproductive in tracking this down).

> I don't believe that the PANIC you show has anything directly to do
> with duplicate entries.  It is a symptom of corrupt index structure.
> Now a corrupt index might also explain failure to notice duplications,
> but changing your application isn't going to fix whatever is causing
> it.  You need to look for server-side causes.

Indeed, you are correct.  I should also mention that the problem 
seems to build over time, in the sense that everything will run fine 
for awhile (a few days), and then will crash repeatedly.  Deleting 
the duplicate rows seems to reset the counter -- of course, I cannot 
run a successful REINDEX until I have deleted those duplicate rows.

> Any database or system crashes on this server (before this problem
> started)?

No.  In fact, this box, and a sister box running similar hardware, 
have been models of system stability.  My uptimes are 46 and 87 
days, respectively, representing the time since I've done a kernel 
upgrade and the time since I plugged the boxes into the rack.  The 
sister box is running real-time voice services.

> Do you *know* that the disk drive will not lie about write
> complete?

"Know" is such a strong word ;)  Honestly, I have very little idea.  I understand the nature of the problem this
presents,as I've read 
 
the very fine PostgreSQL manual many times over the years.

Because the drives I use are specifically designed to operate well 
in a RAID environment, I would 'hope' that the drives perform honest 
write operations.

I wonder if there is a utility to perform a deterministic test of 
this...

> What is the platform and storage system, anyway?

The platform is:

Linux 2.6.16.9 (w/o loadable modules)
Supermicro PDSMi (a single processor P-D board)
2G ECC DDRII SDRAM

The storage system is:

On-board SATA ICH7R Controller
2 x WD3200SD hard drives running in a Linux RAID 1 configuration.
That is to say: Western Digital 320G SATA 'enterprise' drives.  The 
drives have a somewhat unique feature: time-limited error recovery, 
which is supposed to let the RAID controller/software deal with 
errors after a certain point (7 seconds), rather than continuing to 
block, and causing the drive to fall out of the array.

The drive:
http://www.westerndigital.com/en/products/products.asp?driveid=114&language=en

I'll run file system consistency checks tonight to see if I can pick 
out a proximal cause for all this chaos.

I really do appreciate the assistance.

Cheers,

-- Travis


pgsql-hackers by date:

Previous
From: Roberto Rezende de Assis
Date:
Subject: AGREGATE FUNCTIONS
Next
From: Travis Cross
Date:
Subject: Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE