Re: Violation of primary key constraint - Mailing list pgsql-bugs

From Toby Murray
Subject Re: Violation of primary key constraint
Msg-id CAJeqKgsaS43iQx+s8HVRsGYLes795YzC8HQ8cmkYhb_x3q4btg@mail.gmail.com
In response to Re: Violation of primary key constraint  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Violation of primary key constraint  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Fri, Feb 1, 2013 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Toby Murray <toby.murray@gmail.com> writes:
>> On Thu, Jan 31, 2013 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Could we see the full results of heap_page_items(get_raw_page()) for
>>> each of the pages where any of these tuples live, viz
>
>> Seems awkward to put inline so I made a text file with these results
>> and am attaching it. Hoping the mailing list allows attachments.
>
> Thanks.  These are a bit odd, because none of the tuples in question
> look like they've been updated, except for (11249622,6) which we
> already thought had been re-modified by some later transaction.

Well, that's the one that brought the house of cards down: when the
update process attempted to modify it, the database finally noticed
the duplicate PK and the update failed.
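
For anyone trying to reproduce this kind of investigation: the duplicate
rows and their physical locations can be listed with a query along these
lines, and the raw tuple headers dumped with the pageinspect extension
that Tom's heap_page_items(get_raw_page()) request refers to. This is
only a sketch; the table name `ways`, its PK column `id`, and the block
number are assumptions based on the OSM schema, not taken from this
thread.

```sql
-- Hypothetical OSM-style table "ways" with primary key "id".
-- Show every physical copy of each duplicated PK value, with its
-- tuple address (ctid) and creating/deleting transaction IDs.
SELECT ctid, xmin, xmax, id
FROM ways
WHERE id IN (SELECT id FROM ways GROUP BY id HAVING count(*) > 1)
ORDER BY id, ctid;

-- With pageinspect installed, dump the raw item headers of the heap
-- page a suspect tuple lives on; a ctid of (11249622,6) means block
-- 11249622, line pointer 6.
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT * FROM heap_page_items(get_raw_page('ways', 11249622));
```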


> What it looks like to me is that the writes of the source pages simply
> got lost somehow, so that what's on disk now is the pre-update copies
> of these tuples as well as the post-update copies.  I wonder whether
> your SSD dropped a batch of updates.  Speculating wildly: maybe that
> was triggered by a power glitch that also caused the reported error
> on your RAID array?

Yeah... probably no way to confirm or deny this at this point. The
machine is on a UPS, but of course nothing is 100%. The spinning drive
reporting the SMART errors definitely has a problem; it has
accumulated more errors since the 28th, and I will be replacing it
soon.

> One thing that I learned from examining the PK index is that for
> three out of these four pairs, there were intermediate versions,
> ie there must have been multiple UPDATEs issued, probably all in
> the same transaction (does that square with what you know of the
> update process?).

Yes, that is possible. It is actually a big inefficiency in the update
process that I am planning to tinker with at some point. If 3 nodes
that are part of a way are changed, the way's linestring will be
updated 3 times during the update process. And if the way itself is
changed, it will be updated twice - once for the way attribute
changes and once for the linestring update. Like I said...
inefficient.

The OSM object history does back up your observation that 3 of them
had multiple updates:
26245218: updated 1 time
26709186: updated 2 times
26709284: updated 4 times
26926573: updated 5 times


> It's conceivable that this was a software glitch not a hardware glitch,
> ie Postgres forgetting the dirty-bits for a batch of pages, but we've
> not seen any similar reports elsewhere.  So I'm leaning to the hardware
> explanation.

Roger that. Unless there is anything else to look at with the database
in its current state, I will try to delete the duplicates and then
rewind my OSM replication process to the 28th so that any other
errors that crept in during the botched write get redone.
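
The duplicate cleanup could be done roughly like this. This is only a
sketch under the same assumptions as above (`ways` table, `id` PK, an
index name `ways_pkey`, illustrative ctid values) - in a corruption case
like this one the surviving copy must be chosen by hand, since the
lowest ctid is not necessarily the current version.

```sql
-- After verifying by inspection which physical copy is stale,
-- remove it by its tuple address (ctid value is illustrative).
DELETE FROM ways WHERE ctid = '(11249622,6)';

-- The PK index still carries entries for both copies (and, per Tom's
-- findings, intermediate versions), so rebuild it afterwards.
REINDEX INDEX ways_pkey;
```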

Thanks for the detailed debugging.

Toby
