Re: 9a57858f1103b89a5674f0d50c5fe1f756411df6 - Mailing list pgsql-hackers

From Greg Stark
Subject Re: 9a57858f1103b89a5674f0d50c5fe1f756411df6
Date
Msg-id CAM-w4HO2CAQ1k34cx3vw3_gJ8eQxUA44kgSh=pCQTpCsj5VnPA@mail.gmail.com
In response to Re: 9a57858f1103b89a5674f0d50c5fe1f756411df6  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On 13 Mar 2014 01:36, "Stephen Frost" <sfrost@snowman.net> wrote:
>
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > This thread badly needs a more informative Subject line.
>
> Agreed.
>
> > But, yeah: do people think the referenced commit fixes a bug bad enough
> > to deserve a quick update release?  If so, why?  Multiple reports of
> > problems in the field would be a good reason, but I've not seen such.
>
> Uh, isn't what brought this to light two independent complaints from
> Peter and Greg Stark of seeing corruption in the field due to this?
>
> Peter's initial email also indicated it was two different systems which
> had gotten bit by this and Greg explicitly stated that he was working on
> an independent database from what Peter was reporting on, so that's at
> least 2 (one each), or 3 (if you count databases, as Peter had 2).
> Sure, they're all from Heroku, but I find it highly unlikely no one else
> has run into this issue.  More likely, they simply haven't realized it's
> happened to them (which is another reason this is a particularly nasty
> bug..).

We have the two databases where we're sure this was the problem. On the one I worked on, the customer complained that it happened repeatedly.

The key I demonstrated here wasn't even the one the customer was complaining about. It seems their usage pattern made it extremely easy to trigger, and that usage pattern arose naturally from using a Rails module called counter_cache, which maintains a cache of the count of a child table in the parent table.

We also have a few other customers complaining about duplicate keys. It's hard to be sure, but these may have been standbys where the problem occurred ages ago and they only now activated their standby and ran into the problem.

That's what worries me most about this bug. You'll only detect it if you're routinely querying your standby. If you have a standby for HA purposes, it might be corrupt for a long time without you realising it. We may be fielding corruption complaints for a long time without being able to conclusively prove whether it's due to this bug or not.
