Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Recovery inconsistencies, standby much larger than primary
Date
Msg-id CAM-w4HM0N1qd3CVusYXogYQxVV43WO4qAy+J0fjjCZ5SmutX0A@mail.gmail.com
In response to Re: Recovery inconsistencies, standby much larger than primary  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Recovery inconsistencies, standby much larger than primary  (Andres Freund <andres@2ndquadrant.com>)
Re: Recovery inconsistencies, standby much larger than primary  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Thu, Feb 6, 2014 at 10:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I had noticed that the WAL records that were mis-replayed seemed to
> be bunched pretty close together (two of them even adjacent).  Could
> you confirm that?  If so, it seems like we're looking for some condition
> that makes mis-replay fairly probable for a period of time, but in
> itself might be quite improbable.  Not that that helps much at
> nailing it down.

Well, one thing that argues for is hardware problems. It could be that
the right memory mapping just happened to line up with that variable
on the stack for that short time and then it was mapped to something
else entirely. Or the machine was overheating for a short time and
then the temperature became more reasonable. Or the person with the
x-ray source walked by in that short time window.

That doesn't explain the other instance or the other copies of this
database. I think the most productive thing I can do is switch my
attention to the other database to see if it really looks like the
same problem.

> You might well be on to something with the bgwriter idea, considering
> that none of the WAL replay code was originally written with any
> concurrent execution in mind.  We might've missed some place where
> additional locking is needed.

Except that the bgwriter has been in there for a few years already.
Unless there's been some other change, possibly involving copying some
code that was safe in its original context but not where it was copied to.

The problem with the bgwriter being at fault is that, from what I can
see, the bgwriter will never extend a file. That means the xlog
recovery code must have done it. That means even if the bgwriter came
along and looked at the buffer we just read in, it would already be too
late to cause mischief. The xlog code extends the file *first* and then
reads the backup block into a buffer. I can't see how the bgwriter could
corrupt the stack or the WAL recovery buffer in the window between
reading in the WAL buffer and deciding to extend the relation.
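
To make the ordering concrete, here is a toy model in Python. It is only
a sketch of the sequence described above, not the actual recovery code:
ToyRelation, replay_backup_block and the block numbers are all
hypothetical, and the 8192-byte page size is assumed. It shows that if a
record arrives with a block number past the end of the file, the file
has already grown by the time the page image lands in a buffer.

```python
BLOCK_SIZE = 8192  # assumed page size for this toy model


class ToyRelation:
    """Hypothetical stand-in for a relation file: a list of fixed-size pages."""

    def __init__(self, nblocks=0):
        self.pages = [bytes(BLOCK_SIZE) for _ in range(nblocks)]

    def nblocks(self):
        return len(self.pages)


def replay_backup_block(rel, blkno, image):
    """Toy replay of one full-page image.

    Step 1: extend the relation with zeroed pages until blkno exists.
    Step 2: only then overwrite the target page with the image.
    Returns the number of pages added in step 1.
    """
    if len(image) != BLOCK_SIZE:
        raise ValueError("backup block must be exactly one page")
    added = 0
    while rel.nblocks() <= blkno:
        rel.pages.append(bytes(BLOCK_SIZE))
        added += 1
    rel.pages[blkno] = image
    return added


rel = ToyRelation(nblocks=10)
image = b"\x01" * BLOCK_SIZE

# A record for an existing block does not grow the file.
added_ok = replay_backup_block(rel, 3, image)

# A record carrying a wrong (too large) block number grows the file
# before the page image is ever placed in a buffer.
added_bad = replay_backup_block(rel, 500, image)
```

In this model the damage is decided by the block number alone, so
anything that touches the buffer after step 2 is too late to change the
file size.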


-- 
greg


