Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Recovery inconsistencies, standby much larger than primary
Date	February 7, 2014 00:48:43
Msg-id	27201.1391723309@sss.pgh.pa.us
In response to	Re: Recovery inconsistencies, standby much larger than primary (Greg Stark <stark@mit.edu>)
Responses	Re: Recovery inconsistencies, standby much larger than primary (Greg Stark <stark@mit.edu>)
List	pgsql-hackers

Greg Stark <stark@mit.edu> writes:
> Both the primary and the standby were 9.1.11 from the get-go. The
> database the primary was forked off of was 9.1.10 but as far as I can
> tell the primary in the current pair has no problems.

> What's worse is we created a new standby from the same base backup and
> replayed the same records and it didn't reproduce the problem. This
> means either it's a hardware problem -- but we've seen it on multiple
> standbys on this database and at least one other database which is in
> a different data centre -- or it's a race condition -- but that's hard
> to credit in the recovery code which is basically single-threaded.

> And these records are from before the standby reaches consistency, so
> it's hard to see how a connection from a hot standby client could
> cause any kind of race condition. The only other thread that could
> conceivably cause a heisenbug is the bgwriter. It's hard to imagine
> how a race condition in there could be so easy to hit that it would
> happen four times on one restore but otherwise go mostly unnoticed.

I had noticed that the WAL records that were mis-replayed seemed to
be bunched pretty close together (two of them even adjacent).  Could
you confirm that?  If so, it seems like we're looking for some condition
that makes mis-replay fairly probable for a period of time, but in
itself might be quite improbable.  Not that that helps much at
nailing it down.

You might well be on to something with the bgwriter idea, considering
that none of the WAL replay code was originally written with any
concurrent execution in mind.  We might've missed some place where
additional locking is needed.
        regards, tom lane
