Re: BUG #8686: Standby could not restart. - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #8686: Standby could not restart.
Date
Msg-id 52CABFD8.8090101@vmware.com
Whole thread Raw
In response to Re: BUG #8686: Standby could not restart.  (Tomonari Katsumata <t.katsumata1122@gmail.com>)
Responses Re: BUG #8686: Standby could not restart.
List pgsql-bugs
On 12/23/2013 08:15 AM, Tomonari Katsumata wrote:
>>                   /*
>>>                   * Initialize shared replayEndRecPtr,
>>> lastReplayedEndRecPtr, and
>>>                   * recoveryLastXTime.
>>>                   *
>>>                   * This is slightly confusing if we're starting from an
>>> online
>>>                   * checkpoint; we've just read and replayed the
>>> checkpoint record, but
>>>                   * we're going to start replay from its redo pointer,
>>> which precedes
>>>                   * the location of the checkpoint record itself. So even
>>> though the
>>>                   * last record we've replayed is indeed ReadRecPtr, we
>>> haven't
>>>                   * replayed all the preceding records yet. That's OK for
>>> the current
>>>                   * use of these variables.
>>>                   */
>>>                  SpinLockAcquire(&xlogctl->info_lck);
>>>                  xlogctl->replayEndRecPtr = ReadRecPtr;
>>>                  xlogctl->lastReplayedEndRecPtr = EndRecPtr;
>>>                  xlogctl->recoveryLastXTime = 0;
>>>                  xlogctl->currentChunkStartTime = 0;
>>>                  xlogctl->recoveryPause = false;
>>>                  SpinLockRelease(&xlogctl->info_lck);
>>>
>>
>> I think we need to fix that confusion. Your patch will do it by not
>> setting EndRecPtr yet; that fixes the bug, but leaves those variables in a
>> slightly strange state; I'm not sure what EndRecPtr points to in that case
>> (0 ?), but ReadRecPtr would be set I guess.
>>
> Yes, the values were set like below.
> ReadRecPtr:1/8E7F0B0
> EndRecPtr:0/0
>
>
>
>>
>> Perhaps we should reset replayEndRecPtr and lastReplayedEndRecPtr to the
>> REDO point here, instead of ReadRecPtr/EndRecPtr.
>
>   I made another patch.
> I added a ReadRecord to make sure the REDO location is present or not.
> The similar process are done when we use backup_label.
>
> Because the ReadRecord returns a record already read,
> I set ReadRecPtr of the record to EndRecPtr.
> And also I set record->xl_prev to ReadRecPtr.
> As you said, it also worked fine.
>
> I'm not sure we should do same thing when crash recovery occurs, but now I
> added the process when archive recovery is needed.
>
> Please see attached patch.

Hmm. That would still initialize lastReplayedEndRecPtr to the checkpoint
record, when you do crash recovery (or begin archive recovery from a
filesystem snapshot, where you perform crash recovery before starting to
read the archive). I'm not very comfortable with that, even though I
don't see an immediate problem with it.

I also noticed that CheckRecoveryConsistency() compares backupEndPoint
with EndRecPtr, but minRecoveryPoint with lastReplayedEndRecPtr. That's
correct as the code stands, but it's an accident waiting to happen: if
you called CheckRecoveryConsistency() after reading a record with
ReadRecord(), but before fully replaying it, it might conclude that it's
reached the backup end location one record too early. And it's
inconsistent, anyway.

I propose the attached patch. I haven't tested it, but I think it's a
slightly more robust fix.

- Heikki

Attachment

pgsql-bugs by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Next
From: Andres Freund
Date:
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages