Re: BUG #8686: Standby could not restart. - Mailing list pgsql-bugs

From Tomonari Katsumata
Subject Re: BUG #8686: Standby could not restart.
Date
Msg-id CAC55fYfGp3zfp1ySJc_QCJycQt4iJ0ch2S77S4gWQLj-x_pp1g@mail.gmail.com
In response to Re: BUG #8686: Standby could not restart.  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: BUG #8686: Standby could not restart.  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-bugs
Hi Heikki,
Thanks for your confirmation and comments.


                /*
                 * Initialize shared replayEndRecPtr, lastReplayedEndRecPtr, and
                 * recoveryLastXTime.
                 *
                 * This is slightly confusing if we're starting from an online
                 * checkpoint; we've just read and replayed the checkpoint record, but
                 * we're going to start replay from its redo pointer, which precedes
                 * the location of the checkpoint record itself. So even though the
                 * last record we've replayed is indeed ReadRecPtr, we haven't
                 * replayed all the preceding records yet. That's OK for the current
                 * use of these variables.
                 */
                SpinLockAcquire(&xlogctl->info_lck);
                xlogctl->replayEndRecPtr = ReadRecPtr;
                xlogctl->lastReplayedEndRecPtr = EndRecPtr;
                xlogctl->recoveryLastXTime = 0;
                xlogctl->currentChunkStartTime = 0;
                xlogctl->recoveryPause = false;
                SpinLockRelease(&xlogctl->info_lck);

I think we need to fix that confusion. Your patch will do it by not setting EndRecPtr yet; that fixes the bug, but leaves those variables in a slightly strange state; I'm not sure what EndRecPtr points to in that case (0 ?), but ReadRecPtr would be set I guess.
Yes, the values were set as follows:
ReadRecPtr: 1/8E7F0B0
EndRecPtr: 0/0
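
For readers decoding the two values above: PostgreSQL prints a WAL location as two hexadecimal numbers separated by a slash (the high and low 32 bits of the position), and 0/0 is the invalid, i.e. unset, pointer. The following is only a minimal standalone sketch of that notation, not PostgreSQL source. The names lsn_t, lsn_parse and lsn_is_invalid are hypothetical and exist only for this illustration, and the single 64-bit representation is an assumption that matches 9.3 and later (earlier releases used a two-field struct).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for XLogRecPtr, modeled as one 64-bit value. */
typedef uint64_t lsn_t;

/*
 * Parse "hi/lo" hexadecimal text such as "1/8E7F0B0".
 * Returns false if the text does not have exactly that shape.
 */
bool lsn_parse(const char *text, lsn_t *out)
{
    unsigned int hi;
    unsigned int lo;
    char trailing;

    assert(text != NULL && out != NULL);

    /* Exactly two conversions must succeed; a third means trailing junk. */
    if (sscanf(text, "%X/%X%c", &hi, &lo, &trailing) != 2)
        return false;

    *out = ((lsn_t) hi << 32) | (lsn_t) lo;
    return true;
}

/* 0/0 is the "invalid" location, so it compares below every real record. */
bool lsn_is_invalid(lsn_t lsn)
{
    return lsn == 0;
}
```

Read this way, EndRecPtr at 0/0 means "not set at all", while ReadRecPtr at 1/8E7F0B0 is a real position in the WAL stream.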
 
 


Perhaps we should reset replayEndRecPtr and lastReplayedEndRecPtr to the REDO point here, instead of ReadRecPtr/EndRecPtr.
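
As a rough illustration of that suggestion (a toy model only, not the attached patch and not the real startup code), the difference between the two initializations can be sketched as below. The struct toy_xlogctl and both function names are hypothetical; the sketch keeps only the two fields under discussion and leaves out the spinlock and everything else.

```c
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr. */
typedef uint64_t lsn_t;

/* Toy model holding only the two shared fields discussed here. */
typedef struct
{
    lsn_t replayEndRecPtr;
    lsn_t lastReplayedEndRecPtr;
} toy_xlogctl;

/*
 * What the quoted snippet does: publish the start and end of the
 * checkpoint record that was just read, although replay will begin
 * at the earlier REDO point.
 */
void init_from_checkpoint_record(toy_xlogctl *ctl,
                                 lsn_t read_rec_ptr,
                                 lsn_t end_rec_ptr)
{
    ctl->replayEndRecPtr = read_rec_ptr;
    ctl->lastReplayedEndRecPtr = end_rec_ptr;
}

/*
 * The suggested alternative: publish the REDO point in both fields,
 * so the shared state never claims progress beyond the location
 * where replay actually starts.
 */
void init_from_redo_point(toy_xlogctl *ctl, lsn_t redo)
{
    ctl->replayEndRecPtr = redo;
    ctl->lastReplayedEndRecPtr = redo;
}
```

With an online checkpoint, the REDO point precedes the checkpoint record, so the first variant reports a replayed position that is ahead of where replay begins; the second variant does not.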

 
I made another patch.
I added a ReadRecord call to check whether the record at the REDO location is present.
A similar check is done when backup_label is used.
 
Because ReadRecord returns a record that has already been read,
I set EndRecPtr to the ReadRecPtr of that record,
and I also set ReadRecPtr to record->xl_prev.
As you suggested, this also worked fine.
 
I'm not sure whether we should do the same thing for crash recovery, so for now I added this step only for the case where archive recovery is needed.
 
Please see the attached patch.
 
regards,
---------------------
Tomonari Katsumata