Thread: redo failed in physical streaming replication while stopping the master server

Hi all,

Issue:
I use hot standby streaming replication in PostgreSQL 9.2.X.
After I shut down the master server in fast stop mode, I compared the
xlog dump files of the master and the slave and found that the shutdown
checkpoint was not replicated to the slave.
Then I checked pg_log on the slave and found that the redo process failed with
"record with zero length at %X/%X" during the master shutdown. The
startup process terminated the current walreceiver and tried to reconnect to
the master, but failed because the master was shutting down.
Theoretically, all the WAL records should be replicated to the slave when
the master shuts down in normal mode.
I read the source code and found that when we read a record to replay, we
use the last EndRecPtr to get the exact xlog page containing the next record.
If EndRecPtr points to the end of the last page and the free space of
that page is less than SizeOfXLogRecord, we align it to the next page.
I noticed that there is a comment below that code:
[1]
    /*
     * RecPtr is pointing to end+1 of the previous WAL record.  We must
     * advance it if necessary to where the next record starts.  First,
     * align to next page if no more records can fit on the current page.
     */
    if (XLOG_BLCKSZ - (RecPtr->xrecoff % XLOG_BLCKSZ) < SizeOfXLogRecord)
        NextLogPage(*RecPtr);

    /* Check for crossing of xlog logid boundary */
    if (RecPtr->xrecoff >= XLogFileSize)
    {
        (RecPtr->xlogid)++;
        RecPtr->xrecoff = 0;
    }

    /*
     * If at page start, we must skip over the page header.  But we can't
     * do that until we've read in the page, since the header size is
     * variable.
     */

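
For illustration only, here is a minimal sketch of the alignment arithmetic
in snippet [1]. The constants and the function name are stand-ins chosen for
this example (they are not taken from the PostgreSQL headers), and the logid
crossing is left out.

```c
#include <stdint.h>

/* Illustrative stand-ins, NOT the real PostgreSQL definitions. */
#define DEMO_BLCKSZ      8192u  /* assumed WAL page size */
#define DEMO_RECORD_HDR  32u    /* assumed fixed record header size */

/*
 * Given the end+1 offset of the previous record, return the offset at
 * which the search for the next record starts.  If fewer than
 * DEMO_RECORD_HDR bytes remain on the current page, the offset is
 * rounded up to the next page boundary.
 */
static uint32_t
demo_align_for_next_record(uint32_t end_off)
{
    uint32_t free_space = DEMO_BLCKSZ - (end_off % DEMO_BLCKSZ);

    if (free_space < DEMO_RECORD_HDR)
        end_off += free_space;  /* round up to the next page boundary */

    return end_off;
}
```

With these assumed values, an end offset of 8184 leaves only 8 free bytes and
is moved to 8192, which is exactly a page boundary and not yet a record start.
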
The scenario is:
1. When we do the shutdown checkpoint, we first advance the xlog buffer,
then run CheckPointGuts, then log the checkpoint.
2. The slave's walreceiver receives only the xlog page header of the next
page, because the shutdown checkpoint record has not been assembled yet.
3. The slave's recovery requests the next record using XLogPageRead
with an LSN pointing exactly to the next page boundary.
4. XLogPageRead uses this condition to confirm that the receiver has
received some records:
[2]
    /* See if we need to retrieve more data */
    if (readFile < 0 ||
        (readSource == XLOG_FROM_STREAM &&
         !XLByteLT(*RecPtr, receivedUpto)))

Here, RecPtr points to the page boundary[1] and receivedUpto points to the
end of the page header(2). So the check concludes that the receiver has
already received some records, and it returns the page to the caller
(ReadRecord).
5. ReadRecord finds the page header of this page valid, but when it tries to
read the record, it gets nothing: there is only a page header in the xlog page.
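
The gap described in steps 3 to 5 can be modelled with a small sketch. This
is only an illustration of the scenario as described above, not the server
code: LSNs are simplified to 64-bit integers, and both helper names and the
header size used in the example are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Models the streaming half of the check in snippet [2]: more data is
 * requested only when rec_ptr is NOT below received_upto.
 */
static bool
demo_need_more_data(uint64_t rec_ptr, uint64_t received_upto)
{
    return !(rec_ptr < received_upto);
}

/*
 * Hypothetical stricter question: has anything beyond the page header
 * of the page starting at page_start been received?
 */
static bool
demo_record_bytes_available(uint64_t page_start, uint64_t received_upto,
                            uint32_t hdr_size)
{
    return received_upto > page_start + hdr_size;
}
```

For a page starting at 24576 with an assumed 16-byte header and
receivedUpto = 24592, the first function says no more data is needed, while
the second one says no record bytes are available yet.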

I think the problem is that we try to get an xlog page containing the
"record", so the pointer should refer to a record, not to a page boundary.

Can we use the current boundary RecPtr to calculate where the true record
starts in the next page? After all, we know whether the next page has a long
page header or a short page header. I don't know why this problem has not
been fixed in Postgres 9.2, or even in 9.6devel.
Would this work?

    if ((RecPtr->xrecoff % XLogSegSize) == 0)
        XLByteAdvance((*RecPtr), SizeOfXLogLongPHD);
    else
        XLByteAdvance((*RecPtr), SizeOfXLogShortPHD);
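
As a rough, standalone sketch of the idea above (predicting the header size
from the offset alone, before the page is read): the sizes below are
placeholder values and the function names are made up for this example, so
this only shows the arithmetic, not a patch.

```c
#include <stdint.h>

/* Placeholder values for illustration, NOT the real PostgreSQL constants. */
#define DEMO_BLCKSZ     (8u * 1024u)            /* assumed page size */
#define DEMO_SEG_SIZE   (16u * 1024u * 1024u)   /* assumed segment size */
#define DEMO_SHORT_PHD  16u                     /* assumed short header size */
#define DEMO_LONG_PHD   32u                     /* assumed long header size */

/* Predict the page header size from the page's starting offset. */
static uint32_t
demo_predicted_header_size(uint32_t page_off)
{
    return (page_off % DEMO_SEG_SIZE == 0) ? DEMO_LONG_PHD : DEMO_SHORT_PHD;
}

/*
 * If xrecoff sits exactly on a page boundary, advance it past the
 * predicted header; otherwise return it unchanged.
 */
static uint32_t
demo_skip_page_header(uint32_t xrecoff)
{
    if (xrecoff % DEMO_BLCKSZ != 0)
        return xrecoff;         /* not at a page start */

    return xrecoff + demo_predicted_header_size(xrecoff);
}
```

Note that this relies on the writer always using a long header at a segment
start and a short one elsewhere; it does not look at the page contents at all.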

Yours sincerely,

fanbin




--
View this message in context:
http://postgresql.nabble.com/redo-failed-in-physical-streaming-replication-while-stopping-the-master-server-tp5889961.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



    /*
     * If at page start, we must skip over the page header.  But we can't
     * do that until we've read in the page, since the header size is
     * variable.
     */

I don't understand the reasoning behind this comment.

    if ((RecPtr->xrecoff % XLogSegSize) == 0)

If this holds, it's a long page header, otherwise a short page header.

So "the header size" can be calculated, right?






On Wed, Mar 2, 2016 at 4:25 PM, wcting <wcting163@163.com> wrote:
> /*
>  * If at page start, we must skip over the page header.  But we can't
>  * do that until we've read in the page, since the header size is
>  * variable.
>  */
>
> I don't understand the reasoning behind this comment.
>
>     if ((RecPtr->xrecoff % XLogSegSize) == 0)
>
> If this holds, it's a long page header, otherwise a short page header.
>
> So "the header size" can be calculated, right?

This means that the page must be read first, before the record pointer is
recalculated to point to the first record after the page header. This is done
a little later in ReadRecord():

    pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
    targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
    if (targetRecOff == 0)
    {
        /*
         * At page start, so skip over page header.  The Assert checks that
         * we're not scribbling on caller's record pointer; it's OK because we
         * can only get here in the continuing-from-prev-record case, since
         * XRecOffIsValid rejected the zero-page-offset case otherwise.
         */
        Assert(RecPtr == &tmpRecPtr);
        RecPtr->xrecoff += pageHeaderSize;
        targetRecOff = pageHeaderSize;
    }

And XLogPageHeaderSize() distinguishes between a long and a short header.
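
To make the "read first, then decide" point concrete, here is a simplified
sketch of deriving the header size from a flag stored in the page itself.
The struct, the flag value and the sizes are stand-ins for this example and
are not copied from the PostgreSQL headers.

```c
#include <stdint.h>

/* Stand-in definitions for illustration only. */
#define DEMO_LONG_HEADER_FLAG  0x0002u  /* assumed "long header" bit */
#define DEMO_SHORT_PHD         16u      /* assumed short header size */
#define DEMO_LONG_PHD          32u      /* assumed long header size */

/* Minimal model of the start of a WAL page that has already been read. */
typedef struct DemoPageHeader
{
    uint16_t magic;  /* unused here, present only for shape */
    uint16_t info;   /* flag bits, including the long-header bit */
} DemoPageHeader;

/* Decide the header size from the page contents, not from the offset. */
static uint32_t
demo_header_size_from_page(const DemoPageHeader *hdr)
{
    return (hdr->info & DEMO_LONG_HEADER_FLAG) ? DEMO_LONG_PHD
                                               : DEMO_SHORT_PHD;
}
```

The function needs a pointer to page data, which is why it can only run
after the page has been read into the buffer.
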
-- 
Michael



Thanks for your reply.

If we only consider replay, then yes, we can do this header check
only after we've read the page.

But thanks to the xlog generator on the master, we know the following:
when we advance the XLOG insert buffer (page), we treat the new page header
as a short header at first,
then we use this condition to turn it into a long header:
    if ((NewPage->xlp_pageaddr.xrecoff % XLogSegSize) == 0)
    {
        XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;

        NewLongPage->xlp_sysid = ControlFile->system_identifier;
        NewLongPage->xlp_seg_size = XLogSegSize;
        NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
        NewPage->xlp_info |= XLP_LONG_HEADER;

        Insert->currpos = ((char *) NewPage) + SizeOfXLogLongPHD;
    }

So in the replay scenario, before we read the page from the WAL segment file,
using the special RecPtr which points to the next page header address, can
we predict whether the page header is long or short?

regards,

fanbin








On Thu, Mar 3, 2016 at 6:58 PM, lannis <msp548546@163.com> wrote:
> So in the replay scenario, before we read the page from the WAL segment file,
> using the special RecPtr which points to the next page header address, can
> we predict whether the page header is long or short?

I am not sure I understand what you are looking for, but if you are asking
whether we can predict it, the answer is yes. A long header is used
at the beginning of a WAL segment, 16MB by default, and a short
header at the beginning of each other WAL page, or XLOG_BLCKSZ, 8kB by default.
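
As a small worked example of that rule with the default sizes mentioned
above (the names below are invented for this sketch): a 16MB segment holds
2048 pages of 8kB, so one page per segment carries a long header and the
remaining 2047 carry short ones.

```c
#include <stdint.h>

#define DEMO_BLCKSZ    (8u * 1024u)            /* 8kB page, default */
#define DEMO_SEG_SIZE  (16u * 1024u * 1024u)   /* 16MB segment, default */

typedef enum { DEMO_SHORT_HEADER, DEMO_LONG_HEADER } DemoHeaderKind;

/* Header kind of the page with the given index, counted from page 0. */
static DemoHeaderKind
demo_header_kind(uint32_t page_no)
{
    uint32_t pages_per_seg = DEMO_SEG_SIZE / DEMO_BLCKSZ;  /* 2048 */

    return (page_no % pages_per_seg == 0) ? DEMO_LONG_HEADER
                                          : DEMO_SHORT_HEADER;
}

/* Count the long headers among npages pages starting at first_page. */
static uint32_t
demo_count_long_headers(uint32_t first_page, uint32_t npages)
{
    uint32_t count = 0;

    for (uint32_t i = 0; i < npages; i++)
        if (demo_header_kind(first_page + i) == DEMO_LONG_HEADER)
            count++;

    return count;
}
```
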
-- 
Michael