Thread: redo failed in physical streaming replication while stopping the master server
Hi all,

Issue: I use hot standby streaming replication in PostgreSQL 9.2.X. After I shut down the master server in fast stop mode, I compared the xlog dump files of the master and the slave and found that the shutdown checkpoint was not replicated to the slave.

I then checked pg_log on the slave and found that the redo process failed with "record with zero length at %X/%X" during the master shutdown. The startup process terminated the current walreceiver and tried to reconnect to the master, but failed because the master was shutting down.

Theoretically, all WAL records should be replicated to the slave when the master shuts down in normal mode.

I read the source code and found that when we read a record to replay, we use the last EndRecPtr to get the xlog page containing the next record. If EndRecPtr points near the end of a page and the free space left on that page is less than SizeOfXLogRecord, we align it to the next page. I noticed this comment next to that code:

[1]
    /*
     * RecPtr is pointing to end+1 of the previous WAL record.  We must
     * advance it if necessary to where the next record starts.  First,
     * align to next page if no more records can fit on the current page.
     */
    if (XLOG_BLCKSZ - (RecPtr->xrecoff % XLOG_BLCKSZ) < SizeOfXLogRecord)
        NextLogPage(*RecPtr);

    /* Check for crossing of xlog logid boundary */
    if (RecPtr->xrecoff >= XLogFileSize)
    {
        (RecPtr->xlogid)++;
        RecPtr->xrecoff = 0;
    }

    /*
     * If at page start, we must skip over the page header.  But we can't
     * do that until we've read in the page, since the header size is
     * variable.
     */

The scenario is this:

1. When the master does the shutdown checkpoint, it first advances the xlog buffer, then runs CheckPointGuts, then logs the checkpoint record.

2. The slave's walreceiver has received only the xlog page header of the next page, because the shutdown checkpoint record has not been assembled yet.

3. The slave's recovery requests the next record through XLogPageRead with an LSN pointing exactly at the next page boundary.

4. XLogPageRead uses this condition to decide whether the receiver has received more data:

[2]
    /* See if we need to retrieve more data */
    if (readFile < 0 ||
        (readSource == XLOG_FROM_STREAM && !XLByteLT(*RecPtr, receivedUpto)))

   Here, RecPtr points to the page boundary [1] and receivedUpto points to the end of the page header (2). So it concludes that the receiver has received some records and returns the page to the caller (ReadRecord).

5. ReadRecord finds that the page header of this page is valid, but when it tries to read the record it gets nothing: the xlog page contains only a page header.

I think the problem is that we are trying to get the xlog page containing the "record", so the pointer should address a record, not a page boundary.

Can we use the current boundary RecPtr to calculate the true record position in the next page? After all, we know whether the next page has a long or a short page header.

I don't know why this problem has not been fixed in Postgres 9.2, or even in 9.6devel.

Would this work?

    if ((RecPtr->xrecoff % XLogSegSize) == 0)
        XLByteAdvance((*RecPtr), SizeOfXLogLongPHD);
    else
        XLByteAdvance((*RecPtr), SizeOfXLogShortPHD);

Yours sincerely,
fanbin

--
View this message in context: http://postgresql.nabble.com/redo-failed-in-physical-streaming-replication-while-stopping-the-master-server-tp5889961.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
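Steps 3 to 5 of the scenario can be sketched with plain arithmetic. This is a simplified model, not PostgreSQL source: it treats an LSN as one flat 64-bit byte position (9.2 actually uses the xlogid/xrecoff pair), and the type lsn_t, the constants PAGE_SIZE, SHORT_HDR and RECORD_HDR, and all three function names are assumptions made for illustration. The last function is a hypothetical stricter check in the spirit of the proposal above; it ignores long headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real sizes depend on version and platform. */
#define PAGE_SIZE   8192u   /* stands in for XLOG_BLCKSZ */
#define SHORT_HDR   16u     /* stands in for SizeOfXLogShortPHD */
#define RECORD_HDR  32u     /* stands in for SizeOfXLogRecord */

typedef uint64_t lsn_t;     /* simplified: one flat byte position */

/* Models the first step quoted in [1]: if a record header no longer
 * fits on the current page, move to the next page boundary. */
static lsn_t align_to_next_page(lsn_t ptr)
{
    if (PAGE_SIZE - (ptr % PAGE_SIZE) < RECORD_HDR)
        ptr += PAGE_SIZE - (ptr % PAGE_SIZE);

    /* Afterwards a record header always fits on the page ptr is on. */
    assert(PAGE_SIZE - (ptr % PAGE_SIZE) >= RECORD_HDR);
    return ptr;
}

/* Models the streaming part of the check quoted in [2]: data counts as
 * available as soon as received_upto is strictly past ptr. */
static bool page_looks_available(lsn_t ptr, lsn_t received_upto)
{
    return ptr < received_upto;
}

/* Hypothetical stricter test: when ptr sits on a page boundary, require
 * at least one byte beyond a short page header to have arrived. */
static bool record_bytes_available(lsn_t ptr, lsn_t received_upto)
{
    lsn_t first_record_byte = ptr;

    if (ptr % PAGE_SIZE == 0)
        first_record_byte += SHORT_HDR;
    return first_record_byte < received_upto;
}
```

With a previous record ending 8 bytes before a page boundary, align_to_next_page() returns the boundary itself. If the receiver has only the page header, page_looks_available() reports true while record_bytes_available() reports false, which is the gap described in step 5.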
Re: redo failed in physical streaming replication while stopping the master server
From
wcting
Date:
    /*
     * If at page start, we must skip over the page header.  But we can't
     * do that until we've read in the page, since the header size is
     * variable.
     */

I don't understand the meaning behind this comment.

If ((RecPtr->xrecoff % XLogSegSize) == 0), it is a long page header, otherwise a short page header, so "the header size" can be calculated, right?
Re: Re: redo failed in physical streaming replication while stopping the master server
From
Michael Paquier
Date:
On Wed, Mar 2, 2016 at 4:25 PM, wcting <wcting163@163.com> wrote:
> /*
>  * If at page start, we must skip over the page header.  But we can't
>  * do that until we've read in the page, since the header size is
>  * variable.
>  */
>
> i don't know the meaning behind this comments,
>
> if ((RecPtr->xrecoff % XLogSegSize) == 0)
> it's a long page header, else a short page header,
>
> so "the header size" can be calculated ? right?

This means that the page must be read first, before the record pointer is recalculated to point at the first record after the page header. This is done a little later in ReadRecord():

    pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
    targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
    if (targetRecOff == 0)
    {
        /*
         * At page start, so skip over page header.  The Assert checks that
         * we're not scribbling on caller's record pointer; it's OK because we
         * can only get here in the continuing-from-prev-record case, since
         * XRecOffIsValid rejected the zero-page-offset case otherwise.
         */
        Assert(RecPtr == &tmpRecPtr);
        RecPtr->xrecoff += pageHeaderSize;
        targetRecOff = pageHeaderSize;
    }

And XLogPageHeaderSize() distinguishes between a long and a short header.
--
Michael
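The order of operations in this reply (read the page first, then skip its header) can be sketched as follows. The struct, the constants and both function names are a simplified stand-in for XLogPageHeaderData and XLogPageHeaderSize(), not the real definitions; the two header sizes in particular are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE        8192u    /* stands in for XLOG_BLCKSZ */
#define FLAG_LONG_HEADER 0x0002u  /* stands in for XLP_LONG_HEADER */
#define SHORT_HDR        16u      /* illustrative, not the real SizeOfXLogShortPHD */
#define LONG_HDR         32u      /* illustrative, not the real SizeOfXLogLongPHD */

/* Simplified page header: only the fields this sketch needs. */
typedef struct
{
    uint16_t magic;
    uint16_t info;   /* flag bits, including FLAG_LONG_HEADER */
} page_header_t;

/* Counterpart of XLogPageHeaderSize(): the size is taken from a flag
 * stored in the page itself, so the page must already be in memory. */
static uint32_t page_header_size(const page_header_t *hdr)
{
    return (hdr->info & FLAG_LONG_HEADER) ? LONG_HDR : SHORT_HDR;
}

/* Given an in-page offset and the header of the page that was read,
 * return the offset of the first byte that may hold a record. */
static uint32_t skip_page_header(uint32_t page_off, const page_header_t *hdr)
{
    assert(hdr != NULL && page_off < PAGE_SIZE);

    if (page_off == 0)
        page_off = page_header_size(hdr);   /* at page start: skip header */
    return page_off;
}
```

An offset of 0 is moved past the header, whose size depends on the flag found in the page; any other offset is returned unchanged.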
Re: redo failed in physical streaming replication while stopping the master server
From
lannis
Date:
Thanks for your reply.

If we only consider replay, then yes, we cannot do this header check until we have read the page.

But thanks to the master's xlog generator, we know the following: when we advance the XLOG insert buffer (page), we treat the new page header as a short header at first, and then use this condition to make it a long header:

    if ((NewPage->xlp_pageaddr.xrecoff % XLogSegSize) == 0)
    {
        XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;

        NewLongPage->xlp_sysid = ControlFile->system_identifier;
        NewLongPage->xlp_seg_size = XLogSegSize;
        NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
        NewPage->xlp_info |= XLP_LONG_HEADER;

        Insert->currpos = ((char *) NewPage) + SizeOfXLogLongPHD;
    }

So in the replay scenario, before we read the page from the WAL segment file, can we predict whether the page header is long or short, using the special RecPtr that points to the next page header address?

Regards,
fanbin
Re: Re: redo failed in physical streaming replication while stopping the master server
From
Michael Paquier
Date:
On Thu, Mar 3, 2016 at 6:58 PM, lannis <msp548546@163.com> wrote:
> So in the replay scenario, before we read the page from wal segment file,
> using the specical RecPtr which point to the next page header address, can
> we predicat the page header is a long or short?

I am not sure I understand what you are looking for, but if you are asking whether we can predict it, the answer is yes. A long header is used at the beginning of a WAL segment, 16MB by default, and a short header at the beginning of every other WAL page, whose size is XLOG_BLCKSZ, 8kB by default.
--
Michael
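The rule in this reply can be written down directly. A minimal sketch, assuming the default 16MB segment and 8kB page sizes mentioned above and a flat 64-bit byte position; the enum and the function name are hypothetical, not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_SIZE  (16u * 1024u * 1024u)  /* default WAL segment size */
#define PAGE_SIZE 8192u                  /* default XLOG_BLCKSZ */

typedef enum { HDR_SHORT, HDR_LONG } header_kind_t;

/* Predict the header kind of the page starting at page_start, using
 * only its address: the first page of a segment carries a long header,
 * every other page a short one. */
static header_kind_t predict_header_kind(uint64_t page_start)
{
    assert(page_start % PAGE_SIZE == 0);  /* must be a page boundary */

    return (page_start % SEG_SIZE == 0) ? HDR_LONG : HDR_SHORT;
}
```

The prediction needs nothing but the address, which is why the answer is yes. It says nothing about whether any record bytes after that header have actually arrived on the standby.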