Thread: postmaster startup failure
When running PostgreSQL 7.3.3-1 (from RPMs) on Red Hat 9.0, I got the following in the logs and the postmaster will not start up.

Any ideas what I could do to get it to start?

This is on a laptop used for development, but I would still like to avoid an initdb.

postmaster successfully started
LOG:  database system shutdown was interrupted at 2003-07-17 00:42:29 EEST
LOG:  checkpoint record is at 0/304E76A8
LOG:  redo record is at 0/304E76A8; undo record is at 0/0; shutdown FALSE
LOG:  next transaction id: 3981836; next oid: 4003572
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/304E76E8
LOG:  ReadRecord: unexpected pageaddr 0/2C504000 in log file 0, segment 48, offset 5259264
LOG:  redo done at 0/30503FDC
PANIC:  XLogWrite: write request 0/30504000 is past end of log 0/30504000
LOG:  startup process (pid 2445) was terminated by signal 6
LOG:  aborting startup due to startup process failure

-------------
Hannu
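A note for readers decoding the positions in the log above: a value such as 0/30504000 is a log file id followed by a byte offset within that log file. The sketch below splits such an offset into a segment number and an in-segment offset, assuming the default 16 MB WAL segment size (an assumption; the thread does not state it). The helper names `seg_number` and `seg_offset` are hypothetical and are not PostgreSQL functions.

```c
#include <stdint.h>

/* Assumed default WAL segment size (16 MB); not stated in the thread. */
#define SEG_SIZE ((uint32_t) 16 * 1024 * 1024)

/* Hypothetical helper: which segment of the log file holds this offset. */
static uint32_t seg_number(uint32_t xrecoff)
{
    return xrecoff / SEG_SIZE;
}

/* Hypothetical helper: byte offset of the position inside that segment. */
static uint32_t seg_offset(uint32_t xrecoff)
{
    return xrecoff % SEG_SIZE;
}
```

Under that assumption, `seg_number(0x30504000)` is 48 and `seg_offset(0x30504000)` is 5259264, which matches the "segment 48, offset 5259264" reported by the ReadRecord line.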
Hannu Krosing <hannu@tm.ee> writes:
> When running PostgreSQL 7.3.3-1 (from rpm's) on Redhat 9.0 I got the
> following in logs and the postmaster will not start up.

> PANIC: XLogWrite: write request 0/30504000 is past end of log
> 0/30504000

This looks like you've stumbled across some sort of boundary-condition
bug in xlog.c. It must be a pretty low-probability situation since
we've not seen it before. How large is the database in question ---
would it be feasible to send it to me (as a tar of the whole $PGDATA
directory) for examination?

regards, tom lane
Hannu Krosing <hannu@tm.ee> writes:
> When running PostgreSQL 7.3.3-1 (from rpm's) on Redhat 9.0 I got the
> following in logs and the postmaster will not start up.

> PANIC: XLogWrite: write request 0/30504000 is past end of log
> 0/30504000

Ugh. The reason we hadn't seen this happen in the field was that it is
a bug I introduced in a patch two months ago :-(

7.3.3 will in fact fail to start up, with the above error, any time the
last record of the WAL file ends exactly at a page boundary. I think
we're gonna need a quick 7.3.4 ...

If you want a source patch for 7.3.3, here it is.

regards, tom lane

*** src/backend/access/transam/xlog.c.orig  Thu May 22 10:39:49 2003
--- src/backend/access/transam/xlog.c  Thu Jul 17 12:36:20 2003
***************
*** 2483,2488 ****
--- 2483,2489 ----
                  EndOfLog;
      XLogRecord *record;
      char       *buffer;
+     uint32      freespace;
  
      /* Use malloc() to ensure record buffer is MAXALIGNED */
      buffer = (char *) malloc(_INTL_MAXLOGRECSZ);
***************
*** 2678,2685 ****
      memcpy((char *) Insert->currpage, readBuf, BLCKSZ);
      Insert->currpos = (char *) Insert->currpage +
          (EndOfLog.xrecoff + BLCKSZ - XLogCtl->xlblocks[0].xrecoff);
-     /* Make sure rest of page is zero */
-     MemSet(Insert->currpos, 0, INSERT_FREESPACE(Insert));
  
      LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;
  
--- 2679,2684 ----
***************
*** 2689,2694 ****
--- 2688,2714 ----
  
      XLogCtl->LogwrtRqst.Write = EndOfLog;
      XLogCtl->LogwrtRqst.Flush = EndOfLog;
+ 
+     freespace = INSERT_FREESPACE(Insert);
+     if (freespace > 0)
+     {
+         /* Make sure rest of page is zero */
+         MemSet(Insert->currpos, 0, freespace);
+         XLogCtl->Write.curridx = 0;
+     }
+     else
+     {
+         /*
+          * Whenever Write.LogwrtResult points to exactly the end of a page,
+          * Write.curridx must point to the *next* page (see XLogWrite()).
+          *
+          * Note: it might seem we should do AdvanceXLInsertBuffer() here,
+          * but we can't since we haven't yet determined the correct StartUpID
+          * to put into the new page's header.  The first actual attempt to
+          * insert a log record will advance the insert state.
+          */
+         XLogCtl->Write.curridx = NextBufIdx(0);
+     }
  
  #ifdef NOT_USED
      /* UNDO */
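To illustrate the boundary condition described above, here is a minimal standalone sketch, not the actual xlog.c code. It assumes the default 8192-byte BLCKSZ (not stated in the thread), and `NUM_BUFFERS`, `free_space_on_page` and `initial_write_index` are hypothetical names. The modulo wrap-around only stands in for NextBufIdx(), whose real definition is not shown in the patch.

```c
#include <stdint.h>

/* Assumed default page size; the thread does not state the build's BLCKSZ. */
#define BLCKSZ 8192u
/* Hypothetical number of WAL buffers, for illustration only. */
#define NUM_BUFFERS 8u

/*
 * Bytes left on the page that the end of the log falls on.
 * Returns 0 when end_of_log sits exactly on a page boundary, i.e. the
 * last record filled its page completely. This plays the role that
 * INSERT_FREESPACE plays in the patch; it is not that macro's definition.
 */
static uint32_t free_space_on_page(uint32_t end_of_log)
{
    return (BLCKSZ - end_of_log % BLCKSZ) % BLCKSZ;
}

/*
 * Buffer index the writer should start from, given that the last page
 * read during recovery was loaded into buffer 0.
 */
static uint32_t initial_write_index(uint32_t end_of_log)
{
    if (free_space_on_page(end_of_log) > 0)
        return 0;                   /* page still has room: keep using buffer 0 */
    return (0 + 1) % NUM_BUFFERS;   /* page is full: point at the next buffer */
}
```

With the end-of-log value from Hannu's log, `free_space_on_page(0x30504000)` is 0, so the sketch picks index 1 rather than 0. For a position that leaves room on the page, such as 0x30503FDC, it reports 36 free bytes and stays on index 0.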
Tom Lane wrote on Thu, 17.07.2003 at 19:49:
> Ugh. The reason we hadn't seen this happen in the field was that it is
> a bug I introduced in a patch two months ago :-(
>
> 7.3.3 will in fact fail to start up, with the above error, any time the
> last record of the WAL file ends exactly at a page boundary. I think
> we're gonna need a quick 7.3.4 ...
>
> If you want a source patch for 7.3.3, here it is.

Thanks!

-----------
Hannu
I ran into this too. I patched the code with Tom's change and it works fine. Thanks again, Tom!

Richard Schilling

On 2003.07.17 11:04 Hannu Krosing wrote:
> Tom Lane wrote on Thu, 17.07.2003 at 19:49:
> > Ugh. The reason we hadn't seen this happen in the field was that it is
> > a bug I introduced in a patch two months ago :-(
> >
> > 7.3.3 will in fact fail to start up, with the above error, any time the
> > last record of the WAL file ends exactly at a page boundary. I think
> > we're gonna need a quick 7.3.4 ...
> >
> > If you want a source patch for 7.3.3, here it is.
>
> Thanks!
>
> -----------
> Hannu