
Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush - Mailing list pgsql-admin

From	Tom Lane
Subject	Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush
Date	February 2, 2010 16:34:32
Msg-id	29816.1265141665@sss.pgh.pa.us
In response to	Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush ("Wang, Mary Y" <mary.y.wang@boeing.com>)
Responses	Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush
List	pgsql-admin


"Wang, Mary Y" <mary.y.wang@boeing.com> writes:
> Thanks for the help
> I was able to find pg_resetxlog in the path.  Are there any precautions that I need to be aware of?  Or I just don't
> have any choice?

I'd suggest taking a tarball copy of the $PGDATA tree, so you can at
least get back to where you were if it doesn't work.
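
A minimal sketch of that backup, assuming the postmaster is not running
(it cannot be, if startup is failing) and that $PGDATA points at the
cluster. The function name backup_pgdata and the archive path are
illustrative only and not from this thread; put the archive on a
filesystem with enough free space and give its path in absolute form.

```shell
#!/bin/sh
# Sketch: archive the whole data directory before trying pg_resetxlog.
# Assumption: no postmaster is running against this directory.
# backup_pgdata is a hypothetical helper name, not a PostgreSQL tool.
backup_pgdata() {
    pgdata=$1      # data directory to copy (normally "$PGDATA")
    archive=$2     # absolute path of the tarball to create
    [ -d "$pgdata" ] || { echo "no such directory: $pgdata" >&2; return 1; }
    # -C makes the archived paths relative to the parent of the data directory
    tar -czf "$archive" -C "$(dirname "$pgdata")" "$(basename "$pgdata")"
}

# Example invocation (commented out so the sketch has no side effects):
# backup_pgdata "$PGDATA" /some/other/disk/pgdata-before-resetxlog.tar.gz
```

To get back to the saved state, move the damaged directory aside and
unpack the archive into the same parent directory with
tar -xzf ARCHIVE -C PARENT.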

Actually ... is this RHEL 2.1 as I suspect?  If so, can you find the
last RHEL 2.1 update, which was postgresql-7.1.3-7.rhel2.1AS?
That update included a fix that might well address your issue:

* Wed Feb 23 2005 Tom Lane <tgl@redhat.com> 7.1.3-7.rhel2.1AS
- Back-patch community 7.2 change in error recovery behavior of XLogFlush;
  this allows successful restart of the database even in the presence of
  dubious values in LSN fields of database pages.

Alternatively, if you have the ability to rebuild the version you've
got, you're welcome to try adding the patch, which is attached.

            regards, tom lane

diff -Naur postgresql-7.1.3.orig/src/backend/access/transam/xlog.c postgresql-7.1.3/src/backend/access/transam/xlog.c
--- postgresql-7.1.3.orig/src/backend/access/transam/xlog.c    2001-08-16 14:36:37.000000000 -0400
+++ postgresql-7.1.3/src/backend/access/transam/xlog.c    2005-02-23 12:23:30.963333861 -0500
@@ -1242,14 +1242,42 @@
             WriteRqst.Flush = record;
             XLogWrite(WriteRqst);
             S_UNLOCK(&(XLogCtl->logwrt_lck));
-            if (XLByteLT(LogwrtResult.Flush, record))
-                elog(STOP, "XLogFlush: request is not satisfied");
             break;
         }
         S_LOCK_SLEEP(&(XLogCtl->logwrt_lck), spins++, XLOG_LOCK_TIMEOUT);
     }

     END_CRIT_SECTION();
+
+    /*
+     * If we still haven't flushed to the request point then we have a
+     * problem; most likely, the requested flush point is past end of
+     * XLOG. This has been seen to occur when a disk page has a corrupted
+     * LSN.
+     *
+     * Formerly we treated this as a PANIC condition, but that hurts the
+     * system's robustness rather than helping it: we do not want to take
+     * down the whole system due to corruption on one data page.  In
+     * particular, if the bad page is encountered again during recovery
+     * then we would be unable to restart the database at all!    (This
+     * scenario has actually happened in the field several times with 7.1
+     * releases. Note that we cannot get here while InRedo is true, but if
+     * the bad page is brought in and marked dirty during recovery then
+     * CreateCheckPoint will try to flush it at the end of recovery.)
+     *
+     * The current approach is to ERROR under normal conditions, but only
+     * NOTICE during recovery, so that the system can be brought up even
+     * if there's a corrupt LSN.  Note that for calls from xact.c, the
+     * ERROR will be promoted to PANIC since xact.c calls this routine
+     * inside a critical section.  However, calls from bufmgr.c are not
+     * within critical sections and so we will not force a restart for a
+     * bad LSN on a data page.
+     */
+    if (XLByteLT(LogwrtResult.Flush, record))
+        elog(InRecovery ? NOTICE : ERROR,
+             "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
+             record.xlogid, record.xrecoff,
+             LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
 }

 /*
