Home > mailing lists
Re: BUG #4879: bgwriter fails to fsync the file in recovery mode - Mailing list pgsql-bugs

From	Heikki Linnakangas
Subject	Re: BUG #4879: bgwriter fails to fsync the file in recovery mode
Date	June 26, 2009 08:05:48
Msg-id	4A44A9A3.1060902@enterprisedb.com Whole thread Raw
In response to	Re: BUG #4879: bgwriter fails to fsync the file in recovery mode (Simon Riggs <simon@2ndQuadrant.com>)
Responses	Re: BUG #4879: bgwriter fails to fsync the file in recovery mode Re: BUG #4879: bgwriter fails to fsync the file in recovery mode
List	pgsql-bugs
Tree view
Simon Riggs wrote:
> On Fri, 2009-06-26 at 05:14 +0100, Simon Riggs wrote:
>> On Thu, 2009-06-25 at 20:25 -0400, Tom Lane wrote:
>
>>> What I am thinking is that instead of the hack of clearing
>>> LocalRecoveryInProgress to allow the current process to write WAL,
>>> we should have a separate test function WALWriteAllowed() with a
>>> state variable LocalWALWriteAllowed, and the hack should set that
>>> state without playing any games with LocalRecoveryInProgress.  Then
>>> RecoveryInProgress() remains true during the end-of-recovery checkpoint
>>> and we can condition the TruncateMultiXact and TruncateSUBTRANS calls
>>> on that.  Meanwhile the various places that check RecoveryInProgress
>>> to decide if WAL writing is allowed should call WALWriteAllowed()
>>> instead.
>> No need.
>
> Belay that. Yes, agree need for additional state, though think its more
> like EndRecoveryIsComplete().

Here's a patch implementing the WALWriteAllowed() idea (I'm not wedded
to the name). There's two things that trouble me with this patch:

- CreateCheckPoint() calls AdvanceXLInsertBuffer() before setting
LocalWALWriteAllowed. I don't see anything in AdvanceXLInsertBuffer()
that would fail, but it doesn't feel right. While strictly speaking it
doesn't insert new WAL records, it does write WAL page headers.

- As noted with an XXX comment in the patch, CreateCheckPoint() now
resets LocalWALWriteAllowed to false after a shutdown/end-of-recovery
checkpoint. But that's not enough to stop WAL inserts after a shutdown
checkpoint, because when RecoveryInProgress() is false, we
WALWriteAllowed() still returns true. We haven't had such a safeguard in
place before, so we can keep living without it, but now that we have a
WALWriteAllowed() macro it would be nice if it returned false when WAL
writes are no longer allowed after a shutdown checkpoint. (that would've
caught a bug in Guillaume Smet's original patch to rotate a WAL segment
at shutdown, where the xlog switch was done after shutdown checkpoint)

On whole, this is probably the right way going forward, but I'm not sure
if it'd make 8.4 more or less robust than what's in CVS now.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 7314341..6f86961 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1543,7 +1543,7 @@ CheckPointMultiXact(void)
      * SimpleLruTruncate would get confused.  It seems best not to risk
      * removing any data during recovery anyway, so don't truncate.
      */
-    if (!InRecovery)
+    if (!RecoveryInProgress())
         TruncateMultiXact();

     TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d6f63c7..f28bd4d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -142,6 +142,16 @@ static bool InArchiveRecovery = false;
  */
 static bool LocalRecoveryInProgress = true;

+/*
+ * Am I allowed to write new WAL records? It's always allowed after recovery.
+ * End-of-recovery checkpoint sets LocalWALWriteAllowed before we exit
+ * recovery, so that it can write the checkpoint record even though
+ * RecoveryInProgress() is still true.
+ */
+#define WALWriteAllowed() (LocalWALWriteAllowed || !RecoveryInProgress())
+
+static bool LocalWALWriteAllowed = false;
+
 /* Was the last xlog file restored from archive, or local? */
 static bool restoredFromArchive = false;

@@ -537,7 +547,7 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
     bool        isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);

     /* cross-check on whether we should be here or not */
-    if (RecoveryInProgress())
+    if (!WALWriteAllowed())
         elog(FATAL, "cannot make new WAL entries during recovery");

     /* info's high bits are reserved for use by me */
@@ -1846,7 +1856,7 @@ XLogFlush(XLogRecPtr record)
      * During REDO, we don't try to flush the WAL, but update minRecoveryPoint
      * instead.
      */
-    if (RecoveryInProgress())
+    if (!WALWriteAllowed())
     {
         UpdateMinRecoveryPoint(record, false);
         return;
@@ -1949,7 +1959,7 @@ XLogFlush(XLogRecPtr record)
      * and so we will not force a restart for a bad LSN on a data page.
      */
     if (XLByteLT(LogwrtResult.Flush, record))
-        elog(InRecovery ? WARNING : ERROR,
+        elog(RecoveryInProgress() ? WARNING : ERROR,
         "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
              record.xlogid, record.xrecoff,
              LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
@@ -1977,7 +1987,7 @@ XLogBackgroundFlush(void)
     bool        flexible = true;

     /* XLOG doesn't need flushing during recovery */
-    if (RecoveryInProgress())
+    if (!WALWriteAllowed())
         return;

     /* read LogwrtResult and update local state */
@@ -5849,7 +5859,10 @@ RecoveryInProgress(void)
          * recovery is finished.
          */
         if (!LocalRecoveryInProgress)
+        {
             InitXLOGAccess();
+            LocalWALWriteAllowed = true;
+        }

         return LocalRecoveryInProgress;
     }
@@ -6225,7 +6238,6 @@ CreateCheckPoint(int flags)
     uint32        _logSeg;
     TransactionId *inCommitXids;
     int            nInCommit;
-    bool        OldInRecovery = InRecovery;

     /*
      * An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -6236,33 +6248,9 @@ CreateCheckPoint(int flags)
     else
         shutdown = false;

-    /*
-     * A startup checkpoint is created before anyone else is allowed to
-     * write WAL. To allow us to write the checkpoint record, set
-     * LocalRecoveryInProgress to false. This lets us write WAL, but others
-     * are still not allowed to do so.
-     */
-    if (flags & CHECKPOINT_END_OF_RECOVERY)
-    {
-        Assert(RecoveryInProgress());
-        LocalRecoveryInProgress = false;
-        InitXLOGAccess();
-
-        /*
-         * Before 8.4, end-of-recovery checkpoints were always performed by
-         * the startup process, and InRecovery was set true. InRecovery is not
-         * normally set in bgwriter, but we set it here temporarily to avoid
-         * confusing old code in the end-of-recovery checkpoint code path that
-         * rely on it.
-         */
-        InRecovery = true;
-    }
-    else
-    {
-        /* shouldn't happen */
-        if (RecoveryInProgress())
-            elog(ERROR, "can't create a checkpoint during recovery");
-    }
+    /* shouldn't happen */
+    if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
+        elog(ERROR, "can't create a checkpoint during recovery");

     /*
      * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
@@ -6305,7 +6293,6 @@ CreateCheckPoint(int flags)

     /* Begin filling in the checkpoint WAL record */
     MemSet(&checkPoint, 0, sizeof(checkPoint));
-    checkPoint.ThisTimeLineID = ThisTimeLineID;
     checkPoint.time = (pg_time_t) time(NULL);

     /*
@@ -6473,6 +6460,24 @@ CreateCheckPoint(int flags)
     START_CRIT_SECTION();

     /*
+     * An end-of-recovery checkpoint is created before anyone is allowed to
+     * write WAL. To allow us to write the checkpoint record, temporarily
+     * enable LocalWALWriteAllowed.
+     */
+    if (flags & CHECKPOINT_END_OF_RECOVERY)
+    {
+        Assert(RecoveryInProgress());
+        LocalWALWriteAllowed = true;
+        InitXLOGAccess();
+    }
+
+    /*
+     * this needs to be done after the InitXLOGAccess() call above or
+     * ThisTimeLineID might be uninitialized
+     */
+    checkPoint.ThisTimeLineID = ThisTimeLineID;
+
+    /*
      * Now insert the checkpoint record into XLOG.
      */
     rdata.data = (char *) (&checkPoint);
@@ -6488,6 +6493,19 @@ CreateCheckPoint(int flags)
     XLogFlush(recptr);

     /*
+     * We mustn't write any new WAL after a shutdown checkpoint, or it will
+     * be overwritten at next startup. No-one should even try, this just
+     * allows a bit more sanity-checking. XXX: since RecoveryInProgress() is
+     * false at that point, we'll just set LocalWriteAllowed again if anyone
+     * calls XLogInsert(), so this is actually useless in the shutdown case.
+     *
+     * Don't allow WAL writes after an end-of-recovery checkpoint either. It
+     * will be enabled again after the startup is fully completed.
+     */
+    if (shutdown)
+        LocalWALWriteAllowed = false;
+
+    /*
      * We now have ProcLastRecPtr = start of actual checkpoint record, recptr
      * = end of actual checkpoint record.
      */
@@ -6560,7 +6578,7 @@ CreateCheckPoint(int flags)
      * in subtrans.c).    During recovery, though, we mustn't do this because
      * StartupSUBTRANS hasn't been called yet.
      */
-    if (!InRecovery)
+    if (!RecoveryInProgress())
         TruncateSUBTRANS(GetOldestXmin(true, false));

     /* All real work is done, but log before releasing lock. */
@@ -6574,9 +6592,6 @@ CreateCheckPoint(int flags)
                                      CheckpointStats.ckpt_segs_recycled);

     LWLockRelease(CheckpointLock);
-
-    /* Restore old value */
-    InRecovery = OldInRecovery;
 }

 /*
pgsql-bugs by date:
From: Fujii Masao
Date: 26 June 2009, 06:28:51
Subject: Re: BUG #4879: bgwriter fails to fsync the file in recovery mode
From: "Justin"
Date: 26 June 2009, 09:13:21
Subject: BUG #4886: Password Crash
Re: BUG #4879: bgwriter fails to fsync the file in recovery mode - Mailing list pgsql-bugs

Previous

Next