[BUG] Checkpointer on hot standby runs without looking checkpoint_segments - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject [BUG] Checkpointer on hot standby runs without looking checkpoint_segments
Date
Msg-id 20120416.210548.254416486.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
Responses Re: [BUG] Checkpointer on hot standby runs without looking checkpoint_segments  (Simon Riggs <simon@2ndQuadrant.com>)
Re: [BUG] Checkpointer on hot standby runs without looking checkpoint_segments  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Hello, this is bug report and a patch for it.

The first patch in the attachments is for 9.2dev and next one is
for 9.1.3.

On the current 9.2dev, IsCheckpointOnSchedule@checkpointer.c does
not check against WAL segments written. This makes checkpointer
always run at the speed according to checkpoint_timeout
regardless of WAL advancing rate.

This leads to unexpected imbalance in the numbers of WAL segment
files between the master and the standby(s) for high advance rate
of WALs. And what is worse, the master would have much higher
chance to remove some WAL segments before the standby receives
them.

XLogPageRead()@xlog.c triggers checkpoint referring to WAL
segment advance. So I think this is a bug of bgwriter in 9.1. The
attached patches fix that on 9.2dev and 9.1.3 respctively.

In the backported version to 9.1.3, bgwriter.c is modified
instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is
used as the equivalent of GetStandbyFlushRecPtr() in 9.2.

By the way, GetStandbyFlushRecPtr() acquires spin lock within. It
might be enough to read XLogCtl->recoveryLastRecPtr without lock
to make rough estimation, but I can't tell it is safe or
not. Same discussion could be for GetWalRcvWriteRecPtr() on
9.1.3.


However, it seems to work fine on a simple test.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index c9473f7..f86e9b9 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -491,8 +491,8 @@ CheckpointerMain(void)             * Initialize checkpointer-private variables used during
checkpoint.            */            ckpt_active = true;
 
-            if (!do_restartpoint)
-                ckpt_start_recptr = GetInsertRecPtr();
+            ckpt_start_recptr =
+                do_restartpoint ? GetStandbyFlushRecPtr() : GetInsertRecPtr();            ckpt_start_time = now;
    ckpt_cached_elapsed = 0;
 
@@ -731,28 +731,29 @@ IsCheckpointOnSchedule(double progress)        return false;    /*
-     * Check progress against WAL segments written and checkpoint_segments.
+     * Check progress against WAL segments written, or replayed for
+     * hot standby, and checkpoint_segments.     *     * We compare the current WAL insert location against the
location
-     * computed before calling CreateCheckPoint. The code in XLogInsert that
-     * actually triggers a checkpoint when checkpoint_segments is exceeded
-     * compares against RedoRecptr, so this is not completely accurate.
-     * However, it's good enough for our purposes, we're only calculating an
-     * estimate anyway.
+     * computed before calling CreateCheckPoint. The code in
+     * XLogInsert that actually triggers a checkpoint when
+     * checkpoint_segments is exceeded compares against RedoRecptr.
+     * Similarly, we consult WAL flush location instead on hot
+     * standbys and XLogPageRead compares it aganst RedoRecptr, too.
+     * Altough these are not completely accurate, it's good enough for
+     * our purposes, we're only calculating an estimate anyway.     */
-    if (!RecoveryInProgress())
+    
+    recptr = RecoveryInProgress() ? GetStandbyFlushRecPtr() : GetInsertRecPtr();
+    elapsed_xlogs =
+        (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
+         ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
+        CheckPointSegments;
+
+    if (progress < elapsed_xlogs)    {
-        recptr = GetInsertRecPtr();
-        elapsed_xlogs =
-            (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
-             ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
-            CheckPointSegments;
-
-        if (progress < elapsed_xlogs)
-        {
-            ckpt_cached_elapsed = elapsed_xlogs;
-            return false;
-        }
+        ckpt_cached_elapsed = elapsed_xlogs;
+        return false;    }    /*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 5643ec8..0ce9945 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -56,6 +56,7 @@#include "pgstat.h"#include "postmaster/bgwriter.h"#include "replication/syncrep.h"
+#include "replication/walreceiver.h"#include "storage/bufmgr.h"#include "storage/fd.h"#include "storage/ipc.h"
@@ -489,8 +490,8 @@ BackgroundWriterMain(void)             * Initialize bgwriter-private variables used during
checkpoint.            */            ckpt_active = true;
 
-            if (!do_restartpoint)
-                ckpt_start_recptr = GetInsertRecPtr();
+            ckpt_start_recptr =    do_restartpoint ?
+                GetWalRcvWriteRecPtr(NULL) : GetInsertRecPtr();            ckpt_start_time = now;
ckpt_cached_elapsed= 0;
 
@@ -764,30 +765,32 @@ IsCheckpointOnSchedule(double progress)        return false;    /*
-     * Check progress against WAL segments written and checkpoint_segments.
+     * Check progress against WAL segments written, or replayed for
+     * hot standby, and checkpoint_segments.     *     * We compare the current WAL insert location against the
location
-     * computed before calling CreateCheckPoint. The code in XLogInsert that
-     * actually triggers a checkpoint when checkpoint_segments is exceeded
-     * compares against RedoRecptr, so this is not completely accurate.
-     * However, it's good enough for our purposes, we're only calculating an
-     * estimate anyway.
+     * computed before calling CreateCheckPoint. The code in
+     * XLogInsert that actually triggers a checkpoint when
+     * checkpoint_segments is exceeded compares against RedoRecptr.
+     * Similarly, we consult WAL write location instead on hot
+     * standbys and XLogPageRead compares it aganst RedoRecptr, too.
+     * Altough these are not completely accurate, it's good enough for
+     * our purposes, we're only calculating an estimate anyway.     */
-    if (!RecoveryInProgress())
+    
+    recptr =
+        RecoveryInProgress() ? GetWalRcvWriteRecPtr(NULL) : GetInsertRecPtr();
+    elapsed_xlogs =
+        (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
+         ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
+        CheckPointSegments;
+    
+    if (progress < elapsed_xlogs)    {
-        recptr = GetInsertRecPtr();
-        elapsed_xlogs =
-            (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
-             ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
-            CheckPointSegments;
-
-        if (progress < elapsed_xlogs)
-        {
-            ckpt_cached_elapsed = elapsed_xlogs;
-            return false;
-        }
+        ckpt_cached_elapsed = elapsed_xlogs;
+        return false;    }
-
+        /*     * Check progress against time elapsed and checkpoint_timeout.     */

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: nodes/*funcs.c inconsistencies
Next
From: Simon Riggs
Date:
Subject: Re: [BUG] Checkpointer on hot standby runs without looking checkpoint_segments