Thread: Fix calculations on WAL recycling.

Fix calculations on WAL recycling.

From
Kyotaro HORIGUCHI
Date:
I'll register this to the next commitfest.

https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyotaro@lab.ntt.co.jp

> While considering this, I found a bug in 4b0d28de06, which
> removed prior checkpoint from control file. It actually trims the
> segments before the last checkpoint's redo segment but recycling
> is still considered based on the *prevous* checkpoint. As the
> result min_wal_size doesn't work as told.  Specifically, setting
> min/max_wal_size to 48MB and advance four or more segments then
> two checkpoints leaves just one segment, which is less than
> min_wal_size.
> 
> The attached patch fixes that. One arguable point on this would
> be the removal of the behavior when RemoveXLogFile(name,
> InvalidXLogRecPtr, ..).
> 
> The only place calling the function with the parameter is
> timeline switching. Previously unconditionally 10 segments are
> recycled after switchpoint but the reason for the behavior is we
> didn't have the information on previous checkpoint at hand at the
> time. But now we can use the timeline switch point as the
> approximate of the last checkpoint's redo point and this allows
> us to use min/max_wal_size properly at the time.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From f2b1a0b6360263d4ddf725075daf4b56800e3e18 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 19 Jul 2018 12:13:56 +0900
Subject: [PATCH] Fix calculation base of WAL recycling

The commit 4b0d28de06 removed the prior checkpoint and related things
but that leaves WAL recycling based on the prior checkpoint. This
makes max_wal_size and min_wal_size work incorrectly. This patch makes
WAL recycling be based on the last checkpoint.
---
 src/backend/access/transam/xlog.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4049deb968..d7a61af8f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2287,7 +2287,7 @@ assign_checkpoint_completion_target(double newval, void *extra)
  * XLOG segments? Returns the highest segment that should be preallocated.
  */
 static XLogSegNo
-XLOGfileslop(XLogRecPtr PriorRedoPtr)
+XLOGfileslop(XLogRecPtr RedoRecPtr)
 {
     XLogSegNo    minSegNo;
     XLogSegNo    maxSegNo;
@@ -2299,9 +2299,9 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
      * correspond to. Always recycle enough segments to meet the minimum, and
      * remove enough segments to stay below the maximum.
      */
-    minSegNo = PriorRedoPtr / wal_segment_size +
+    minSegNo = RedoRecPtr / wal_segment_size +
         ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1;
-    maxSegNo = PriorRedoPtr / wal_segment_size +
+    maxSegNo = RedoRecPtr / wal_segment_size +
         ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1;
 
     /*
@@ -2316,7 +2316,7 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
     /* add 10% for good measure. */
     distance *= 1.10;
 
-    recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) /
+    recycleSegNo = (XLogSegNo) ceil(((double) RedoRecPtr + distance) /
                                     wal_segment_size);
 
     if (recycleSegNo < minSegNo)
@@ -3896,12 +3896,12 @@ RemoveTempXlogFiles(void)
 /*
  * Recycle or remove all log files older or equal to passed segno.
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * redo pointer of the last checkpoint. These are used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
  */
 static void
-RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 {
     DIR           *xldir;
     struct dirent *xlde;
@@ -3944,7 +3944,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
                 /* Update the last removed location in shared memory first */
                 UpdateLastRemovedPtr(xlde->d_name);
 
-                RemoveXlogFile(xlde->d_name, PriorRedoPtr, endptr);
+                RemoveXlogFile(xlde->d_name, RedoRecPtr, endptr);
             }
         }
     }
@@ -4006,9 +4006,11 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
              * remove it yet. It should be OK to remove it - files that are
              * not part of our timeline history are not required for recovery
              * - but seems safer to let them be archived and removed later.
+             * Here, switchpoint is a good approximate of RedoRecPtr for
+             * RemoveXlogFile since we have just done timeline switching.
              */
             if (!XLogArchiveIsReady(xlde->d_name))
-                RemoveXlogFile(xlde->d_name, InvalidXLogRecPtr, switchpoint);
+                RemoveXlogFile(xlde->d_name, switchpoint, switchpoint);
         }
     }
 
@@ -4018,14 +4020,12 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 /*
  * Recycle or remove a log file that's no longer needed.
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * redo pointer of the last checkpoint. These are used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
- * If PriorRedoRecPtr is not known, pass invalid, and the function will
- * recycle, somewhat arbitrarily, 10 future segments.
  */
 static void
-RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 {
     char        path[MAXPGPATH];
 #ifdef WIN32
@@ -4039,10 +4039,7 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
      * Initialize info about where to try to recycle to.
      */
     XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
-    if (PriorRedoPtr == InvalidXLogRecPtr)
-        recycleSegNo = endlogSegNo + 10;
-    else
-        recycleSegNo = XLOGfileslop(PriorRedoPtr);
+    recycleSegNo = XLOGfileslop(RedoRecPtr);
 
     snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);
 
@@ -9057,7 +9054,7 @@ CreateCheckPoint(int flags)
         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
         KeepLogSeg(recptr, &_logSegNo);
         _logSegNo--;
-        RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, recptr);
+        RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
     }
 
     /*
@@ -9410,7 +9407,7 @@ CreateRestartPoint(int flags)
         if (RecoveryInProgress())
             ThisTimeLineID = replayTLI;
 
-        RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, endptr);
+        RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);
 
         /*
          * Make more log segments if needed.  (Do this after recycling old log
-- 
2.16.3


Re: Fix calculations on WAL recycling.

From
Michael Paquier
Date:
On Mon, Jul 23, 2018 at 01:57:48PM +0900, Kyotaro HORIGUCHI wrote:
> I'll register this to the next commitfest.
>
> https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyotaro@lab.ntt.co.jp

This is an open item for v11.

>> While considering this, I found a bug in 4b0d28de06, which
>> removed prior checkpoint from control file. It actually trims the
>> segments before the last checkpoint's redo segment but recycling
>> is still considered based on the *prevous* checkpoint. As the
>> result min_wal_size doesn't work as told.  Specifically, setting
>> min/max_wal_size to 48MB and advance four or more segments then
>> two checkpoints leaves just one segment, which is less than
>> min_wal_size.
>>
>> The attached patch fixes that. One arguable point on this would
>> be the removal of the behavior when RemoveXLogFile(name,
>> InvalidXLogRecPtr, ..).
>>
>> The only place calling the function with the parameter is
>> timeline switching. Previously unconditionally 10 segments are
>> recycled after switchpoint but the reason for the behavior is we
>> didn't have the information on previous checkpoint at hand at the
>> time. But now we can use the timeline switch point as the
>> approximate of the last checkpoint's redo point and this allows
>> us to use min/max_wal_size properly at the time.

I have been looking at that, and tested with this simple scenario:
create table aa (a int);

Then just repeat the following SQLs:
insert into aa values (1);
select pg_switch_wal();
checkpoint;
select pg_walfile_name(pg_current_wal_lsn());

By doing so, you would notice that the oldest WAL segment does not get
recycled after the checkpoint, while it should as the redo pointer is
always checkpoint generated.  What happens is that this oldest segment
gets recycled every two checkpoints.

Then I had a look at the proposed patch, with a couple of comments.

-   if (PriorRedoPtr == InvalidXLogRecPtr)
-       recycleSegNo = endlogSegNo + 10;
-   else
-       recycleSegNo = XLOGfileslop(PriorRedoPtr);
+   recycleSegNo = XLOGfileslop(RedoRecPtr);
I think that this is a new behavior, and should not be changed, see
point 3 below.

In CreateCheckPoint(), the removal of past WAL segments is always going
to happen as RedoRecPtr is never InvalidXLogRecPtr, so the recycling
should happen also when PriorRedoPtr is InvalidXLogRecPtr, no?

/* Trim from the last checkpoint, not the last - 1 */
This comment could be adjusted, let's remove "not the last - 1" .

The calculation of _logSegNo in CreateRestartPoint is visibly incorrect,
this still uses PriorRedoPtr so the bug is not fixed for standbys.  The
tweaks for ThisTimeLineID also need to be out of the loop where
PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation
should be kept.

Finally, in summary, this patch is doing actually three things:
1) Rename a couple of variables which refer incorrectly to the prior
checkpoint so as they refer to the last checkpoint generated.
2) Make sure that WAL recycling/removal happens based on the last
checkpoint generated, which is the one just generated when past WAL
segments are cleaned up as post-operation actions.
3) Enforce the recycling point to be the switch point instead of
arbitrarily recycle 10 segments, which is what b2a5545b has introduced.
Recycling at exactly the switch point is wrong, as the redo LSN of the
previous checkpoint may not be at the same segment when a timeline has
switched, so you would finish with recycling segments which are still
needed if an instance needs be crash-recovered post-promotion without
a completed post-recovery checkpoint.  In short this is dangerous.
I'll let Heikki comment on that, but I think that you should fetch the
last redo LSN instead, still you need to be worried about checkpoints
running in parallel of the startup process, so I would stick with the
current logic.

1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is
not.
--
Michael

Attachment

Re: Fix calculations on WAL recycling.

From
Kyotaro HORIGUCHI
Date:
At Mon, 23 Jul 2018 15:59:16 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180723065916.GI2854@paquier.xyz>
> On Mon, Jul 23, 2018 at 01:57:48PM +0900, Kyotaro HORIGUCHI wrote:
> > I'll register this to the next commitfest.
> > 
> > https://www.postgresql.org/message-id/20180719.125117.155470938.horiguchi.kyotaro@lab.ntt.co.jp
> 
> This is an open item for v11.

Mmm. Thanks. I wrongly thought this was v10 item. Removed this
from the next CF.

Thank you for taking a look.

> >> While considering this, I found a bug in 4b0d28de06, which
> >> removed prior checkpoint from control file. It actually trims the
> >> segments before the last checkpoint's redo segment but recycling
> >> is still considered based on the *prevous* checkpoint. As the
> >> result min_wal_size doesn't work as told.  Specifically, setting
> >> min/max_wal_size to 48MB and advance four or more segments then
> >> two checkpoints leaves just one segment, which is less than
> >> min_wal_size.
> >> 
> >> The attached patch fixes that. One arguable point on this would
> >> be the removal of the behavior when RemoveXLogFile(name,
> >> InvalidXLogRecPtr, ..).
> >> 
> >> The only place calling the function with the parameter is
> >> timeline switching. Previously unconditionally 10 segments are
> >> recycled after switchpoint but the reason for the behavior is we
> >> didn't have the information on previous checkpoint at hand at the
> >> time. But now we can use the timeline switch point as the
> >> approximate of the last checkpoint's redo point and this allows
> >> us to use min/max_wal_size properly at the time.
> 
> I have been looking at that, and tested with this simple scenario:
> create table aa (a int);
> 
> Then just repeat the following SQLs: 
> insert into aa values (1);
> select pg_switch_wal();
> checkpoint;
> select pg_walfile_name(pg_current_wal_lsn());
> 
> By doing so, you would notice that the oldest WAL segment does not get
> recycled after the checkpoint, while it should as the redo pointer is
> always checkpoint generated.  What happens is that this oldest segment
> gets recycled every two checkpoints.

(I'm not sure I'm getting it correctly..) I see the old segments
recycled. When I see pg_switch_wal() returns 0/12..,
pg_walfile_name is ...13 and segment files for 13 and 14 are
found in pg_wal directory. That is, seg 12 was recycled as seg
14. log_checkpoint=on shows every checkpoint recycles 1 segment
in the case.

> Then I had a look at the proposed patch, with a couple of comments.
> 
> -   if (PriorRedoPtr == InvalidXLogRecPtr)
> -       recycleSegNo = endlogSegNo + 10;
> -   else
> -       recycleSegNo = XLOGfileslop(PriorRedoPtr);
> +   recycleSegNo = XLOGfileslop(RedoRecPtr);
> I think that this is a new behavior, and should not be changed, see
> point 3 below.
> 
> In CreateCheckPoint(), the removal of past WAL segments is always going
> to happen as RedoRecPtr is never InvalidXLogRecPtr, so the recycling
> should happen also when PriorRedoPtr is InvalidXLogRecPtr, no?

Yes. The reason for the change was the change of
RemoveNonParentXlogFiles that I made and it is irrelevant to this
bug fix (and rather breaking as you pointed..). So, reverted it.


> /* Trim from the last checkpoint, not the last - 1 */
> This comment could be adjusted, let's remove "not the last - 1" .

Oops! Thanks. The comment has finally vanished and melded into
another comment just above.

| * Delete old log files not required by the last checkpoint and recycle
| * them


> The calculation of _logSegNo in CreateRestartPoint is visibly incorrect,
> this still uses PriorRedoPtr so the bug is not fixed for standbys.  The
> tweaks for ThisTimeLineID also need to be out of the loop where
> PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation
> should be kept.

Agreed.  While I reconsidered this, I noticed that the estimated
checkpoint distance is 0 when PriorRedoPtr is invalid. This means
that the first checkpoint/restartpoint tries to reduce WAL
segments to min_wal_size. However, it happens only at initdb time
and makes no substantial behavior change so I made the change
ignoring the difference.

> Finally, in summary, this patch is doing actually three things:
> 1) Rename a couple of variables which refer incorrectly to the prior
> checkpoint so as they refer to the last checkpoint generated.
> 2) Make sure that WAL recycling/removal happens based on the last
> checkpoint generated, which is the one just generated when past WAL
> segments are cleaned up as post-operation actions.
> 3) Enforce the recycling point to be the switch point instead of
> arbitrarily recycle 10 segments, which is what b2a5545b has introduced.
> Recycling at exactly the switch point is wrong, as the redo LSN of the
> previous checkpoint may not be at the same segment when a timeline has
> switched, so you would finish with recycling segments which are still
> needed if an instance needs be crash-recovered post-promotion without
> a completed post-recovery checkpoint.  In short this is dangerous.
> I'll let Heikki comment on that, but I think that you should fetch the
> last redo LSN instead, still you need to be worried about checkpoints
> running in parallel of the startup process, so I would stick with the
> current logic.

Thank you for the detail. I was coufused a bit there. I agree to
preserve the logic, too.

> 1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is
> not.

Thank you for the comments and suggestions. After all, I did the
following things in the attached patch.

- Reverted the change on timeline switching. (Removed the (3))

- Fixed CreateRestartPoint to do the same thing with CreateCheckPoint.

- Both CreateRestart/CheckPoint now tries trimming of WAL
  segments even for the first time.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 662ecab9c2d0efc41756d98e18dbabe060da8d96 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 19 Jul 2018 12:13:56 +0900
Subject: [PATCH] Fix calculation base of WAL recycling

The commit 4b0d28de06 removed the prior checkpoint and related things
but that leaves WAL recycling based on the prior checkpoint. This
makes max_wal_size and min_wal_size work incorrectly. This patch makes
WAL recycling be based on the last checkpoint.
---
 src/backend/access/transam/xlog.c | 159 ++++++++++++++++++--------------------
 1 file changed, 75 insertions(+), 84 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 335b4a573d..05f750253f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2287,7 +2287,7 @@ assign_checkpoint_completion_target(double newval, void *extra)
  * XLOG segments? Returns the highest segment that should be preallocated.
  */
 static XLogSegNo
-XLOGfileslop(XLogRecPtr PriorRedoPtr)
+XLOGfileslop(XLogRecPtr RedoRecPtr)
 {
     XLogSegNo    minSegNo;
     XLogSegNo    maxSegNo;
@@ -2299,9 +2299,9 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
      * correspond to. Always recycle enough segments to meet the minimum, and
      * remove enough segments to stay below the maximum.
      */
-    minSegNo = PriorRedoPtr / wal_segment_size +
+    minSegNo = RedoRecPtr / wal_segment_size +
         ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1;
-    maxSegNo = PriorRedoPtr / wal_segment_size +
+    maxSegNo = RedoRecPtr / wal_segment_size +
         ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1;
 
     /*
@@ -2316,7 +2316,7 @@ XLOGfileslop(XLogRecPtr PriorRedoPtr)
     /* add 10% for good measure. */
     distance *= 1.10;
 
-    recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) /
+    recycleSegNo = (XLogSegNo) ceil(((double) RedoRecPtr + distance) /
                                     wal_segment_size);
 
     if (recycleSegNo < minSegNo)
@@ -3899,12 +3899,12 @@ RemoveTempXlogFiles(void)
 /*
  * Recycle or remove all log files older or equal to passed segno.
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * redo pointer of the last checkpoint. These are used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
  */
 static void
-RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 {
     DIR           *xldir;
     struct dirent *xlde;
@@ -3947,7 +3947,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
                 /* Update the last removed location in shared memory first */
                 UpdateLastRemovedPtr(xlde->d_name);
 
-                RemoveXlogFile(xlde->d_name, PriorRedoPtr, endptr);
+                RemoveXlogFile(xlde->d_name, RedoRecPtr, endptr);
             }
         }
     }
@@ -4021,14 +4021,14 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 /*
  * Recycle or remove a log file that's no longer needed.
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * redo pointer of the last checkpoint. These are used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
- * If PriorRedoRecPtr is not known, pass invalid, and the function will
- * recycle, somewhat arbitrarily, 10 future segments.
+ * If RedoRecPtr is not known, pass invalid, and the function will recycle,
+ * somewhat arbitrarily, 10 future segments.
  */
 static void
-RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 {
     char        path[MAXPGPATH];
 #ifdef WIN32
@@ -4042,10 +4042,10 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
      * Initialize info about where to try to recycle to.
      */
     XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
-    if (PriorRedoPtr == InvalidXLogRecPtr)
+    if (RedoRecPtr == InvalidXLogRecPtr)
         recycleSegNo = endlogSegNo + 10;
     else
-        recycleSegNo = XLOGfileslop(PriorRedoPtr);
+        recycleSegNo = XLOGfileslop(RedoRecPtr);
 
     snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);
 
@@ -8706,6 +8706,7 @@ CreateCheckPoint(int flags)
     bool        shutdown;
     CheckPoint    checkPoint;
     XLogRecPtr    recptr;
+    XLogSegNo    _logSegNo;
     XLogCtlInsert *Insert = &XLogCtl->Insert;
     uint32        freespace;
     XLogRecPtr    PriorRedoPtr;
@@ -9072,22 +9073,18 @@ CreateCheckPoint(int flags)
      */
     smgrpostckpt();
 
-    /*
-     * Delete old log files and recycle them
-     */
+    /* Update the average distance between checkpoints/restartpoints. */
     if (PriorRedoPtr != InvalidXLogRecPtr)
-    {
-        XLogSegNo    _logSegNo;
-
-        /* Update the average distance between checkpoints. */
         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
 
-        /* Trim from the last checkpoint, not the last - 1 */
-        XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
-        KeepLogSeg(recptr, &_logSegNo);
-        _logSegNo--;
-        RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, recptr);
-    }
+    /*
+     * Delete old log files (those no longer needed for last checkpoint to
+     * prevent the disk holding the xlog from growing full.
+     */
+    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
+    KeepLogSeg(recptr, &_logSegNo);
+    _logSegNo--;
+    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
 
     /*
      * Make more log segments if needed.  (Do this after recycling old log
@@ -9253,6 +9250,11 @@ CreateRestartPoint(int flags)
     XLogRecPtr    lastCheckPointEndPtr;
     CheckPoint    lastCheckPoint;
     XLogRecPtr    PriorRedoPtr;
+    XLogRecPtr    receivePtr;
+    XLogRecPtr    replayPtr;
+    TimeLineID    replayTLI;
+    XLogRecPtr    endptr;
+    XLogSegNo    _logSegNo;
     TimestampTz xtime;
 
     /*
@@ -9394,69 +9396,58 @@ CreateRestartPoint(int flags)
     }
     LWLockRelease(ControlFileLock);
 
-    /*
-     * Delete old log files (those no longer needed even for previous
-     * checkpoint/restartpoint) to prevent the disk holding the xlog from
-     * growing full.
-     */
+    /* Update the average distance between checkpoints/restartpoints. */
     if (PriorRedoPtr != InvalidXLogRecPtr)
-    {
-        XLogRecPtr    receivePtr;
-        XLogRecPtr    replayPtr;
-        TimeLineID    replayTLI;
-        XLogRecPtr    endptr;
-        XLogSegNo    _logSegNo;
-
-        /* Update the average distance between checkpoints/restartpoints. */
         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
 
-        XLByteToSeg(PriorRedoPtr, _logSegNo, wal_segment_size);
+    /*
+     * Delete old log files (those no longer needed for last restartpoint to
+     * prevent the disk holding the xlog from growing full.
+     */
+    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
 
-        /*
-         * Get the current end of xlog replayed or received, whichever is
-         * later.
-         */
-        receivePtr = GetWalRcvWriteRecPtr(NULL, NULL);
-        replayPtr = GetXLogReplayRecPtr(&replayTLI);
-        endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
+    /*
+     * Retreat _logSegNo using the current end of xlog replayed or received,
+     * whichever is later.
+     */
+    receivePtr = GetWalRcvWriteRecPtr(NULL, NULL);
+    replayPtr = GetXLogReplayRecPtr(&replayTLI);
+    endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
+    KeepLogSeg(endptr, &_logSegNo);
+    _logSegNo--;
+    
+    /*
+     * Try to recycle segments on a useful timeline. If we've been promoted
+     * since the beginning of this restartpoint, use the new timeline chosen
+     * at end of recovery (RecoveryInProgress() sets ThisTimeLineID in that
+     * case). If we're still in recovery, use the timeline we're currently
+     * replaying.
+     *
+     * There is no guarantee that the WAL segments will be useful on the
+     * current timeline; if recovery proceeds to a new timeline right after
+     * this, the pre-allocated WAL segments on this timeline will not be used,
+     * and will go wasted until recycled on the next restartpoint. We'll live
+     * with that.
+     */
+    if (RecoveryInProgress())
+        ThisTimeLineID = replayTLI;
 
-        KeepLogSeg(endptr, &_logSegNo);
-        _logSegNo--;
+    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);
 
-        /*
-         * Try to recycle segments on a useful timeline. If we've been
-         * promoted since the beginning of this restartpoint, use the new
-         * timeline chosen at end of recovery (RecoveryInProgress() sets
-         * ThisTimeLineID in that case). If we're still in recovery, use the
-         * timeline we're currently replaying.
-         *
-         * There is no guarantee that the WAL segments will be useful on the
-         * current timeline; if recovery proceeds to a new timeline right
-         * after this, the pre-allocated WAL segments on this timeline will
-         * not be used, and will go wasted until recycled on the next
-         * restartpoint. We'll live with that.
-         */
-        if (RecoveryInProgress())
-            ThisTimeLineID = replayTLI;
+    /*
+     * Make more log segments if needed.  (Do this after recycling old log
+     * segments, since that may supply some of the needed files.)
+     */
+    PreallocXlogFiles(endptr);
 
-        RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, endptr);
-
-        /*
-         * Make more log segments if needed.  (Do this after recycling old log
-         * segments, since that may supply some of the needed files.)
-         */
-        PreallocXlogFiles(endptr);
-
-        /*
-         * ThisTimeLineID is normally not set when we're still in recovery.
-         * However, recycling/preallocating segments above needed
-         * ThisTimeLineID to determine which timeline to install the segments
-         * on. Reset it now, to restore the normal state of affairs for
-         * debugging purposes.
-         */
-        if (RecoveryInProgress())
-            ThisTimeLineID = 0;
-    }
+    /*
+     * ThisTimeLineID is normally not set when we're still in recovery.
+     * However, recycling/preallocating segments above needed ThisTimeLineID
+     * to determine which timeline to install the segments on. Reset it now,
+     * to restore the normal state of affairs for debugging purposes.
+     */
+    if (RecoveryInProgress())
+        ThisTimeLineID = 0;
 
     /*
      * Truncate pg_subtrans if possible.  We can throw away all data before
-- 
2.16.3


Re: Fix calculations on WAL recycling.

From
Michael Paquier
Date:
On Mon, Jul 23, 2018 at 07:55:57PM +0900, Kyotaro HORIGUCHI wrote:
> At Mon, 23 Jul 2018 15:59:16 +0900, Michael Paquier <michael@paquier.xyz> wrote in
<20180723065916.GI2854@paquier.xyz>
>> This is an open item for v11.
>
> Mmm. Thanks. I wrongly thought this was v10 item. Removed this
> from the next CF.

Thanks for updating the CF app.

>> By doing so, you would notice that the oldest WAL segment does not get
>> recycled after the checkpoint, while it should as the redo pointer is
>> always checkpoint generated.  What happens is that this oldest segment
>> gets recycled every two checkpoints.
>
> (I'm not sure I'm getting it correctly..) I see the old segments
> recycled. When I see pg_switch_wal() returns 0/12..,
> pg_walfile_name is ...13 and segment files for 13 and 14 are
> found in pg_wal directory. That is, seg 12 was recycled as seg
> 14. log_checkpoint=on shows every checkpoint recycles 1 segment
> in the case.

With your patch applied I see one segment recycled after each
checkpoint, which is correct.  On HEAD, you would see no segments,
followed by 2 segments recycled.  But I also see sometimes one segment
recycled.

>> The calculation of _logSegNo in CreateRestartPoint is visibly incorrect,
>> this still uses PriorRedoPtr so the bug is not fixed for standbys.  The
>> tweaks for ThisTimeLineID also need to be out of the loop where
>> PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation
>> should be kept.
>
> Agreed.  While I reconsidered this, I noticed that the estimated
> checkpoint distance is 0 when PriorRedoPtr is invalid. This means
> that the first checkpoint/restartpoint tries to reduce WAL
> segments to min_wal_size. However, it happens only at initdb time
> and makes no substantial behavior change so I made the change
> ignoring the difference.

Yes, same analysis here.

>> 1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is
>> not.
>
> Thank you for the comments and suggestions. After all, I did the
> following things in the attached patch.
>
> - Reverted the change on timeline switching. (Removed the (3))
> - Fixed CreateRestartPoint to do the same thing with CreateCheckPoint.
> - Both CreateRestart/CheckPoint now tries trimming of WAL
>   segments even for the first time.

Thanks, pushed after some comment tweaks and fixing the variable names
at the top of xlog.c for the static declarations.  Perhaps we can do
more refactoring here by moving all the segment calculation logic
directly in RemoveOldXlogFiles, but that makes the end LSN calculation a
bit blurry so I kept things as you proposed in version 3 of the patch.
--
Michael

Attachment

Re: Fix calculations on WAL recycling.

From
Kyotaro HORIGUCHI
Date:
At Tue, 24 Jul 2018 10:41:18 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180724014118.GA4035@paquier.xyz>
> On Mon, Jul 23, 2018 at 07:55:57PM +0900, Kyotaro HORIGUCHI wrote:
> With your patch applied I see one segment recycled after each
> checkpoint, which is correct.  On HEAD, you would see no segments,
> followed by 2 segments recycled.  But I also see sometimes one segment
> recycled.

Anyway good to hear that.

> >> The calculation of _logSegNo in CreateRestartPoint is visibly incorrect,
> >> this still uses PriorRedoPtr so the bug is not fixed for standbys.  The
> >> tweaks for ThisTimeLineID also need to be out of the loop where
> >> PriorRedoPtr is InvalidXLogRecPtr, and only the distance calculation
> >> should be kept.
> > 
> > Agreed.  While I reconsidered this, I noticed that the estimated
> > checkpoint distance is 0 when PriorRedoPtr is invalid. This means
> > that the first checkpoint/restartpoint tries to reduce WAL
> > segments to min_wal_size. However, it happens only at initdb time
> > and makes no substantial behavior change so I made the change
> > ignoring the difference.
> 
> Yes, same analysis here.
> 
> >> 1) and 2) are in my opinion clearly oversights from 4b0d28d, but 3) is
> >> not.
> > 
> > Thank you for the comments and suggestions. After all, I did the
> > following things in the attached patch.
> > 
> > - Reverted the change on timeline switching. (Removed the (3))
> > - Fixed CreateRestartPoint to do the same thing with CreateCheckPoint.
> > - Both CreateRestart/CheckPoint now tries trimming of WAL
> >   segments even for the first time.
> 
> Thanks, pushed after some comment tweaks and fixing the variable names
> at the top of xlog.c for the static declarations.  Perhaps we can do
> more refactoring here by moving all the segment calculation logic
> directly in RemoveOldXlogFiles, but that makes the end LSN calculation a
> bit blurry so I kept things as you proposed in version 3 of the patch.

Thank you for committing this.

I feel that XLOGfileslop() can be a function but I regret that I
didn't move it closer to RemoveOldXLogFiles a bit..


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center