Thread: Non-emergency patch for bug #17679

Non-emergency patch for bug #17679

From
Tom Lane
Date:
In the release team's discussion leading up to commit 0e758ae89,
Andres opined that what commit 4ab5dae94 had done to mdunlinkfork
was a mess, and I concur.  It invented an entirely new code path
through that function, and required two different behaviors from the
segment-deletion loop.  I think a very straight line can be drawn
between that extra complexity and the introduction of a nasty bug.
It's all unnecessary too, because AFAICS all we really need is to
apply the pre-existing behavior for temp tables and REDO mode
to binary-upgrade mode as well.

Hence, the attached reverts everything 4ab5dae94 did to this function,
and most of 0e758ae89 too, and instead makes IsBinaryUpgrade an
additional reason to take the immediate-unlink path.

Barring objections, I'll push this after the release freeze lifts.

            regards, tom lane

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index b1a2cb575d..0f642df3c1 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -263,6 +263,12 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
  * The fact that temp rels and regular rels have different file naming
  * patterns provides additional safety.
  *
+ * We also don't do it while performing a binary upgrade.  There is no reuse
+ * hazard in that case, since after a crash or even a simple ERROR, the
+ * upgrade fails and the whole cluster must be recreated from scratch.
+ * Furthermore, it is important to remove the files from disk immediately,
+ * because we may be about to reuse the same relfilenumber.
+ *
  * All the above applies only to the relation's main fork; other forks can
  * just be removed immediately, since they are not needed to prevent the
  * relfilenumber from being recycled.  Also, we do not carefully
@@ -319,14 +325,15 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
 {
     char       *path;
     int            ret;
-    BlockNumber segno = 0;

     path = relpath(rlocator, forknum);

     /*
-     * Delete or truncate the first segment.
+     * Truncate and then unlink the first segment, or just register a request
+     * to unlink it later, as described in the comments for mdunlink().
      */
-    if (isRedo || forknum != MAIN_FORKNUM || RelFileLocatorBackendIsTemp(rlocator))
+    if (isRedo || IsBinaryUpgrade || forknum != MAIN_FORKNUM ||
+        RelFileLocatorBackendIsTemp(rlocator))
     {
         if (!RelFileLocatorBackendIsTemp(rlocator))
         {
@@ -351,7 +358,6 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
                 ereport(WARNING,
                         (errcode_for_file_access(),
                          errmsg("could not remove file \"%s\": %m", path)));
-            segno++;
         }
     }
     else
@@ -359,42 +365,25 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
         /* Prevent other backends' fds from holding on to the disk space */
         ret = do_truncate(path);

-        /*
-         * Except during a binary upgrade, register request to unlink first
-         * segment later, rather than now.
-         *
-         * If we're performing a binary upgrade, the dangers described in the
-         * header comments for mdunlink() do not exist, since after a crash or
-         * even a simple ERROR, the upgrade fails and the whole new cluster
-         * must be recreated from scratch. And, on the other hand, it is
-         * important to remove the files from disk immediately, because we may
-         * be about to reuse the same relfilenumber.
-         */
-        if (!IsBinaryUpgrade)
-        {
-            register_unlink_segment(rlocator, forknum, 0 /* first seg */ );
-            segno++;
-        }
+        /* Register request to unlink first segment later */
+        register_unlink_segment(rlocator, forknum, 0 /* first seg */ );
     }

     /*
-     * Delete any remaining segments (we might or might not have dealt with
-     * the first one above).
+     * Delete any additional segments.
      */
     if (ret >= 0)
     {
         char       *segpath = (char *) palloc(strlen(path) + 12);
+        BlockNumber segno;

         /*
          * Note that because we loop until getting ENOENT, we will correctly
          * remove all inactive segments as well as active ones.
          */
-        for (;; segno++)
+        for (segno = 1;; segno++)
         {
-            if (segno == 0)
-                strcpy(segpath, path);
-            else
-                sprintf(segpath, "%s.%u", path, segno);
+            sprintf(segpath, "%s.%u", path, segno);

             if (!RelFileLocatorBackendIsTemp(rlocator))
             {

Re: Non-emergency patch for bug #17679

From
Andres Freund
Date:
Hi,

On 2022-11-08 11:28:08 -0500, Tom Lane wrote:
> In the release team's discussion leading up to commit 0e758ae89,
> Andres opined that what commit 4ab5dae94 had done to mdunlinkfork
> was a mess, and I concur.  It invented an entirely new code path
> through that function, and required two different behaviors from the
> segment-deletion loop.  I think a very straight line can be drawn
> between that extra complexity and the introduction of a nasty bug.
> It's all unnecessary too, because AFAICS all we really need is to
> apply the pre-existing behavior for temp tables and REDO mode
> to binary-upgrade mode as well.

I'm not sure I understand the current code. In the binary upgrade case we
currently *do* truncate the file in the else of "Delete or truncate the first
segment.", then again truncate it in the loop and then unlink it, right?


> Hence, the attached reverts everything 4ab5dae94 did to this function,
> and most of 0e758ae89 too, and instead makes IsBinaryUpgrade an
> additional reason to take the immediate-unlink path.
> 
> Barring objections, I'll push this after the release freeze lifts.

I wonder if it's worth aiming slightly higher. There's plenty duplicated code
between the first segment handling and the loop body. Perhaps the if at the
top just should decide whether to unlink the first segment or not, and we then
check that in the body of the loop for segno == 0?

Greetings,

Andres Freund



Re: Non-emergency patch for bug #17679

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2022-11-08 11:28:08 -0500, Tom Lane wrote:
>> Hence, the attached reverts everything 4ab5dae94 did to this function,
>> and most of 0e758ae89 too, and instead makes IsBinaryUpgrade an
>> additional reason to take the immediate-unlink path.

> I wonder if it's worth aiming slightly higher. There's plenty duplicated code
> between the first segment handling and the loop body. Perhaps the if at the
> top just should decide whether to unlink the first segment or not, and we then
> check that in the body of the loop for segno == 0?

I don't care for that.  I think the point here is precisely that
we want behavior A for the first segment and behavior B for the
remaining ones, and so I'd prefer to keep the code that does A
and the code that does B distinct.  It was a misguided attempt to
share that code that got us into trouble here in the first place.
Moreover, any future changes to either behavior will be that much
harder if we combine the implementations.

            regards, tom lane