Non-emergency patch for bug #17679 - Mailing list pgsql-hackers

From Tom Lane
Subject Non-emergency patch for bug #17679
Date
Msg-id 3797575.1667924888@sss.pgh.pa.us
Whole thread Raw
Responses Re: Non-emergency patch for bug #17679
List pgsql-hackers
In the release team's discussion leading up to commit 0e758ae89,
Andres opined that what commit 4ab5dae94 had done to mdunlinkfork
was a mess, and I concur.  It invented an entirely new code path
through that function, and required two different behaviors from the
segment-deletion loop.  I think a very straight line can be drawn
between that extra complexity and the introduction of a nasty bug.
It's all unnecessary too, because AFAICS all we really need is to
apply the pre-existing behavior for temp tables and REDO mode
to binary-upgrade mode as well.

Hence, the attached reverts everything 4ab5dae94 did to this function,
and most of 0e758ae89 too, and instead makes IsBinaryUpgrade an
additional reason to take the immediate-unlink path.

Barring objections, I'll push this after the release freeze lifts.

            regards, tom lane

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index b1a2cb575d..0f642df3c1 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -263,6 +263,12 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
  * The fact that temp rels and regular rels have different file naming
  * patterns provides additional safety.
  *
+ * We also don't do it while performing a binary upgrade.  There is no reuse
+ * hazard in that case, since after a crash or even a simple ERROR, the
+ * upgrade fails and the whole cluster must be recreated from scratch.
+ * Furthermore, it is important to remove the files from disk immediately,
+ * because we may be about to reuse the same relfilenumber.
+ *
  * All the above applies only to the relation's main fork; other forks can
  * just be removed immediately, since they are not needed to prevent the
  * relfilenumber from being recycled.  Also, we do not carefully
@@ -319,14 +325,15 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
 {
     char       *path;
     int            ret;
-    BlockNumber segno = 0;

     path = relpath(rlocator, forknum);

     /*
-     * Delete or truncate the first segment.
+     * Truncate and then unlink the first segment, or just register a request
+     * to unlink it later, as described in the comments for mdunlink().
      */
-    if (isRedo || forknum != MAIN_FORKNUM || RelFileLocatorBackendIsTemp(rlocator))
+    if (isRedo || IsBinaryUpgrade || forknum != MAIN_FORKNUM ||
+        RelFileLocatorBackendIsTemp(rlocator))
     {
         if (!RelFileLocatorBackendIsTemp(rlocator))
         {
@@ -351,7 +358,6 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
                 ereport(WARNING,
                         (errcode_for_file_access(),
                          errmsg("could not remove file \"%s\": %m", path)));
-            segno++;
         }
     }
     else
@@ -359,42 +365,25 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
         /* Prevent other backends' fds from holding on to the disk space */
         ret = do_truncate(path);

-        /*
-         * Except during a binary upgrade, register request to unlink first
-         * segment later, rather than now.
-         *
-         * If we're performing a binary upgrade, the dangers described in the
-         * header comments for mdunlink() do not exist, since after a crash or
-         * even a simple ERROR, the upgrade fails and the whole new cluster
-         * must be recreated from scratch. And, on the other hand, it is
-         * important to remove the files from disk immediately, because we may
-         * be about to reuse the same relfilenumber.
-         */
-        if (!IsBinaryUpgrade)
-        {
-            register_unlink_segment(rlocator, forknum, 0 /* first seg */ );
-            segno++;
-        }
+        /* Register request to unlink first segment later */
+        register_unlink_segment(rlocator, forknum, 0 /* first seg */ );
     }

     /*
-     * Delete any remaining segments (we might or might not have dealt with
-     * the first one above).
+     * Delete any additional segments.
      */
     if (ret >= 0)
     {
         char       *segpath = (char *) palloc(strlen(path) + 12);
+        BlockNumber segno;

         /*
          * Note that because we loop until getting ENOENT, we will correctly
          * remove all inactive segments as well as active ones.
          */
-        for (;; segno++)
+        for (segno = 1;; segno++)
         {
-            if (segno == 0)
-                strcpy(segpath, path);
-            else
-                sprintf(segpath, "%s.%u", path, segno);
+            sprintf(segpath, "%s.%u", path, segno);

             if (!RelFileLocatorBackendIsTemp(rlocator))
             {

pgsql-hackers by date:

Previous
From: David Christensen
Date:
Subject: Re: [PATCHES] Post-special page storage TDE support
Next
From: Peter Geoghegan
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum