Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date
Msg-id 20220315.150926.894035582561069628.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
At Mon, 14 Mar 2022 17:37:40 -0400, Robert Haas <robertmhaas@gmail.com> wrote in 
> On Mon, Mar 7, 2022 at 3:39 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > So the new framework has been dropped in this version.
> > The second test is removed as it is irrelevant to this bug.
> >
> > In this version the patch is a single file that contains the test.
> 
> The status of this patch in the CommitFest was set to "Waiting for
> Author." Since a new patch has been submitted since that status was
> set, I have changed it to "Needs Review." Since this is now in its
> 15th CommitFest, we really should get it fixed; that's kind of
> ridiculous. (I am as much to blame as anyone.) It does seem to be a
> legitimate bug.
> 
> A few questions about the patch:

Thanks for looking this!

> 1. Why is it OK to just skip the operation without making it up later?

Does "it" mean removal of directories?  It is not okay, but in the
first place it is out-of-scope of this patch to fix that.  The patch
leaves the existing code alone.  This patch just has recovery ignore
invalid accesses into eventually removed objects.

Maybe, I don't understand you question..

> 2. Why not instead change the code so that the operation can succeed,
> by creating the prerequisite parent directories? Do we not have enough
> information for that? I'm not saying that we definitely should do it
> that way rather than this way, but I think we do take that approach in
> some cases.

It is proposed first by Paul Guo [1] then changed so that it ignores
failed directory creations in the very early stage in this thread.
After that, it gets conscious of recovery consistency by managing
invalid-access list.

[1]
https://www.postgresql.org/message-id/flat/20210327142316.GA32517%40alvherre.pgsql#a557bd47207a446ce206879676e0140a

I think there was no strong reason for the current shape but I
personally rather like the remembering-invalid-access way because it
doesn't dirty the data directory and it is consistent with how we
treat missing heap pages.

I tried a slightly tweaked version (attached) of the first version and
confirmed that it works for the current test script.  It doesn't check
recovery consistency but otherwise that way also seems fine.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index c37e3c9a9a..28aed8d296 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -47,6 +47,7 @@
 #include "commands/defrem.h"
 #include "commands/seclabel.h"
 #include "commands/tablespace.h"
+#include "common/file_perm.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -2382,6 +2383,7 @@ dbase_redo(XLogReaderState *record)
         xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) XLogRecGetData(record);
         char       *src_path;
         char       *dst_path;
+        char       *parent_path;
         struct stat st;
 
         src_path = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id);
@@ -2401,6 +2403,41 @@ dbase_redo(XLogReaderState *record)
                                 dst_path)));
         }
 
+        /*
+         * It is possible that the tablespace was later dropped, but we are
+         * re-redoing database create before that. In that case, those
+         * directories are gone, and we do not create symlink.
+         */
+        if (stat(dst_path, &st) < 0 && errno == ENOENT)
+        {
+            parent_path = pstrdup(dst_path);
+            get_parent_directory(parent_path);
+            elog(WARNING, "creating missing directory: %s", parent_path);
+            if (stat(parent_path, &st) != 0 && pg_mkdir_p(parent_path, pg_dir_create_mode) != 0)
+            {
+                ereport(WARNING,
+                        (errmsg("can not recursively create directory \"%s\"",
+                                parent_path)));
+            }
+        }
+
+        /*
+         * There's a case where the copy source directory is missing for the
+         * same reason above.  Create the emtpy source directory so that
+         * copydir below doesn't fail.  The directory will be dropped soon by
+         * recovery.
+         */
+        if (stat(src_path, &st) < 0 && errno == ENOENT)
+        {
+            elog(WARNING, "creating missing copy source directory: %s", src_path);
+            if (stat(src_path, &st) != 0 && pg_mkdir_p(src_path, pg_dir_create_mode) != 0)
+            {
+                ereport(WARNING,
+                        (errmsg("can not recursively create directory \"%s\"",
+                                src_path)));
+            }
+        }
+
         /*
          * Force dirty buffers out to disk, to ensure source database is
          * up-to-date for the copy.
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 40514ab550..675f578dfe 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -155,8 +155,6 @@ TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo)
                 /* Directory creation failed? */
                 if (MakePGDirectory(dir) < 0)
                 {
-                    char       *parentdir;
-
                     /* Failure other than not exists or not in WAL replay? */
                     if (errno != ENOENT || !isRedo)
                         ereport(ERROR,
@@ -169,32 +167,8 @@ TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo)
                      * continue by creating simple parent directories rather
                      * than a symlink.
                      */
-
-                    /* create two parents up if not exist */
-                    parentdir = pstrdup(dir);
-                    get_parent_directory(parentdir);
-                    get_parent_directory(parentdir);
-                    /* Can't create parent and it doesn't already exist? */
-                    if (MakePGDirectory(parentdir) < 0 && errno != EEXIST)
-                        ereport(ERROR,
-                                (errcode_for_file_access(),
-                                 errmsg("could not create directory \"%s\": %m",
-                                        parentdir)));
-                    pfree(parentdir);
-
-                    /* create one parent up if not exist */
-                    parentdir = pstrdup(dir);
-                    get_parent_directory(parentdir);
-                    /* Can't create parent and it doesn't already exist? */
-                    if (MakePGDirectory(parentdir) < 0 && errno != EEXIST)
-                        ereport(ERROR,
-                                (errcode_for_file_access(),
-                                 errmsg("could not create directory \"%s\": %m",
-                                        parentdir)));
-                    pfree(parentdir);
-
                     /* Create database directory */
-                    if (MakePGDirectory(dir) < 0)
+                    if (pg_mkdir_p(dir, pg_dir_create_mode) < 0)
                         ereport(ERROR,
                                 (errcode_for_file_access(),
                                  errmsg("could not create directory \"%s\": %m",

pgsql-hackers by date:

Previous
From: Yura Sokolov
Date:
Subject: Re: BufferAlloc: don't take two simultaneous locks
Next
From: Masahiko Sawada
Date:
Subject: Re: Skipping logical replication transactions on subscriber side