Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers

From Paul Guo
Subject Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date
Msg-id 29C02691-6F05-422F-AB13-2B004D60F4CF@vmware.com
Whole thread Raw
In response to Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: standby recovery fails (tablespace related) (tentative patch and discussion)
List pgsql-hackers
Thanks for the review, please see the replies below.

> On Jan 5, 2021, at 9:07 AM, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> 
> At Wed, 8 Jul 2020 12:56:44 +0000, Paul Guo <guopa@vmware.com> wrote in 
>> On 2020/01/15 19:18, Paul Guo wrote:
>> I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch
series.Let's see the CI pipeline result.
 
>> 
>> Thanks for updating the patches!
>> 
>> I started reading the 0003 patch.
>> 
>> The approach that the 0003 patch uses is not the perfect solution.
>> If the standby crashes after tblspc_redo() removes the directory and before
>> its subsequent COMMIT record is replayed, PANIC error would occur since
>> there can be some unresolved missing directory entries when we reach the
>> consistent state. The problem would very rarely happen, though...
>> Just idea; calling XLogFlush() to update the minimum recovery point just
>> before tblspc_redo() performs destroy_tablespace_directories() may be
>> safe and helpful for the problem?
> 
> It seems to me that what the current patch does is too complex.  What
> we need to do here is to remember every invalid operation then forget
> it when the prerequisite object is dropped.
> 
> When a table space is dropped before consistency is established, we
> don't need to care what has been performed inside the tablespace.  In
> this perspective, it is enough to remember tablespace ids when failed
> to do something inside it due to the absence of the tablespace and
> then forget it when we remove it.  We could remember individual
> database id to show them in error messages, but I'm not sure it's
> useful.  The reason log_invalid_page records block numbers is to allow
> the machinery handle partial table truncations, but this is not the
> case since dropping tablespace cannot leave some of containing
> databases.
> 
> As the result, we won't see an unresolved invalid operations in a
> dropped tablespace.
> 
> Am I missing something?

Yes, removing the database id from the hash key in the log/forget code should
be usually fine, but the previous code does stricter/safer checking.

Consider the scenario:

CREATE DATABASE newdb1 TEMPLATE template_db1;
CREATE DATABASE newdb2 TEMPLATE template_db2; <- in case the template_db2 database directory is missing abnormally
somehow.
DROP DATABASE template_db1;

The previous code could detect this but if we remove the database id in the code,
this bad scenario is skipped.

> 
> 
> dbase_redo:
> +      if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> +      {
> +        XLogRecordMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);
> 
> This means "record the belonging table space directory if it is not
> found OR it is not a directory". The former can be valid but the
> latter is unconditionally can not (I don't think we bother considering
> symlinks there).

Again this is a safer check, in the case the parent_path is a file for example somehow,
we should panic finally for the case and let the user checks and then does recovery again.

> 
> +    /*
> +     * Source directory may be missing.  E.g. the template database used
> +     * for creating this database may have been dropped, due to reasons
> +     * noted above.  Moving a database from one tablespace may also be a
> +     * partner in the crime.
> +     */
> +    if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)))
> +    {
> +      XLogLogMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path);
> 
> This is a part of *creation* of the target directory. Lack of the
> source directory cannot be valid even if the source directory is
> dropped afterwards in the WAL stream and we can allow that if the
> *target* tablespace is dropped afterwards. As the result, as I
> mentioned above, we don't need to record about the database directory.
> 
> By the way the name XLogLogMiss.. is somewhat confusing. How about
> XLogReportMissingDir (named after report_invalid_page).

Agree with you.

Also your words remind me that we should skip the checking if the consistency point
is reached.

Here is a git diff against the previous patch. I’ll send out the new rebased patches after
the consensus is reached.

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 7ade385965..c8fe3fe228 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -90,7 +90,7 @@ typedef struct xl_missing_dir
 static HTAB *missing_dir_tab = NULL;

 void
-XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path)
+XLogReportMissingDir(Oid spcNode, Oid dbNode, char *path)
 {
     xl_missing_dir_key key;
     bool found;
@@ -103,16 +103,6 @@ XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path)
      */
     Assert(OidIsValid(spcNode));

-    if (reachedConsistency)
-    {
-        if (dbNode == InvalidOid)
-            elog(PANIC, "cannot find directory %s (tablespace %d)",
-                 path, spcNode);
-        else
-            elog(PANIC, "cannot find directory %s (tablespace %d database %d)",
-                 path, spcNode, dbNode);
-    }
-
     if (missing_dir_tab == NULL)
     {
         /* create hash table when first needed */
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index fbff422c3b..7bd6d4efd9 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -2205,7 +2205,7 @@ dbase_redo(XLogReaderState *record)
                         (errmsg("some useless files may be left behind in old database directory \"%s\"",
                                 dst_path)));
         }
-        else
+        else if (!reachedConsistency)
         {
             /*
              * It is possible that drop tablespace record appearing later in
@@ -2221,7 +2221,7 @@ dbase_redo(XLogReaderState *record)
             get_parent_directory(parent_path);
             if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
             {
-                XLogLogMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);
+                XLogReportMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);
                 skip = true;
                 ereport(WARNING,
                         (errmsg("skipping create database WAL record"),
@@ -2239,9 +2239,10 @@ dbase_redo(XLogReaderState *record)
          * noted above.  Moving a database from one tablespace may also be a
          * partner in the crime.
          */
-        if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)))
+        if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)) &&
+            !reachedConsistency)
         {
-            XLogLogMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path);
+            XLogReportMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path);
             skip = true;
             ereport(WARNING,
                     (errmsg("skipping create database WAL record"),
@@ -2311,7 +2312,8 @@ dbase_redo(XLogReaderState *record)
                         (errmsg("some useless files may be left behind in old database directory \"%s\"",
                                 dst_path)));

-            XLogForgetMissingDir(xlrec->tablespace_ids[i], xlrec->db_id);
+            if (!reachedConsistency)
+                XLogForgetMissingDir(xlrec->tablespace_ids[i], xlrec->db_id);

             pfree(dst_path);
         }
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 294c9676b4..15eaa757cc 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1534,7 +1534,8 @@ tblspc_redo(XLogReaderState *record)
     {
         xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);

-        XLogForgetMissingDir(xlrec->ts_id, InvalidOid);
+        if (!reachedConsistency)
+            XLogForgetMissingDir(xlrec->ts_id, InvalidOid);

         XLogFlush(record->EndRecPtr);

diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index da561af5ab..6561d9cebe 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -23,7 +23,7 @@ extern void XLogDropDatabase(Oid dbid);
 extern void XLogTruncateRelation(RelFileNode rnode, ForkNumber forkNum,
                                  BlockNumber nblocks);

-extern void XLogLogMissingDir(Oid spcNode, Oid dbNode, char *path);
+extern void XLogReportMissingDir(Oid spcNode, Oid dbNode, char *path);
 extern void XLogForgetMissingDir(Oid spcNode, Oid dbNode);
 extern void XLogCheckMissingDirs(void);

diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl
index 748200ebb5..95eb6d26cc 100644
--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -141,7 +141,7 @@ $node_master->wait_for_catchup($node_standby, 'replay',
 $node_standby->safe_psql('postgres', 'CHECKPOINT');

 # Do immediate shutdown just after a sequence of CREAT DATABASE / DROP
-# DATABASE / DROP TABLESPACE. This causes CREATE DATBASE WAL records
+# DATABASE / DROP TABLESPACE. This causes CREATE DATABASE WAL records


pgsql-hackers by date:

Previous
From: Yugo NAGATA
Date:
Subject: Re: Is Recovery actually paused?
Next
From: Dilip Kumar
Date:
Subject: Re: Is Recovery actually paused?