Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date
Msg-id c3cf045b-f14b-d118-4b85-935159d214c3@oss.nttdata.com
In response to Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Paul Guo <pguo@pivotal.io>)
Responses Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers

On 2020/01/15 19:18, Paul Guo wrote:
> I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch
> series. Let's see the CI pipeline result.

Thanks for updating the patches!

I started reading the 0003 patch.

The approach that the 0003 patch uses is not a perfect solution.
If the standby crashes after tblspc_redo() removes the directory but before
the subsequent COMMIT record is replayed, a PANIC error would occur, since
there can be some unresolved missing-directory entries when we reach the
consistent state. The problem would happen very rarely, though...
Just an idea: would calling XLogFlush() to update the minimum recovery point
just before tblspc_redo() performs destroy_tablespace_directories() be
safe and helpful for the problem?
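
To make the ordering argument concrete, here is a toy model of replay
after such a crash. It is not PostgreSQL code: the record types, the
function replay_is_consistent(), and the LSN values are all hypothetical,
and it assumes (as a simplification) that replaying the drop record is
what clears the missing-directory entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record kinds; real WAL records carry much more. */
typedef enum { REC_CREATE_DB, REC_DROP_TABLESPACE } RecType;

typedef struct
{
    unsigned long end_lsn;  /* position just past this record */
    RecType       type;
} WalRecord;

/*
 * Toy replay after a crash in which the tablespace directory is already
 * gone. Returns false if missing-directory entries are still unresolved
 * at the moment consistency is reached (the PANIC case).
 */
bool
replay_is_consistent(const WalRecord *recs, size_t nrecs,
                     unsigned long min_recovery_point)
{
    int    missing_dirs = 0;
    bool   consistent = false;
    size_t i;

    assert(recs != NULL || nrecs == 0);

    for (i = 0; i < nrecs; i++)
    {
        if (recs[i].type == REC_CREATE_DB)
            missing_dirs++;     /* target directory no longer exists */
        else
            missing_dirs = 0;   /* the drop accounts for the entries */

        /* Consistency is checked once replay passes the recovery point. */
        if (!consistent && recs[i].end_lsn >= min_recovery_point)
        {
            if (missing_dirs > 0)
                return false;
            consistent = true;
        }
    }
    return true;
}
```

In this model, a minimum recovery point that still sits before the drop
record makes the check fire while an entry is unresolved, whereas a
minimum recovery point advanced to the end of the drop record before the
directory is removed defers the check until the entry has been cleared.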

-        appendStringInfo(buf, "copy dir %u/%u to %u/%u",
-                         xlrec->src_tablespace_id, xlrec->src_db_id,
-                         xlrec->tablespace_id, xlrec->db_id);
+        dbpath1 = GetDatabasePath(xlrec->src_db_id,  xlrec->src_tablespace_id);
+        dbpath2 = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
+        appendStringInfo(buf, "copy dir %s to %s", dbpath1, dbpath2);
+        pfree(dbpath2);
+        pfree(dbpath1);

If the patch is a bug fix and would be back-patched, the above change
would alter pg_waldump's output for CREATE/DROP DATABASE between
minor versions. IMO it's better to avoid such a change and split the above
into a separate patch only for master.

-            appendStringInfo(buf, " %u/%u",
-                             xlrec->tablespace_ids[i], xlrec->db_id);
+        {
+            dbpath1 = GetDatabasePath(xlrec->db_id, xlrec->tablespace_ids[i]);
+            appendStringInfo(buf,  "%s", dbpath1);
+            pfree(dbpath1);
+        }

Same as above.

BTW, the above "%s" should be " %s", i.e., a space character needs to be
prepended to "%s".
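
A small standalone sketch of why the separator matters. It uses plain
snprintf() rather than appendStringInfo(), and the function name
describe_drop(), the "dir" prefix, and the example paths are made up
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds "dir" followed by each path, with or without a leading space
 * in the per-path format. The prefix and the paths are illustrative only.
 */
char *
describe_drop(char *buf, size_t buflen, const char **paths, int npaths,
              bool leading_space)
{
    size_t used;
    int    i;

    assert(buflen > 3);
    used = (size_t) snprintf(buf, buflen, "dir");

    for (i = 0; i < npaths && used < buflen - 1; i++)
    {
        int n = snprintf(buf + used, buflen - used,
                         leading_space ? " %s" : "%s", paths[i]);

        if (n < 0)
            break;
        used += (size_t) n;     /* loop stops if output was truncated */
    }
    return buf;
}
```

With the paths "base/5" and "pg_tblspc/16400/5", the " %s" variant yields
"dir base/5 pg_tblspc/16400/5", while the "%s" variant runs everything
together as "dirbase/5pg_tblspc/16400/5".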

+            get_parent_directory(parent_path);
+            if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
+            {
+                XLogLogMissingDir(xlrec->tablespace_id, InvalidOid, dst_path);

Shouldn't the third argument of XLogLogMissingDir() be parent_path instead
of dst_path?
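
To illustrate the point, a standalone sketch in which the directory that
is tested with stat() is also the one handed back for reporting. The
helper find_missing_parent() is hypothetical; it strips the last path
component by hand instead of calling get_parent_directory(), and it
assumes POSIX stat().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Writes the parent directory of dst_path into parent_path and returns
 * true if that parent is missing (or is not a directory). The caller
 * would then report parent_path, the path that was actually checked.
 */
bool
find_missing_parent(const char *dst_path, char *parent_path, size_t len)
{
    struct stat st;
    char       *slash;

    assert(len >= 2);
    snprintf(parent_path, len, "%s", dst_path);

    slash = strrchr(parent_path, '/');
    if (slash == NULL)
        snprintf(parent_path, len, ".");    /* no slash: relative name */
    else if (slash == parent_path)
        parent_path[1] = '\0';              /* parent is the root */
    else
        *slash = '\0';                      /* drop the last component */

    return !(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode));
}
```

Reporting dst_path in the missing case would name a directory that was
never tested; the sketch returns the tested path so that the log entry
and the check agree.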

+    if (hash_search(missing_dir_tab, &key, HASH_REMOVE, NULL) == NULL)
+        elog(DEBUG2, "dir %s tablespace %d database %d is not missing",
+             path, spcNode, dbNode);

I think that this elog() is useless and rather confusing.

+        XLogForgetMissingDir(xlrec->ts_id, InvalidOid, "");

The third argument should be set to the actual path instead of an empty
string. Otherwise XLogForgetMissingDir() may emit a confusing DEBUG2
message. Or the third argument of XLogForgetMissingDir() should be removed
and the path in the DEBUG2 message should be calculated from the spcNode
and dbNode in the hash entry in XLogForgetMissingDir().

+#include "common/file_perm.h"

This seems unnecessary.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters


