Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation
Date
Msg-id 20200907.174036.603784943439284546.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> The cause is that the worker had received pending-sync entry correctly
> but not never created a relcache entry for the relation using
> RelationBuildDesc. So the rd_firstRelfilenodeSubid is not correctly
> set.
> 
> I'm investigating it.

Relcaches are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation at taking a
lock but pending syncs are not considered.

Since parallel workers don't access the files so we can just ignore
the assertion safely, but I want to rd_firstRelfilenodeSubid flag at
invalidation, as attached PoC patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..d66e914345 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1322,10 +1322,27 @@ RelationInitPhysicalAddr(Relation relation)
     }
     else
     {
+        Oid oldnode = relation->rd_node.relNode;
+
         /* Consult the relation mapper */
         relation->rd_node.relNode =
             RelationMapOidToFilenode(relation->rd_id,
                                      relation->rd_rel->relisshared);
+
+        /*
+         * The leader process might have replaced storage before staring this
+         * worker. We need to set rd_firstRelfilenodeSubid according to pending
+         * sync hash in that case. Currently the information is not actually
+         * used in worker processes, but make things tidy.
+         */
+        if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+        {
+            if (RelFileNodeSkippingWAL(relation->rd_node))
+                relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+            else
+                relation->rd_firstRelfilenodeSubid = InvalidTransactionId;
+        }
+
         if (!OidIsValid(relation->rd_node.relNode))
             elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
                  RelationGetRelationName(relation), relation->rd_id);

pgsql-hackers by date:

Previous
From: "Andrey M. Borodin"
Date:
Subject: Re: Yet another fast GiST build (typo)
Next
From: Michael Paquier
Date:
Subject: Re: Online checksums verification in the backend