Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table
Date
Msg-id 7142.1556148430@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List pgsql-bugs
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> On 2019/04/24 7:03, Tom Lane wrote:
>> ISTM we could work around the problem with the attached, which I think
>> is a good change independently of anything else.

> Agreed.

After thinking about it more, it seems like a bad idea to put the check
in CheckIndexCompatible; that could interfere with any other use of the
function to match up potential child indexes.  (I wonder why this test
isn't the same as what DefineIndex uses to spot potential child indexes,
BTW --- it uses a completely separate function CompareIndexInfo, which
seems both wasteful and trouble waiting to happen.)

So I think we should test the relkind in TryReuseIndex, instead.
I think it would be a good idea to *also* do what you suggested
upthread and have DefineIndex clear the field when cloning an
IndexStmt to create a child; no good could possibly come of
passing that down when we intend to create a new index.

In short, I think the right short-term fix is as attached (plus
a new regression test case based on the submitted example).

Longer term, it's clearly bad that we fail to reuse child indexes
in this scenario; the point about mangled comments is minor by
comparison.  I'm inclined to think that what we want to do is
*not* recurse when creating the parent index, but to initially
make it NOT VALID, and then do ALTER ATTACH PARTITION with each
re-used child index.  This would successfully reproduce the
previous state in case only some of the partitions have attached
indexes, which I don't think works correctly right now.

BTW, I hadn't ever looked closely at what the index reuse code
does, and now that I have, my heart just sinks.  I think that
logic needs to be nuked from orbit.  RelationPreserveStorage was
never meant to be abused in this way; I invite you to contemplate
whether it's not a problem that it doesn't check the backend and
nestLevel fields of PendingRelDelete entries before killing them.
(In its originally-designed use for mapped rels at transaction end,
that wasn't a problem, but I'm afraid that it may be one now.)

The right way to do this IMO would be something closer to the
heap-swap logic in cluster.c, where we exchange the relfilenodes
of two live indexes, rather than what is happening now.  Or for
that matter, do we really need to delete the old indexes at all?

None of that looks very practical for v12, let alone back-patching
to v11, though.

            regards, tom lane

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a1c91b5..002a8b8 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1125,6 +1125,18 @@ DefineIndex(Oid relationId,
                     ListCell   *lc;

                     /*
+                     * We can't use the same index name for the child index,
+                     * so clear idxname to let the recursive invocation choose
+                     * a new name.  Likewise, the existing target relation
+                     * field is wrong, and if indexOid or oldNode are set,
+                     * they mustn't be applied to the child either.
+                     */
+                    childStmt->idxname = NULL;
+                    childStmt->relation = NULL;
+                    childStmt->indexOid = InvalidOid;
+                    childStmt->oldNode = InvalidOid;
+
+                    /*
                      * Adjust any Vars (both in expressions and in the index's
                      * WHERE clause) to match the partition's column numbering
                      * in case it's different from the parent's.
@@ -1155,8 +1167,6 @@ DefineIndex(Oid relationId,
                     if (found_whole_row)
                         elog(ERROR, "cannot convert whole-row table reference");

-                    childStmt->idxname = NULL;
-                    childStmt->relation = NULL;
                     DefineIndex(childRelid, childStmt,
                                 InvalidOid, /* no predefined OID */
                                 indexRelationId,    /* this is our child */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index aa7328e..66f863f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11454,7 +11454,9 @@ TryReuseIndex(Oid oldId, IndexStmt *stmt)
     {
         Relation    irel = index_open(oldId, NoLock);

-        stmt->oldNode = irel->rd_node.relNode;
+        /* If it's a partitioned index, there is no storage to share. */
+        if (irel->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
+            stmt->oldNode = irel->rd_node.relNode;
         index_close(irel, NoLock);
     }
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f8ee4b0..4b7520b 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1001,6 +1001,19 @@ DefineIndex(Oid relationId,
                     ListCell   *lc;

                     /*
+                     * We can't use the same index name for the child index,
+                     * so clear idxname to let the recursive invocation choose
+                     * a new name.  Likewise, the existing target relation
+                     * field is wrong, and if indexOid or oldNode are set,
+                     * they mustn't be applied to the child either.
+                     */
+                    childStmt->idxname = NULL;
+                    childStmt->relation = NULL;
+                    childStmt->relationId = childRelid;
+                    childStmt->indexOid = InvalidOid;
+                    childStmt->oldNode = InvalidOid;
+
+                    /*
                      * Adjust any Vars (both in expressions and in the index's
                      * WHERE clause) to match the partition's column numbering
                      * in case it's different from the parent's.
@@ -1031,8 +1044,6 @@ DefineIndex(Oid relationId,
                     if (found_whole_row)
                         elog(ERROR, "cannot convert whole-row table reference");

-                    childStmt->idxname = NULL;
-                    childStmt->relationId = childRelid;
                     DefineIndex(childRelid, childStmt,
                                 InvalidOid, /* no predefined OID */
                                 indexRelationId,    /* this is our child */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 65ede33..865fc42 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10696,7 +10696,9 @@ TryReuseIndex(Oid oldId, IndexStmt *stmt)
     {
         Relation    irel = index_open(oldId, NoLock);

-        stmt->oldNode = irel->rd_node.relNode;
+        /* If it's a partitioned index, there is no storage to share. */
+        if (irel->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
+            stmt->oldNode = irel->rd_node.relNode;
         index_close(irel, NoLock);
     }
 }

pgsql-bugs by date:

Previous
From: raf@raf.org
Date:
Subject: Re: bug: evil autoConcat when each string is on new line
Next
From: Michael Paquier
Date:
Subject: Re: BUG #15631: Generated as identity field in a temporary tablewith on commit drop corrupts system catalogs