Re: BUG #17756: Invalid replica indentity set order in a dump - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #17756: Invalid replica indentity set order in a dump
Date
Msg-id 615621.1674244465@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #17756: Invalid replica indentity set order in a dump  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I wrote:
> I wonder why it is that the backend rejects this sequence.
> I see that you can do this:
> regression=# create table tbl (id integer not null primary key) partition by list (id);
> CREATE TABLE
> regression=# alter table tbl replica identity using index tbl_pkey;
> ALTER TABLE
> and it doesn't seem like the partitioned index is notably more
> valid in this state than in the one that pg_dump has created.
> So I think it might be better to fix the backend to allow this
> sequence of operations.

I looked into this, and I think we basically could just drop the
restriction in ATExecReplicaIdentity that rejects marking an invalid
index as replica identity.  If it is invalid, then the relcache won't
treat it as being the live replica identity (cf RelationGetIndexList),
so nothing will happen until it becomes valid; but there's no reason
why we can't mark it as identity in advance of that.  We support
not-yet-valid primary keys, so why not replica identity?

In checking this, I found a comment in index.c claiming

             * ALTER TABLE assumes that indisreplident cannot be set for
             * invalid indexes.

which was added by Michael's 9511fb37a.  The back story for that
indicates that Michael was worried about multiple indexes of a
relation becoming marked with indisreplident.  However, as far
as I can see the only actual issue there is that
relation_mark_replica_identity is too trusting, and that looks
like either sloppy coding or premature optimization.  I think
we could just rip out its early-exit path and fix it to positively
guarantee that no more than one index is marked indisreplident.

In short, I propose the attached.  This'd need to be back-patched,
which 9511fb37a wasn't, so maybe we should also back-patch 9511fb37a?
I don't think it's really necessary though.

            regards, tom lane

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index e6579f2979..41b16cb89b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3482,9 +3482,8 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
              * CONCURRENTLY that failed partway through.)
              *
              * Note: the CLUSTER logic assumes that indisclustered cannot be
-             * set on any invalid index, so clear that flag too.  Similarly,
-             * ALTER TABLE assumes that indisreplident cannot be set for
-             * invalid indexes.
+             * set on any invalid index, so clear that flag too.  For
+             * cleanliness, also clear indisreplident.
              */
             indexForm->indisvalid = false;
             indexForm->indisclustered = false;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7c697a285b..1293545947 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15773,7 +15773,10 @@ ATExecDropOf(Relation rel, LOCKMODE lockmode)
  * relation_mark_replica_identity: Update a table's replica identity
  *
  * Iff ri_type = REPLICA_IDENTITY_INDEX, indexOid must be the Oid of a suitable
- * index. Otherwise, it should be InvalidOid.
+ * index. Otherwise, it must be InvalidOid.
+ *
+ * Caller had better hold an exclusive lock on the relation, as the results
+ * of running two of these concurrently wouldn't be pretty.
  */
 static void
 relation_mark_replica_identity(Relation rel, char ri_type, Oid indexOid,
@@ -15785,7 +15788,6 @@ relation_mark_replica_identity(Relation rel, char ri_type, Oid indexOid,
     HeapTuple    pg_index_tuple;
     Form_pg_class pg_class_form;
     Form_pg_index pg_index_form;
-
     ListCell   *index;

     /*
@@ -15807,29 +15809,7 @@ relation_mark_replica_identity(Relation rel, char ri_type, Oid indexOid,
     heap_freetuple(pg_class_tuple);

     /*
-     * Check whether the correct index is marked indisreplident; if so, we're
-     * done.
-     */
-    if (OidIsValid(indexOid))
-    {
-        Assert(ri_type == REPLICA_IDENTITY_INDEX);
-
-        pg_index_tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexOid));
-        if (!HeapTupleIsValid(pg_index_tuple))
-            elog(ERROR, "cache lookup failed for index %u", indexOid);
-        pg_index_form = (Form_pg_index) GETSTRUCT(pg_index_tuple);
-
-        if (pg_index_form->indisreplident)
-        {
-            ReleaseSysCache(pg_index_tuple);
-            return;
-        }
-        ReleaseSysCache(pg_index_tuple);
-    }
-
-    /*
-     * Clear the indisreplident flag from any index that had it previously,
-     * and set it for any index that should have it now.
+     * Update the per-index indisreplident flags correctly.
      */
     pg_index = table_open(IndexRelationId, RowExclusiveLock);
     foreach(index, RelationGetIndexList(rel))
@@ -15843,19 +15823,23 @@ relation_mark_replica_identity(Relation rel, char ri_type, Oid indexOid,
             elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
         pg_index_form = (Form_pg_index) GETSTRUCT(pg_index_tuple);

-        /*
-         * Unset the bit if set.  We know it's wrong because we checked this
-         * earlier.
-         */
-        if (pg_index_form->indisreplident)
+        if (thisIndexOid == indexOid)
         {
-            dirty = true;
-            pg_index_form->indisreplident = false;
+            /* Set the bit if not already set. */
+            if (!pg_index_form->indisreplident)
+            {
+                dirty = true;
+                pg_index_form->indisreplident = true;
+            }
         }
-        else if (thisIndexOid == indexOid)
+        else
         {
-            dirty = true;
-            pg_index_form->indisreplident = true;
+            /* Unset the bit if set. */
+            if (pg_index_form->indisreplident)
+            {
+                dirty = true;
+                pg_index_form->indisreplident = false;
+            }
         }

         if (dirty)
@@ -15867,7 +15851,9 @@ relation_mark_replica_identity(Relation rel, char ri_type, Oid indexOid,
             /*
              * Invalidate the relcache for the table, so that after we commit
              * all sessions will refresh the table's replica identity index
-             * before attempting any UPDATE or DELETE on the table.
+             * before attempting any UPDATE or DELETE on the table.  (If we
+             * changed the table's pg_class row above, then a relcache inval
+             * is already queued due to that; but we might not have.)
              */
             CacheInvalidateRelcache(rel);
         }
@@ -15952,12 +15938,6 @@ ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKMODE lockmode
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("cannot use partial index \"%s\" as replica identity",
                         RelationGetRelationName(indexRel))));
-    /* And neither are invalid indexes. */
-    if (!indexRel->rd_index->indisvalid)
-        ereport(ERROR,
-                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                 errmsg("cannot use invalid index \"%s\" as replica identity",
-                        RelationGetRelationName(indexRel))));

     /* Check index for nullable columns. */
     for (key = 0; key < IndexRelationGetNumberOfKeyAttributes(indexRel); key++)

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #17756: Invalid replica indentity set order in a dump
Next
From: PG Bug reporting form
Date:
Subject: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes