Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Date
Msg-id 20170919.150430.252858141.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
List pgsql-hackers
At Fri, 15 Sep 2017 15:36:26 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in
<d88a9a64-e307-59b5-b4c3-8d7fc11cb59d@lab.ntt.co.jp>
> Hi.
> 
> On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote:
> > << the following is another topic >>
> > 
> >>>> BTW, in the partitioned table case, the parent is always locked first
> >>>> using an AccessExclusiveLock.  There are other considerations in that case
> >>>> such as needing to recreate the partition descriptor upon termination of
> >>>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> >>>
> >>> Apart from the degree of concurrency, if we keep parent->children
> >>> order of locking, such recreation does not seem to be
> >>> needed. Maybe I'm missing something.
> >>
> >> Sorry to have introduced that topic in this thread, but I will try to
> >> explain anyway why things are the way they are currently:
> >>
> >> Once a table is no longer a partition of the parent (detached or dropped),
> >> we must make sure that the next commands in the transaction don't see it
> >> as one.  That information is currently present in the relcache
> >> (rd_partdesc), which is used by a few callers, most notably the
> >> tuple-routing code.  Next commands must recreate the entry so that the
> >> correct thing happens based on the updated information.  More precisely,
> >> we must invalidate the current entry.  RelationClearRelation() will either
> >> delete the entry or rebuild it.  If it's being referenced somewhere, it
> >> will be rebuilt.  The place holding the reference may also be looking at
> >> the content of rd_partdesc, which we don't require them to make a copy of,
> >> so we must preserve its content while other fields of RelationData are
> >> being read anew from the catalog.  We don't have to preserve it if there
> >> has been any change (partition added/dropped), but to make such a change
> >> one would need to take a strong enough lock on the relation (parent).  We
> >> assume here that anyone who wants to reference rd_partdesc takes at least
> >> AccessShareLock lock on the relation, and anyone who wants to change its
> >> content must take a lock that will conflict with it, so
> >> AccessExclusiveLock.  Note that in all of this, we are only talking about
> >> one relation, that is the parent, so parent -> child ordering of taking
> >> locks may be irrelevant.
> > 
> > I think I understand this, anyway DropInherit and DropPartition
> > is different-but-at-the-same-level operations so surely needs
> > amendment for drop/detach cases. Is there already a solution? Or
> > reproducing steps?
> 
> Sorry, I think I forgot to reply to this.  Since you seem to have chosen
> the other solution (checking that child is still a child), maybe this
> reply is a bit too late, but anyway.

I choosed it at that time for the reason mentioned upthread, but
haven't decided which is better.

> DropInherit or NO INHERIT is seen primarily as changing a child table's
> (which is the target table of the command) property that it is no longer a
> child of the parent, so we lock the child table to block concurrent
> operations from considering it a child of parent anymore.  The fact that
> parent is locked after the child and with ShareUpdateExclusiveLock instead
> of AccessExclusiveLock, we observe this race condition when SELECTing from
> the parent.
> 
> DropPartition or DETACH PARTITION is seen primarily as changing the parent
> table's (which is the target table of the command) property that one of
> the partitions is removed, so we lock the parent.   Any concurrent
> operations that rely on the parent's relcache to get the partition list
> will wait for the session that is dropping the partition to finish, so
> that they get the fresh information from the relcache (or more importantly
> do not end up with information obtained from the relcache going invalid
> under them without notice).  Note that the lock on the partition/child is
> also present and it plays more or less the the same role as it does in the
> DropInherit case, but due to different order of locking, reported race
> condition does not occur between SELECT on partitioned table and
> DROP/DETACH PARTITION.

Thank you for the explanation. I understand that the difference
comes from which of parent and children has the information about
inheritance/partitioning. DROP child and ALTER child NO INHERITS
are (I think) the only two operations that intiated from children
side. The parent-locking patch results in different solutions for
similar problems.

However, parent locking on DROPped child requires a new index
InheritsRelidIndex to avoid full scan on pg_inherits, but the NO
INHERITS case doesn't. This might be a good reason for the
difference.

I noticed that DROP TABLE partition acquires lock on the parent
in a callback for RangeVarGetRelid(Extended). The same way seems
reasonable for the NO INHERIT case. As the result the patch
becomes far small and less invasive than the previous
one. (dropinh_lock_parent_v2.patch)

As a negative PoC, attacched dropchild_lock_parent_PoC.patch only
to show how the same method on DROP TABLE case looks.


> By the way, I will take a look at your patch when I come back from the
> vacation.  Meanwhile, I noticed that it needs another rebase after
> 0a480502b092 [1].

Great. Thanks.

> Thanks,
> Amit
> 
> [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 563bcda..b7c87b9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13175,7 +13175,28 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,        reltype =
((AlterObjectSchemaStmt*) stmt)->objectType;    else if (IsA(stmt, AlterTableStmt))
 
-        reltype = ((AlterTableStmt *) stmt)->relkind;
+    {
+        AlterTableStmt *alterstmt = (AlterTableStmt *)stmt;
+        ListCell *lc;
+
+        reltype = alterstmt->relkind;
+
+        foreach (lc, alterstmt->cmds)
+        {
+            AlterTableCmd *cmd = lfirst_node(AlterTableCmd, lc);
+            Assert(IsA(cmd, AlterTableCmd));
+
+            /*
+             * Though NO INHERIT doesn't modify the parent, concurrent
+             * expansion of inheritances will see a false child. We must
+             * acquire lock on the parent when dropping inheritance, but no
+             * need to ascend further.
+             */
+            if (cmd->subtype == AT_DropInherit)
+                RangeVarGetRelid((RangeVar *)cmd->def,
+                                 AccessExclusiveLock, false);
+        }
+    }    else    {        reltype = OBJECT_TABLE; /* placate compiler */
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 245a374..b83e248 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -126,19 +126,6 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)        {            /* Get the lock
tosynchronize against concurrent drop */            LockRelationOid(inhrelid, lockmode);
 
-
-            /*
-             * Now that we have the lock, double-check to see if the relation
-             * really exists or not.  If not, assume it was dropped while we
-             * waited to acquire lock, and ignore it.
-             */
-            if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(inhrelid)))
-            {
-                /* Release useless lock */
-                UnlockRelationOid(inhrelid, lockmode);
-                /* And ignore this relation */
-                continue;
-            }        }        list = lappend_oid(list, inhrelid);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b7c87b9..62c8860 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1120,10 +1120,10 @@ RemoveRelations(DropStmt *drop)}/*
- * Before acquiring a table lock, check whether we have sufficient rights.
- * In the case of DROP INDEX, also try to lock the table before the index.
- * Also, if the table to be dropped is a partition, we try to lock the parent
- * first.
+ * Before acquiring a table lock, check whether we have sufficient rights.  In
+ * the case of DROP INDEX, also try to lock the table before the index.  Also,
+ * if the table to be dropped is a partition or inheritance children, we try
+ * to lock the parent first. */static voidRangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid
oldRelOid,
@@ -1229,6 +1229,26 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,        if
(OidIsValid(state->partParentOid))           LockRelationOid(state->partParentOid, AccessExclusiveLock);    }
 
+
+    /* the same for all inheritance parents */
+    if (relOid != oldRelOid)
+    {
+        Relation    inhrel;
+        SysScanDesc    scan;
+        HeapTuple    inhtup;
+
+        inhrel = heap_open(InheritsRelationId, AccessShareLock);
+        scan = systable_beginscan(inhrel, InvalidOid, false, NULL, 0, NULL);
+        while (HeapTupleIsValid(inhtup = systable_getnext(scan)))
+        {
+            Form_pg_inherits inh = (Form_pg_inherits) GETSTRUCT(inhtup);
+
+            if (inh->inhrelid == relOid && OidIsValid(inh->inhparent))
+                LockRelationOid(inh->inhparent, AccessExclusiveLock);
+        }
+        systable_endscan(scan);
+        heap_close(inhrel, AccessShareLock);
+    }}/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Creating backup history files for backups taken from standbys
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] [PATCH] Improve geometric types