Thread: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

[HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Kyotaro HORIGUCHI
Date:
Hello.

I had a case of unexpected error caused by ALTER TABLE NO
INHERIT.

=# CREATE TABLE p (a int);
=# CREATE TABLE c1 () INHERITS (p);

session A=# BEGIN;
session A=# ALTER TABLE c1 NO INHERIT p;

session B=# EXPLAIN ANALYZE SELECT * FROM p;
(blocked)

session A=# COMMIT;

session B: ERROR:  could not find inherited attribute "a" of relation "c1"

This happens at least back to 9.1 to master and doesn't seem to
be a designed behavior.

The cause is that NO INHERIT doesn't take an exlusive lock on the
parent. This allows expand_inherited_rtentry to add the child
relation into appendrel after removal from the inheritance but
still exists.

I see two ways to fix this.

The first patch adds a recheck of inheritance relationship if the
corresponding attribute is missing in the child in
make_inh_translation_list(). The recheck is a bit complex but it
is not performed unless the sequence above is happen. It checks
duplication of relid (or cycles in inheritance) following
find_all_inheritors (but doing a bit different) but I'm not sure
it is really useful.


The second patch lets ALTER TABLE NO INHERIT to acquire locks on
the parent first.

Since the latter has a larger impact on the current behavior and
we already treat "DROP TABLE child" case in the similar way, I
suppose that the first approach would be preferable.


Any comments or thoughts?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index e5fb52c..1670d11 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -42,6 +42,8 @@ typedef struct SeenRelsEntry    ListCell   *numparents_cell;    /* corresponding list cell */}
SeenRelsEntry;
+static bool is_descendent_of_internal(Oid parentId, Oid childId,
+                                      HTAB *seen_rels);/* * find_inheritance_children *
@@ -376,3 +378,71 @@ typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)    return result;}
+
+/*
+ * Check if the child is really a descendent of the parent
+ */
+bool
+is_descendent_of(Oid parentId, Oid childId)
+{
+    HTAB       *seen_rels;
+    HASHCTL        ctl;
+    bool        ischild = false;
+
+    memset(&ctl, 0, sizeof(ctl));
+    ctl.keysize = sizeof(Oid);
+    ctl.entrysize = sizeof(Oid);
+    ctl.hcxt = CurrentMemoryContext;
+
+    seen_rels = hash_create("is_descendent_of temporary table",
+                            32, /* start small and extend */
+                            &ctl,
+                            HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+    ischild = is_descendent_of_internal(parentId, childId, seen_rels);
+
+    hash_destroy(seen_rels);
+
+    return ischild;
+}
+
+static bool
+is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels)
+{
+    Relation    inhrel;
+    SysScanDesc scan;
+    ScanKeyData key[1];
+    bool        ischild = false;
+    HeapTuple    inheritsTuple;
+
+    inhrel = heap_open(InheritsRelationId, AccessShareLock);
+    ScanKeyInit(&key[0], Anum_pg_inherits_inhparent,
+                BTEqualStrategyNumber, F_OIDEQ,    ObjectIdGetDatum(parentId));
+    scan = systable_beginscan(inhrel, InheritsParentIndexId, true,
+                              NULL, 1, key);
+
+    while ((inheritsTuple = systable_getnext(scan)) != NULL)
+    {
+        bool found;
+        Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
+
+        hash_search(seen_rels, &inhrelid, HASH_ENTER, &found);
+
+        /*
+         * Recursively check into children. Although there can't theoretically
+         * be any cycles in the inheritance graph, check the cycles following
+         * find_all_inheritors.
+         */
+        if (inhrelid == childId ||
+            (!found && is_descendent_of_internal(inhrelid, childId, seen_rels)))
+        {
+            ischild = true;
+            break;
+        }
+    }
+
+    systable_endscan(scan);
+    heap_close(inhrel, AccessShareLock);
+
+    return ischild;
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index cf46b74..f1c582a 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -99,10 +99,9 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,static List
*generate_setop_grouplist(SetOperationStmt*op, List *targetlist);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);
 
-static void make_inh_translation_list(Relation oldrelation,
+static List *make_inh_translation_list(Relation oldrelation,                          Relation newrelation,
-                          Index newvarno,
-                          List **translated_vars);
+                          Index newvarno);static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
          List *translated_vars);static Node *adjust_appendrel_attrs_mutator(Node *node,
 
@@ -1502,14 +1501,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)         */        if
(childrte->relkind!= RELKIND_PARTITIONED_TABLE)        {
 
+            List *translated_vars =
+                make_inh_translation_list(oldrelation, newrelation,
+                                          childRTindex);
+
+            if (!translated_vars)
+            {
+                /*
+                 * This childrel is no longer a child of the parent.
+                 * Close the child relation releasing locks.
+                 */
+                if (childOID != parentOID)
+                    heap_close(newrelation, lockmode);
+                continue;
+            }
+            need_append = true;            appinfo = makeNode(AppendRelInfo);            appinfo->parent_relid = rti;
         appinfo->child_relid = childRTindex;            appinfo->parent_reltype = oldrelation->rd_rel->reltype;
   appinfo->child_reltype = newrelation->rd_rel->reltype;
 
-            make_inh_translation_list(oldrelation, newrelation, childRTindex,
-                                      &appinfo->translated_vars);
+            appinfo->translated_vars = translated_vars;            appinfo->parent_reloid = parentOID;
appinfos= lappend(appinfos, appinfo);
 
@@ -1614,14 +1627,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)/* *
make_inh_translation_list*      Build the list of translations from parent Vars to child Vars for
 
- *      an inheritance child.
+ *      an inheritance child. Returns NULL if the two relations are no longer in
+ *      a inheritance relationship. * * For paranoia's sake, we match type/collation as well as attribute name. */
-static void
+static List *make_inh_translation_list(Relation oldrelation, Relation newrelation,
-                          Index newvarno,
-                          List **translated_vars)
+                          Index newvarno){    List       *vars = NIL;    TupleDesc    old_tupdesc =
RelationGetDescr(oldrelation);
@@ -1691,8 +1704,18 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,                    break;
         }            if (new_attno >= newnatts)
 
+            {
+                /*
+                 * It's possible that newrelation is no longer a child of the
+                 * newrelation. We just ingore it for the case.
+                 */
+                if (!is_descendent_of(oldrelation->rd_id, newrelation->rd_id))
+                    return NULL;
+
+                /* If still a child, something's wrong. */                elog(ERROR, "could not find inherited
attribute\"%s\" of relation \"%s\"",                     attname, RelationGetRelationName(newrelation));
 
+            }        }        /* Found it, check type and collation match */
@@ -1711,7 +1734,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
       0));    }
 
-    *translated_vars = vars;
+    return vars;}/*
diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h
index abfa476..3affe40 100644
--- a/src/include/catalog/pg_inherits_fn.h
+++ b/src/include/catalog/pg_inherits_fn.h
@@ -22,5 +22,6 @@ extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,                    List
**parents);externbool has_subclass(Oid relationId);extern bool typeInheritsFrom(Oid subclassTypeId, Oid
superclassTypeId);
+extern bool is_descendent_of(Oid parentId, Oid childId);#endif                            /* PG_INHERITS_FN_H */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7d9c769..f69631f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3397,6 +3397,30 @@ AlterTableGetLockLevel(List *cmds)}/*
+ * Some AT commands require to take a lock on the parent prior to the target.
+ */
+void
+AlterTableLockInhParent(List *cmds, LOCKMODE lockmode)
+{
+    ListCell   *lcmd;
+
+    foreach(lcmd, cmds)
+    {
+        AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd);
+
+        switch (cmd->subtype)
+        {        
+        case AT_DropInherit:
+            RangeVarGetRelid((RangeVar *)cmd->def, lockmode, false);
+            break;
+
+        default:
+            break;
+        }
+    }
+}
+
+/* * ATController provides top level control over the phases. * * parsetree is passed in to allow it to be passed to
eventtriggers
 
@@ -11446,12 +11470,8 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode)
(errcode(ERRCODE_WRONG_OBJECT_TYPE),                errmsg("cannot change inheritance of a partition")));
 
-    /*
-     * AccessShareLock on the parent is probably enough, seeing that DROP
-     * TABLE doesn't lock parent tables at all.  We need some lock since we'll
-     * be inspecting the parent's schema.
-     */
-    parent_rel = heap_openrv(parent, AccessShareLock);
+    /* Requreid lock is already held */
+    parent_rel = heap_openrv(parent, NoLock);    /*     * We don't bother to check ownership of the parent table ---
ownershipof
 
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index ddacac8..ce3406b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1102,6 +1102,9 @@ ProcessUtilitySlow(ParseState *pstate,                     * permissions.                     */
                 lockmode = AlterTableGetLockLevel(atstmt->cmds);
 
+
+                    /* Lock the parent first if required */
+                    AlterTableLockInhParent(atstmt->cmds, lockmode);                    relid =
AlterTableLookupRelation(atstmt,lockmode);                    if (OidIsValid(relid))
 
diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h
index abd31b6..c643326 100644
--- a/src/include/commands/tablecmds.h
+++ b/src/include/commands/tablecmds.h
@@ -27,6 +27,7 @@ extern ObjectAddress DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,extern void
RemoveRelations(DropStmt*drop);
 
+void AlterTableLockInhParent(List *cmds, LOCKMODE lockmode);extern Oid    AlterTableLookupRelation(AlterTableStmt
*stmt,LOCKMODE lockmode);extern void AlterTable(Oid relid, LOCKMODE lockmode, AlterTableStmt *stmt); 

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Amit Langote
Date:
Hello,

On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote:
> Hello.
> 
> I had a case of unexpected error caused by ALTER TABLE NO
> INHERIT.
> 
> =# CREATE TABLE p (a int);
> =# CREATE TABLE c1 () INHERITS (p);
> 
> session A=# BEGIN;
> session A=# ALTER TABLE c1 NO INHERIT p;
> 
> session B=# EXPLAIN ANALYZE SELECT * FROM p;
> (blocked)
> 
> session A=# COMMIT;
> 
> session B: ERROR:  could not find inherited attribute "a" of relation "c1"
> 
> This happens at least back to 9.1 to master and doesn't seem to
> be a designed behavior.

I recall I had proposed a fix for the same thing some time ago [1].

> The cause is that NO INHERIT doesn't take an exlusive lock on the
> parent. This allows expand_inherited_rtentry to add the child
> relation into appendrel after removal from the inheritance but
> still exists.

Right.

> I see two ways to fix this.
> 
> The first patch adds a recheck of inheritance relationship if the
> corresponding attribute is missing in the child in
> make_inh_translation_list(). The recheck is a bit complex but it
> is not performed unless the sequence above is happen. It checks
> duplication of relid (or cycles in inheritance) following
> find_all_inheritors (but doing a bit different) but I'm not sure
> it is really useful.

I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
I guess your hash table based solution will do the job as far as
performance of this check is concerned, although I haven't checked the
code closely.

> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> the parent first.
> 
> Since the latter has a larger impact on the current behavior and
> we already treat "DROP TABLE child" case in the similar way, I
> suppose that the first approach would be preferable.

That makes sense.

BTW, in the partitioned table case, the parent is always locked first
using an AccessExclusiveLock.  There are other considerations in that case
such as needing to recreate the partition descriptor upon termination of
inheritance (both the DETACH PARTITION and also DROP TABLE child cases).

Thanks,
Amit

[1] https://www.postgresql.org/message-id/565EB1F2.7000201%40lab.ntt.co.jp




Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Kyotaro HORIGUCHI
Date:
Hello,

At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in
<7f5fe522-f328-3c40-565f-5e1ce37455d1@lab.ntt.co.jp>
> Hello,
> 
> On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote:
> > Hello.
> > 
> > I had a case of unexpected error caused by ALTER TABLE NO
> > INHERIT.
> > 
> > =# CREATE TABLE p (a int);
> > =# CREATE TABLE c1 () INHERITS (p);
> > 
> > session A=# BEGIN;
> > session A=# ALTER TABLE c1 NO INHERIT p;
> > 
> > session B=# EXPLAIN ANALYZE SELECT * FROM p;
> > (blocked)
> > 
> > session A=# COMMIT;
> > 
> > session B: ERROR:  could not find inherited attribute "a" of relation "c1"
> > 
> > This happens at least back to 9.1 to master and doesn't seem to
> > be a designed behavior.
> 
> I recall I had proposed a fix for the same thing some time ago [1].

Wow. About 1.5 years ago and stuck? I have a concrete case where
this harms  so I'd like to fix that this time. How can we move on?

> > The cause is that NO INHERIT doesn't take an exlusive lock on the
> > parent. This allows expand_inherited_rtentry to add the child
> > relation into appendrel after removal from the inheritance but
> > still exists.
> 
> Right.
> 
> > I see two ways to fix this.
> > 
> > The first patch adds a recheck of inheritance relationship if the
> > corresponding attribute is missing in the child in
> > make_inh_translation_list(). The recheck is a bit complex but it
> > is not performed unless the sequence above is happen. It checks
> > duplication of relid (or cycles in inheritance) following
> > find_all_inheritors (but doing a bit different) but I'm not sure
> > it is really useful.
> 
> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
> I guess your hash table based solution will do the job as far as
> performance of this check is concerned, although I haven't checked the
> code closely.

The hash table is not crucial in the patch. Substantially it just
recurs down looking up pg_inherits for the child. I considered
the new index but abandoned because I thought that such case
won't occur so frequently.

> > The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> > the parent first.
> > 
> > Since the latter has a larger impact on the current behavior and
> > we already treat "DROP TABLE child" case in the similar way, I
> > suppose that the first approach would be preferable.
> 
> That makes sense.
> 
> BTW, in the partitioned table case, the parent is always locked first
> using an AccessExclusiveLock.  There are other considerations in that case
> such as needing to recreate the partition descriptor upon termination of
> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).

Apart from the degree of concurrency, if we keep parent->children
order of locking, such recreation does not seem to be
needed. Maybe I'm missing something.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Amit Langote
Date:
On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
> At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
>>
>> I recall I had proposed a fix for the same thing some time ago [1].
> 
> Wow. About 1.5 years ago and stuck? I have a concrete case where
> this harms  so I'd like to fix that this time. How can we move on?

Agreed that this should be fixed.

Your proposed approach #1 to recheck the inheritance after obtaining the
lock on the child table seems to be a good way forward.

Approach #2 of reordering locking is a simpler solution, but does not
completely prevent the problem, because DROP TABLE child can still cause
it to occur, as you mentioned.

>>> The cause is that NO INHERIT doesn't take an exlusive lock on the
>>> parent. This allows expand_inherited_rtentry to add the child
>>> relation into appendrel after removal from the inheritance but
>>> still exists.
>>
>> Right.
>>
>>> I see two ways to fix this.
>>>
>>> The first patch adds a recheck of inheritance relationship if the
>>> corresponding attribute is missing in the child in
>>> make_inh_translation_list(). The recheck is a bit complex but it
>>> is not performed unless the sequence above is happen. It checks
>>> duplication of relid (or cycles in inheritance) following
>>> find_all_inheritors (but doing a bit different) but I'm not sure
>>> it is really useful.
>>
>> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
>> I guess your hash table based solution will do the job as far as
>> performance of this check is concerned, although I haven't checked the
>> code closely.
> 
> The hash table is not crucial in the patch. Substantially it just
> recurs down looking up pg_inherits for the child. I considered
> the new index but abandoned because I thought that such case
> won't occur so frequently.

Agreed.  BTW, the hash table in patch #1 does not seem to be really
helpful.  In the while loop in is_descendant_of_internal(), does
hash_search() ever return found = true?  AFAICS, it does not.

>>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
>>> the parent first.
>>>
>>> Since the latter has a larger impact on the current behavior and
>>> we already treat "DROP TABLE child" case in the similar way, I
>>> suppose that the first approach would be preferable.
>>
>> That makes sense.
>>
>> BTW, in the partitioned table case, the parent is always locked first
>> using an AccessExclusiveLock.  There are other considerations in that case
>> such as needing to recreate the partition descriptor upon termination of
>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> 
> Apart from the degree of concurrency, if we keep parent->children
> order of locking, such recreation does not seem to be
> needed. Maybe I'm missing something.

Sorry to have introduced that topic in this thread, but I will try to
explain anyway why things are the way they are currently:

Once a table is no longer a partition of the parent (detached or dropped),
we must make sure that the next commands in the transaction don't see it
as one.  That information is currently present in the relcache
(rd_partdesc), which is used by a few callers, most notably the
tuple-routing code.  Next commands must recreate the entry so that the
correct thing happens based on the updated information.  More precisely,
we must invalidate the current entry.  RelationClearRelation() will either
delete the entry or rebuild it.  If it's being referenced somewhere, it
will be rebuilt.  The place holding the reference may also be looking at
the content of rd_partdesc, which we don't require them to make a copy of,
so we must preserve its content while other fields of RelationData are
being read anew from the catalog.  We don't have to preserve it if there
has been any change (partition added/dropped), but to make such a change
one would need to take a strong enough lock on the relation (parent).  We
assume here that anyone who wants to reference rd_partdesc takes at least
AccessShareLock lock on the relation, and anyone who wants to change its
content must take a lock that will conflict with it, so
AccessExclusiveLock.  Note that in all of this, we are only talking about
one relation, that is the parent, so parent -> child ordering of taking
locks may be irrelevant.

Thanks,
Amit




Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Kyotaro HORIGUCHI
Date:
Hello,

I'll add this to CF2017-09.

At Tue, 27 Jun 2017 16:27:18 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in
<75fe42df-b1d8-89ff-596d-d9da0749e66d@lab.ntt.co.jp>
> On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
> > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
> >>
> >> I recall I had proposed a fix for the same thing some time ago [1].
> > 
> > Wow. About 1.5 years ago and stuck? I have a concrete case where
> > this harms  so I'd like to fix that this time. How can we move on?
> 
> Agreed that this should be fixed.
> 
> Your proposed approach #1 to recheck the inheritance after obtaining the
> lock on the child table seems to be a good way forward.
> 
> Approach #2 of reordering locking is a simpler solution, but does not
> completely prevent the problem, because DROP TABLE child can still cause
> it to occur, as you mentioned.
> 
> >>> The cause is that NO INHERIT doesn't take an exlusive lock on the
> >>> parent. This allows expand_inherited_rtentry to add the child
> >>> relation into appendrel after removal from the inheritance but
> >>> still exists.
> >>
> >> Right.
> >>
> >>> I see two ways to fix this.
> >>>
> >>> The first patch adds a recheck of inheritance relationship if the
> >>> corresponding attribute is missing in the child in
> >>> make_inh_translation_list(). The recheck is a bit complex but it
> >>> is not performed unless the sequence above is happen. It checks
> >>> duplication of relid (or cycles in inheritance) following
> >>> find_all_inheritors (but doing a bit different) but I'm not sure
> >>> it is really useful.
> >>
> >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
> >> I guess your hash table based solution will do the job as far as
> >> performance of this check is concerned, although I haven't checked the
> >> code closely.
> > 
> > The hash table is not crucial in the patch. Substantially it just
> > recurs down looking up pg_inherits for the child. I considered
> > the new index but abandoned because I thought that such case
> > won't occur so frequently.
> 
> Agreed.  BTW, the hash table in patch #1 does not seem to be really
> helpful.  In the while loop in is_descendant_of_internal(), does
> hash_search() ever return found = true?  AFAICS, it does not.
> 
> >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> >>> the parent first.
> >>>
> >>> Since the latter has a larger impact on the current behavior and
> >>> we already treat "DROP TABLE child" case in the similar way, I
> >>> suppose that the first approach would be preferable.
> >>
> >> That makes sense.

So, I attached only the first patch, rebased on the current
master (It actually failed to apply on it.) and fixed a typo in a
comment.

This still runs a closed-loop test using temporary hash but it
seems a bit paranoic. (this is the same check with what
find_all_inheritors is doing)


<< the following is another topic >>

> >> BTW, in the partitioned table case, the parent is always locked first
> >> using an AccessExclusiveLock.  There are other considerations in that case
> >> such as needing to recreate the partition descriptor upon termination of
> >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> > 
> > Apart from the degree of concurrency, if we keep parent->children
> > order of locking, such recreation does not seem to be
> > needed. Maybe I'm missing something.
> 
> Sorry to have introduced that topic in this thread, but I will try to
> explain anyway why things are the way they are currently:
> 
> Once a table is no longer a partition of the parent (detached or dropped),
> we must make sure that the next commands in the transaction don't see it
> as one.  That information is currently present in the relcache
> (rd_partdesc), which is used by a few callers, most notably the
> tuple-routing code.  Next commands must recreate the entry so that the
> correct thing happens based on the updated information.  More precisely,
> we must invalidate the current entry.  RelationClearRelation() will either
> delete the entry or rebuild it.  If it's being referenced somewhere, it
> will be rebuilt.  The place holding the reference may also be looking at
> the content of rd_partdesc, which we don't require them to make a copy of,
> so we must preserve its content while other fields of RelationData are
> being read anew from the catalog.  We don't have to preserve it if there
> has been any change (partition added/dropped), but to make such a change
> one would need to take a strong enough lock on the relation (parent).  We
> assume here that anyone who wants to reference rd_partdesc takes at least
> AccessShareLock lock on the relation, and anyone who wants to change its
> content must take a lock that will conflict with it, so
> AccessExclusiveLock.  Note that in all of this, we are only talking about
> one relation, that is the parent, so parent -> child ordering of taking
> locks may be irrelevant.

I think I understand this, anyway DropInherit and DropPartition
is different-but-at-the-same-level operations so surely needs
amendment for drop/detach cases. Is there already a solution? Or
reproducing steps?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/catalog/pg_inherits.c
--- b/src/backend/catalog/pg_inherits.c
***************
*** 42,47 **** typedef struct SeenRelsEntry
--- 42,49 ----     ListCell   *numparents_cell;    /* corresponding list cell */ } SeenRelsEntry; 
+ static bool is_descendent_of_internal(Oid parentId, Oid childId,
+                                       HTAB *seen_rels); /*  * find_inheritance_children  *
***************
*** 400,402 **** typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
--- 402,472 ----      return result; }
+ 
+ /*
+  * Check if the child is really a descendent of the parent
+  */
+ bool
+ is_descendent_of(Oid parentId, Oid childId)
+ {
+     HTAB       *seen_rels;
+     HASHCTL        ctl;
+     bool        ischild = false;
+ 
+     memset(&ctl, 0, sizeof(ctl));
+     ctl.keysize = sizeof(Oid);
+     ctl.entrysize = sizeof(Oid);
+     ctl.hcxt = CurrentMemoryContext;
+ 
+     seen_rels = hash_create("is_descendent_of temporary table",
+                             32, /* start small and extend */
+                             &ctl,
+                             HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ 
+     ischild = is_descendent_of_internal(parentId, childId, seen_rels);
+ 
+     hash_destroy(seen_rels);
+ 
+     return ischild;
+ }
+ 
+ static bool
+ is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels)
+ {
+     Relation    inhrel;
+     SysScanDesc scan;
+     ScanKeyData key[1];
+     bool        ischild = false;
+     HeapTuple    inheritsTuple;
+ 
+     inhrel = heap_open(InheritsRelationId, AccessShareLock);
+     ScanKeyInit(&key[0], Anum_pg_inherits_inhparent,
+                 BTEqualStrategyNumber, F_OIDEQ,    ObjectIdGetDatum(parentId));
+     scan = systable_beginscan(inhrel, InheritsParentIndexId, true,
+                               NULL, 1, key);
+ 
+     while ((inheritsTuple = systable_getnext(scan)) != NULL)
+     {
+         bool found;
+         Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
+ 
+         hash_search(seen_rels, &inhrelid, HASH_ENTER, &found);
+ 
+         /*
+          * Recursively check into children. Although there can't theoretically
+          * be any cycles in the inheritance graph, check the cycles following
+          * find_all_inheritors.
+          */
+         if (inhrelid == childId ||
+             (!found && is_descendent_of_internal(inhrelid, childId, seen_rels)))
+         {
+             ischild = true;
+             break;
+         }
+     }
+ 
+     systable_endscan(scan);
+     heap_close(inhrel, AccessShareLock);
+ 
+     return ischild;
+ }
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
***************
*** 100,109 **** static List *generate_append_tlist(List *colTypes, List *colCollations, static List
*generate_setop_grouplist(SetOperationStmt*op, List *targetlist); static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                          Index rti);
 
! static void make_inh_translation_list(Relation oldrelation,                           Relation newrelation,
!                           Index newvarno,
!                           List **translated_vars); static Bitmapset *translate_col_privs(const Bitmapset
*parent_privs,                    List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node,
 
--- 100,108 ---- static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); static void
expand_inherited_rtentry(PlannerInfo*root, RangeTblEntry *rte,                          Index rti);
 
! static List *make_inh_translation_list(Relation oldrelation,                           Relation newrelation,
!                           Index newvarno); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
             List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node,
 
***************
*** 1508,1513 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
--- 1507,1527 ----          */         if (childrte->relkind != RELKIND_PARTITIONED_TABLE)         {
+             List *translated_vars =
+                 make_inh_translation_list(oldrelation, newrelation,
+                                           childRTindex);
+ 
+             if (!translated_vars)
+             {
+                 /*
+                  * This childrel is no longer a child of the parent.
+                  * Close the child relation releasing locks.
+                  */
+                 if (childOID != parentOID)
+                     heap_close(newrelation, lockmode);
+                 continue;
+             }
+              /* Remember if we saw a real child. */             if (childOID != parentOID)                 has_child
=true;
 
***************
*** 1517,1524 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
appinfo->child_relid= childRTindex;             appinfo->parent_reltype = oldrelation->rd_rel->reltype;
appinfo->child_reltype= newrelation->rd_rel->reltype;
 
!             make_inh_translation_list(oldrelation, newrelation, childRTindex,
!                                       &appinfo->translated_vars);             appinfo->parent_reloid = parentOID;
       appinfos = lappend(appinfos, appinfo); 
 
--- 1531,1537 ----             appinfo->child_relid = childRTindex;             appinfo->parent_reltype =
oldrelation->rd_rel->reltype;            appinfo->child_reltype = newrelation->rd_rel->reltype;
 
!             appinfo->translated_vars = translated_vars;             appinfo->parent_reloid = parentOID;
appinfos= lappend(appinfos, appinfo); 
 
***************
*** 1623,1636 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) /*  *
make_inh_translation_list *      Build the list of translations from parent Vars to child Vars for
 
!  *      an inheritance child.  *  * For paranoia's sake, we match type/collation as well as attribute name.  */
! static void make_inh_translation_list(Relation oldrelation, Relation newrelation,
!                           Index newvarno,
!                           List **translated_vars) {     List       *vars = NIL;     TupleDesc    old_tupdesc =
RelationGetDescr(oldrelation);
--- 1636,1649 ---- /*  * make_inh_translation_list  *      Build the list of translations from parent Vars to child
Varsfor
 
!  *      an inheritance child. Returns NULL if the two relations are no longer in
!  *      a inheritance relationship.  *  * For paranoia's sake, we match type/collation as well as attribute name.
*/
! static List * make_inh_translation_list(Relation oldrelation, Relation newrelation,
!                           Index newvarno) {     List       *vars = NIL;     TupleDesc    old_tupdesc =
RelationGetDescr(oldrelation);
***************
*** 1700,1707 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
--- 1713,1730 ----                     break;             }             if (new_attno >= newnatts)
+             {
+                 /*
+                  * It's possible that newrelation is no longer a child of the
+                  * oldrelation. We just ingore it for the case.
+                  */
+                 if (!is_descendent_of(oldrelation->rd_id, newrelation->rd_id))
+                     return NULL;
+ 
+                 /* If still a child, something's wrong. */                 elog(ERROR, "could not find inherited
attribute\"%s\" of relation \"%s\"",                      attname, RelationGetRelationName(newrelation));
 
+             }         }          /* Found it, check type and collation match */
***************
*** 1720,1726 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
     0));     } 
 
!     *translated_vars = vars; }  /*
--- 1743,1749 ----                                      0));     } 
!     return vars; }  /*
*** a/src/include/catalog/pg_inherits_fn.h
--- b/src/include/catalog/pg_inherits_fn.h
***************
*** 23,27 **** extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
--- 23,28 ---- extern bool has_subclass(Oid relationId); extern bool has_superclass(Oid relationId); extern bool
typeInheritsFrom(OidsubclassTypeId, Oid superclassTypeId);
 
+ extern bool is_descendent_of(Oid parentId, Oid childId);  #endif                            /* PG_INHERITS_FN_H */

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Kyotaro HORIGUCHI
Date:
At Mon, 28 Aug 2017 18:28:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20170828.182807.98097766.horiguchi.kyotaro@lab.ntt.co.jp>
> I'll add this to CF2017-09.

This patch got deadly atack from the commit 30833ba. I changed
the signature of expand_single_inheritance_child in addition to
make_inh_translation_list to notify that the specified child is
no longer a child of the parent.

This passes regular regression test and fixed the the problem.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/catalog/pg_inherits.c
--- b/src/backend/catalog/pg_inherits.c
***************
*** 42,47 **** typedef struct SeenRelsEntry
--- 42,49 ----     ListCell   *numparents_cell;    /* corresponding list cell */ } SeenRelsEntry; 
+ static bool is_descendent_of_internal(Oid parentId, Oid childId,
+                                       HTAB *seen_rels); /*  * find_inheritance_children  *
***************
*** 400,402 **** typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
--- 402,472 ----      return result; }
+ 
+ /*
+  * Check if the child is really a descendent of the parent
+  */
+ bool
+ is_descendent_of(Oid parentId, Oid childId)
+ {
+     HTAB       *seen_rels;
+     HASHCTL        ctl;
+     bool        ischild = false;
+ 
+     memset(&ctl, 0, sizeof(ctl));
+     ctl.keysize = sizeof(Oid);
+     ctl.entrysize = sizeof(Oid);
+     ctl.hcxt = CurrentMemoryContext;
+ 
+     seen_rels = hash_create("is_descendent_of temporary table",
+                             32, /* start small and extend */
+                             &ctl,
+                             HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ 
+     ischild = is_descendent_of_internal(parentId, childId, seen_rels);
+ 
+     hash_destroy(seen_rels);
+ 
+     return ischild;
+ }
+ 
+ static bool
+ is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels)
+ {
+     Relation    inhrel;
+     SysScanDesc scan;
+     ScanKeyData key[1];
+     bool        ischild = false;
+     HeapTuple    inheritsTuple;
+ 
+     inhrel = heap_open(InheritsRelationId, AccessShareLock);
+     ScanKeyInit(&key[0], Anum_pg_inherits_inhparent,
+                 BTEqualStrategyNumber, F_OIDEQ,    ObjectIdGetDatum(parentId));
+     scan = systable_beginscan(inhrel, InheritsParentIndexId, true,
+                               NULL, 1, key);
+ 
+     while ((inheritsTuple = systable_getnext(scan)) != NULL)
+     {
+         bool found;
+         Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
+ 
+         hash_search(seen_rels, &inhrelid, HASH_ENTER, &found);
+ 
+         /*
+          * Recursively check into children. Although there can't theoretically
+          * be any cycles in the inheritance graph, check the cycles following
+          * find_all_inheritors.
+          */
+         if (inhrelid == childId ||
+             (!found && is_descendent_of_internal(inhrelid, childId, seen_rels)))
+         {
+             ischild = true;
+             break;
+         }
+     }
+ 
+     systable_endscan(scan);
+     heap_close(inhrel, AccessShareLock);
+ 
+     return ischild;
+ }
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
***************
*** 108,123 **** static void expand_partitioned_rtentry(PlannerInfo *root,                            LOCKMODE
lockmode,                           bool *has_child, List **appinfos,                            List
**partitioned_child_rels);
! static void expand_single_inheritance_child(PlannerInfo *root,                                 RangeTblEntry
*parentrte,                                Index parentRTindex, Relation parentrel,
PlanRowMark*parentrc, Relation childrel,                                 bool *has_child, List **appinfos,
                  List **partitioned_child_rels);
 
! static void make_inh_translation_list(Relation oldrelation,                           Relation newrelation,
!                           Index newvarno,
!                           List **translated_vars); static Bitmapset *translate_col_privs(const Bitmapset
*parent_privs,                    List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node,
 
--- 108,122 ----                            LOCKMODE lockmode,                            bool *has_child, List
**appinfos,                           List **partitioned_child_rels);
 
! static bool expand_single_inheritance_child(PlannerInfo *root,                                 RangeTblEntry
*parentrte,                                Index parentRTindex, Relation parentrel,
PlanRowMark*parentrc, Relation childrel,                                 bool *has_child, List **appinfos,
                  List **partitioned_child_rels);
 
! static List *make_inh_translation_list(Relation oldrelation,                           Relation newrelation,
!                           Index newvarno); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
             List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node,
 
***************
*** 1476,1481 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
--- 1475,1482 ----          * in which they appear in the PartitionDesc.  But first, expand the          * parent
itself.         */
 
+ 
+         /* ignore the return value since this doesn't exclude the parent */
expand_single_inheritance_child(root,rte, rti, oldrelation, oldrc,                                         oldrelation,
                                       &has_child, &appinfos,
 
***************
*** 1497,1502 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
--- 1498,1504 ----         {             Oid            childOID = lfirst_oid(l);             Relation    newrelation;
+             bool        expanded = false;              /* Open rel if needed; we already have required locks */
     if (childOID != parentOID)
 
***************
*** 1516,1529 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)                 continue;
           } 
 
!             expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc,
     newrelation,                                             &has_child, &appinfos,
        &partitioned_child_rels); 
 
!             /* Close child relations, but keep locks */             if (childOID != parentOID)
!                 heap_close(newrelation, NoLock);         }     } 
--- 1518,1532 ----                 continue;             } 
!             expanded = expand_single_inheritance_child(root,
!                                             rte, rti, oldrelation, oldrc,
newrelation,                                            &has_child, &appinfos,
  &partitioned_child_rels); 
 
!             /* Close child relations, but keep locks if expanded */             if (childOID != parentOID)
!                 heap_close(newrelation, expanded ? NoLock : lockmode);         }     } 
***************
*** 1581,1586 **** expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
--- 1584,1590 ----     {         Oid            childOID = partdesc->oids[i];         Relation    childrel;
+         bool        expanded = false;          /* Open rel; we already have required locks */         childrel =
heap_open(childOID,NoLock);
 
***************
*** 1592,1598 **** expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,             continue;
} 
 
!         expand_single_inheritance_child(root, parentrte, parentRTindex,
parentrel,parentrc, childrel,                                         has_child, appinfos,
          partitioned_child_rels);
 
--- 1596,1603 ----             continue;         } 
!         expanded = expand_single_inheritance_child(root,
!                                         parentrte, parentRTindex,                                         parentrel,
parentrc,childrel,                                         has_child, appinfos,
partitioned_child_rels);
***************
*** 1606,1613 **** expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
            has_child, appinfos,                                           partitioned_child_rels); 
 
!         /* Close child relation, but keep locks */
!         heap_close(childrel, NoLock);     } } 
--- 1611,1618 ----                                           has_child, appinfos,
   partitioned_child_rels); 
 
!         /* Close child relation, but keep locks if expanded */
!         heap_close(childrel, expanded ? NoLock : lockmode);     } } 
***************
*** 1619,1626 **** expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,  * anything at all.
Otherwise,we'll set "has_child" to true, build a  * RangeTblEntry and either a PartitionedChildRelInfo or AppendRelInfo
as * appropriate, plus maybe a PlanRowMark.  */
 
! static void expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
  Index parentRTindex, Relation parentrel,                                 PlanRowMark *parentrc, Relation childrel,
 
--- 1624,1634 ----  * anything at all.  Otherwise, we'll set "has_child" to true, build a  * RangeTblEntry and either a
PartitionedChildRelInfoor AppendRelInfo as  * appropriate, plus maybe a PlanRowMark.
 
+  *
+  * Returns false if the child has been found that no longer a child of the
+  * parent.  */
! static bool expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
  Index parentRTindex, Relation parentrel,                                 PlanRowMark *parentrc, Relation childrel,
 
***************
*** 1661,1666 **** expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
--- 1669,1692 ----      */     if (childrte->relkind != RELKIND_PARTITIONED_TABLE)     {
+         List *translated_vars =
+             make_inh_translation_list(parentrel, childrel, childRTindex);
+ 
+         /*
+          * It may happen that the childOID is no longer a child of the
+          * parent.
+          */
+         if (!translated_vars)
+         {
+             /*
+              * This childrel is no longer a child of the parent.  Return doing
+              * nothing. Halfway built childrte is abandoned but this happens
+              * really rarely.
+              */
+             Assert (childOID != parentOID);
+             return false;
+         }
+          /* Remember if we saw a real child. */         if (childOID != parentOID)             *has_child = true;
***************
*** 1670,1677 **** expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
appinfo->child_relid= childRTindex;         appinfo->parent_reltype = parentrel->rd_rel->reltype;
appinfo->child_reltype= childrel->rd_rel->reltype;
 
!         make_inh_translation_list(parentrel, childrel, childRTindex,
!                                   &appinfo->translated_vars);         appinfo->parent_reloid = parentOID;
*appinfos= lappend(*appinfos, appinfo); 
 
--- 1696,1702 ----         appinfo->child_relid = childRTindex;         appinfo->parent_reltype =
parentrel->rd_rel->reltype;        appinfo->child_reltype = childrel->rd_rel->reltype;
 
!         appinfo->translated_vars = translated_vars;         appinfo->parent_reloid = parentOID;         *appinfos =
lappend(*appinfos,appinfo); 
 
***************
*** 1726,1744 **** expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,          root->rowMarks
=lappend(root->rowMarks, childrc);     } }  /*  * make_inh_translation_list  *      Build the list of translations from
parentVars to child Vars for
 
!  *      an inheritance child.  *  * For paranoia's sake, we match type/collation as well as attribute name.  */
! static void make_inh_translation_list(Relation oldrelation, Relation newrelation,
!                           Index newvarno,
!                           List **translated_vars) {     List       *vars = NIL;     TupleDesc    old_tupdesc =
RelationGetDescr(oldrelation);
--- 1751,1771 ----          root->rowMarks = lappend(root->rowMarks, childrc);     }
+ 
+     return true; }  /*  * make_inh_translation_list  *      Build the list of translations from parent Vars to child
Varsfor
 
!  *      an inheritance child. Returns NULL if the two relations are no longer in
!  *      a inheritance relationship.  *  * For paranoia's sake, we match type/collation as well as attribute name.
*/
! static List * make_inh_translation_list(Relation oldrelation, Relation newrelation,
!                           Index newvarno) {     List       *vars = NIL;     TupleDesc    old_tupdesc =
RelationGetDescr(oldrelation);
***************
*** 1808,1815 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
--- 1835,1852 ----                     break;             }             if (new_attno >= newnatts)
+             {
+                 /*
+                  * It's possible that newrelation is no longer a child of the
+                  * oldrelation. We just ingore it for the case.
+                  */
+                 if (!is_descendent_of(oldrelation->rd_id, newrelation->rd_id))
+                     return NULL;
+ 
+                 /* If still a child, something's wrong. */                 elog(ERROR, "could not find inherited
attribute\"%s\" of relation \"%s\"",                      attname, RelationGetRelationName(newrelation));
 
+             }         }          /* Found it, check type and collation match */
***************
*** 1828,1834 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
     0));     } 
 
!     *translated_vars = vars; }  /*
--- 1865,1871 ----                                      0));     } 
!     return vars; }  /*
*** a/src/include/catalog/pg_inherits_fn.h
--- b/src/include/catalog/pg_inherits_fn.h
***************
*** 23,27 **** extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
--- 23,28 ---- extern bool has_subclass(Oid relationId); extern bool has_superclass(Oid relationId); extern bool
typeInheritsFrom(OidsubclassTypeId, Oid superclassTypeId);
 
+ extern bool is_descendent_of(Oid parentId, Oid childId);  #endif                            /* PG_INHERITS_FN_H */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Simon Riggs
Date:
On 26 June 2017 at 10:16, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

> BTW, in the partitioned table case, the parent is always locked first
> using an AccessExclusiveLock.  There are other considerations in that case
> such as needing to recreate the partition descriptor upon termination of
> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).

Is this requirement documented or in comments anywhere?

I can't see anything about that, which is a fairly major usage point.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Amit Langote
Date:
On 2017/09/13 12:05, Simon Riggs wrote:
> On 26 June 2017 at 10:16, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> 
>> BTW, in the partitioned table case, the parent is always locked first
>> using an AccessExclusiveLock.  There are other considerations in that case
>> such as needing to recreate the partition descriptor upon termination of
>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> 
> Is this requirement documented or in comments anywhere?

Yes.  See the last sentence in the description of PARTITION OF clause in
CREATE TABLE:

https://www.postgresql.org/docs/devel/static/sql-createtable.html#sql-createtable-partition

And, the 4th point in the list of differences between declarative
partitioning and inheritance:

https://www.postgresql.org/docs/devel/static/ddl-partitioning.html#ddl-partitioning-implementation-inheritance

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Robert Haas
Date:
On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> The cause is that NO INHERIT doesn't take an exlusive lock on the
> parent. This allows expand_inherited_rtentry to add the child
> relation into appendrel after removal from the inheritance but
> still exists.
>
> I see two ways to fix this.
>
> The first patch adds a recheck of inheritance relationship if the
> corresponding attribute is missing in the child in
> make_inh_translation_list(). The recheck is a bit complex but it
> is not performed unless the sequence above is happen. It checks
> duplication of relid (or cycles in inheritance) following
> find_all_inheritors (but doing a bit different) but I'm not sure
> it is really useful.
>
>
> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> the parent first.
>
> Since the latter has a larger impact on the current behavior and
> we already treat "DROP TABLE child" case in the similar way, I
> suppose that the first approach would be preferable.
>
>
> Any comments or thoughts?

I agree that the second has less user impact, but I wonder if we can
think that will really fix the bug completely, or more generally if it
will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not
locking the parent.  I have a sneaking suspicion that may be wishful
thinking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Kyotaro HORIGUCHI
Date:
At Wed, 13 Sep 2017 20:20:48 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoZnKgN1dCQaBwHn1aDVPyAfQ8PGRmyUH22hxy39+Aqawg@mail.gmail.com>
> On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > The cause is that NO INHERIT doesn't take an exlusive lock on the
> > parent. This allows expand_inherited_rtentry to add the child
> > relation into appendrel after removal from the inheritance but
> > still exists.
> >
> > I see two ways to fix this.
> >
> > The first patch adds a recheck of inheritance relationship if the
> > corresponding attribute is missing in the child in
> > make_inh_translation_list(). The recheck is a bit complex but it
> > is not performed unless the sequence above is happen. It checks
> > duplication of relid (or cycles in inheritance) following
> > find_all_inheritors (but doing a bit different) but I'm not sure
> > it is really useful.
> >
> >
> > The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> > the parent first.
> >
> > Since the latter has a larger impact on the current behavior and
> > we already treat "DROP TABLE child" case in the similar way, I
> > suppose that the first approach would be preferable.
> >
> >
> > Any comments or thoughts?
> 
> I agree that the second has less user impact, but I wonder if we can
> think that will really fix the bug completely, or more generally if it
> will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not
> locking the parent.  I have a sneaking suspicion that may be wishful
> thinking.

Thanks for the comment.

The recheck patch prevent planner from involving just-detached
children while inheritance expansion. No simultaneous detatching
of children doesn't affect the planning before the time.

Once planner (the select side) gets lock on the child, the alter
side cannot do anything until the select finishes. If the alter
side won, the select side detects detaching immediately after the
lock is released then excludes the children. No problem will
occur ever after. Even in the case a child is replaced with
another table, it is nothing different from simple detaching.

As the result, I think that the recheck patch saves all possible
problem caused by simultaneously detached children.

However, the parent-locking patch is far smaller and it doesn't
need such an explanation on its perfectness. If another problem
occurs by simlultaneous detaching, it must haven't taken
necessary locks in the right order.

The most significant reason for my decision on this ptach was the
fact that the DROP child case have been resolved by rechecking
without parent locks, which is a kind of waste of resources if it
could be resolved without it. (And I saw that the same solution
is taken at least several places)

I don't think there's noticeable difference of behavior
(excluding performance) for users between the two solutions.

Is there anyone who has an opinion on the point?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Amit Langote
Date:
Hi.

On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote:
> << the following is another topic >>
> 
>>>> BTW, in the partitioned table case, the parent is always locked first
>>>> using an AccessExclusiveLock.  There are other considerations in that case
>>>> such as needing to recreate the partition descriptor upon termination of
>>>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
>>>
>>> Apart from the degree of concurrency, if we keep parent->children
>>> order of locking, such recreation does not seem to be
>>> needed. Maybe I'm missing something.
>>
>> Sorry to have introduced that topic in this thread, but I will try to
>> explain anyway why things are the way they are currently:
>>
>> Once a table is no longer a partition of the parent (detached or dropped),
>> we must make sure that the next commands in the transaction don't see it
>> as one.  That information is currently present in the relcache
>> (rd_partdesc), which is used by a few callers, most notably the
>> tuple-routing code.  Next commands must recreate the entry so that the
>> correct thing happens based on the updated information.  More precisely,
>> we must invalidate the current entry.  RelationClearRelation() will either
>> delete the entry or rebuild it.  If it's being referenced somewhere, it
>> will be rebuilt.  The place holding the reference may also be looking at
>> the content of rd_partdesc, which we don't require them to make a copy of,
>> so we must preserve its content while other fields of RelationData are
>> being read anew from the catalog.  We don't have to preserve it if there
>> has been any change (partition added/dropped), but to make such a change
>> one would need to take a strong enough lock on the relation (parent).  We
>> assume here that anyone who wants to reference rd_partdesc takes at least
>> AccessShareLock lock on the relation, and anyone who wants to change its
>> content must take a lock that will conflict with it, so
>> AccessExclusiveLock.  Note that in all of this, we are only talking about
>> one relation, that is the parent, so parent -> child ordering of taking
>> locks may be irrelevant.
> 
> I think I understand this, anyway DropInherit and DropPartition
> is different-but-at-the-same-level operations so surely needs
> amendment for drop/detach cases. Is there already a solution? Or
> reproducing steps?

Sorry, I think I forgot to reply to this.  Since you seem to have chosen
the other solution (checking that child is still a child), maybe this
reply is a bit too late, but anyway.

DropInherit or NO INHERIT is seen primarily as changing a child table's
(which is the target table of the command) property that it is no longer a
child of the parent, so we lock the child table to block concurrent
operations from considering it a child of parent anymore.  The fact that
parent is locked after the child and with ShareUpdateExclusiveLock instead
of AccessExclusiveLock, we observe this race condition when SELECTing from
the parent.

DropPartition or DETACH PARTITION is seen primarily as changing the parent
table's (which is the target table of the command) property that one of
the partitions is removed, so we lock the parent.   Any concurrent
operations that rely on the parent's relcache to get the partition list
will wait for the session that is dropping the partition to finish, so
that they get the fresh information from the relcache (or more importantly
do not end up with information obtained from the relcache going invalid
under them without notice).  Note that the lock on the partition/child is
also present and it plays more or less the the same role as it does in the
DropInherit case, but due to different order of locking, reported race
condition does not occur between SELECT on partitioned table and
DROP/DETACH PARTITION.


By the way, I will take a look at your patch when I come back from the
vacation.  Meanwhile, I noticed that it needs another rebase after
0a480502b092 [1].

Thanks,
Amit

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Amit Langote
Date:
On 2017/09/15 15:36, Amit Langote wrote:
> The fact that
> parent is locked after the child and with ShareUpdateExclusiveLock instead
> of AccessExclusiveLock, we observe this race condition when SELECTing from
> the parent.

Oops, I meant "parent table is locked with AccessShareLock instead of
AccessExclusiveLock..."

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Kyotaro HORIGUCHI
Date:
At Fri, 15 Sep 2017 15:36:26 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in
<d88a9a64-e307-59b5-b4c3-8d7fc11cb59d@lab.ntt.co.jp>
> Hi.
> 
> On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote:
> > << the following is another topic >>
> > 
> >>>> BTW, in the partitioned table case, the parent is always locked first
> >>>> using an AccessExclusiveLock.  There are other considerations in that case
> >>>> such as needing to recreate the partition descriptor upon termination of
> >>>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> >>>
> >>> Apart from the degree of concurrency, if we keep parent->children
> >>> order of locking, such recreation does not seem to be
> >>> needed. Maybe I'm missing something.
> >>
> >> Sorry to have introduced that topic in this thread, but I will try to
> >> explain anyway why things are the way they are currently:
> >>
> >> Once a table is no longer a partition of the parent (detached or dropped),
> >> we must make sure that the next commands in the transaction don't see it
> >> as one.  That information is currently present in the relcache
> >> (rd_partdesc), which is used by a few callers, most notably the
> >> tuple-routing code.  Next commands must recreate the entry so that the
> >> correct thing happens based on the updated information.  More precisely,
> >> we must invalidate the current entry.  RelationClearRelation() will either
> >> delete the entry or rebuild it.  If it's being referenced somewhere, it
> >> will be rebuilt.  The place holding the reference may also be looking at
> >> the content of rd_partdesc, which we don't require them to make a copy of,
> >> so we must preserve its content while other fields of RelationData are
> >> being read anew from the catalog.  We don't have to preserve it if there
> >> has been any change (partition added/dropped), but to make such a change
> >> one would need to take a strong enough lock on the relation (parent).  We
> >> assume here that anyone who wants to reference rd_partdesc takes at least
> >> AccessShareLock lock on the relation, and anyone who wants to change its
> >> content must take a lock that will conflict with it, so
> >> AccessExclusiveLock.  Note that in all of this, we are only talking about
> >> one relation, that is the parent, so parent -> child ordering of taking
> >> locks may be irrelevant.
> > 
> > I think I understand this, anyway DropInherit and DropPartition
> > is different-but-at-the-same-level operations so surely needs
> > amendment for drop/detach cases. Is there already a solution? Or
> > reproducing steps?
> 
> Sorry, I think I forgot to reply to this.  Since you seem to have chosen
> the other solution (checking that child is still a child), maybe this
> reply is a bit too late, but anyway.

I choosed it at that time for the reason mentioned upthread, but
haven't decided which is better.

> DropInherit or NO INHERIT is seen primarily as changing a child table's
> (which is the target table of the command) property that it is no longer a
> child of the parent, so we lock the child table to block concurrent
> operations from considering it a child of parent anymore.  The fact that
> parent is locked after the child and with ShareUpdateExclusiveLock instead
> of AccessExclusiveLock, we observe this race condition when SELECTing from
> the parent.
> 
> DropPartition or DETACH PARTITION is seen primarily as changing the parent
> table's (which is the target table of the command) property that one of
> the partitions is removed, so we lock the parent.   Any concurrent
> operations that rely on the parent's relcache to get the partition list
> will wait for the session that is dropping the partition to finish, so
> that they get the fresh information from the relcache (or more importantly
> do not end up with information obtained from the relcache going invalid
> under them without notice).  Note that the lock on the partition/child is
> also present and it plays more or less the the same role as it does in the
> DropInherit case, but due to different order of locking, reported race
> condition does not occur between SELECT on partitioned table and
> DROP/DETACH PARTITION.

Thank you for the explanation. I understand that the difference
comes from which of parent and children has the information about
inheritance/partitioning. DROP child and ALTER child NO INHERITS
are (I think) the only two operations that intiated from children
side. The parent-locking patch results in different solutions for
similar problems.

However, parent locking on DROPped child requires a new index
InheritsRelidIndex to avoid full scan on pg_inherits, but the NO
INHERITS case doesn't. This might be a good reason for the
difference.

I noticed that DROP TABLE partition acquires lock on the parent
in a callback for RangeVarGetRelid(Extended). The same way seems
reasonable for the NO INHERIT case. As the result the patch
becomes far small and less invasive than the previous
one. (dropinh_lock_parent_v2.patch)

As a negative PoC, attacched dropchild_lock_parent_PoC.patch only
to show how the same method on DROP TABLE case looks.


> By the way, I will take a look at your patch when I come back from the
> vacation.  Meanwhile, I noticed that it needs another rebase after
> 0a480502b092 [1].

Great. Thanks.

> Thanks,
> Amit
> 
> [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 563bcda..b7c87b9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13175,7 +13175,28 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,        reltype =
((AlterObjectSchemaStmt*) stmt)->objectType;    else if (IsA(stmt, AlterTableStmt))
 
-        reltype = ((AlterTableStmt *) stmt)->relkind;
+    {
+        AlterTableStmt *alterstmt = (AlterTableStmt *)stmt;
+        ListCell *lc;
+
+        reltype = alterstmt->relkind;
+
+        foreach (lc, alterstmt->cmds)
+        {
+            AlterTableCmd *cmd = lfirst_node(AlterTableCmd, lc);
+            Assert(IsA(cmd, AlterTableCmd));
+
+            /*
+             * Though NO INHERIT doesn't modify the parent, concurrent
+             * expansion of inheritances will see a false child. We must
+             * acquire lock on the parent when dropping inheritance, but no
+             * need to ascend further.
+             */
+            if (cmd->subtype == AT_DropInherit)
+                RangeVarGetRelid((RangeVar *)cmd->def,
+                                 AccessExclusiveLock, false);
+        }
+    }    else    {        reltype = OBJECT_TABLE; /* placate compiler */
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 245a374..b83e248 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -126,19 +126,6 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)        {            /* Get the lock
tosynchronize against concurrent drop */            LockRelationOid(inhrelid, lockmode);
 
-
-            /*
-             * Now that we have the lock, double-check to see if the relation
-             * really exists or not.  If not, assume it was dropped while we
-             * waited to acquire lock, and ignore it.
-             */
-            if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(inhrelid)))
-            {
-                /* Release useless lock */
-                UnlockRelationOid(inhrelid, lockmode);
-                /* And ignore this relation */
-                continue;
-            }        }        list = lappend_oid(list, inhrelid);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b7c87b9..62c8860 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1120,10 +1120,10 @@ RemoveRelations(DropStmt *drop)}/*
- * Before acquiring a table lock, check whether we have sufficient rights.
- * In the case of DROP INDEX, also try to lock the table before the index.
- * Also, if the table to be dropped is a partition, we try to lock the parent
- * first.
+ * Before acquiring a table lock, check whether we have sufficient rights.  In
+ * the case of DROP INDEX, also try to lock the table before the index.  Also,
+ * if the table to be dropped is a partition or inheritance children, we try
+ * to lock the parent first. */static voidRangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid
oldRelOid,
@@ -1229,6 +1229,26 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,        if
(OidIsValid(state->partParentOid))           LockRelationOid(state->partParentOid, AccessExclusiveLock);    }
 
+
+    /* the same for all inheritance parents */
+    if (relOid != oldRelOid)
+    {
+        Relation    inhrel;
+        SysScanDesc    scan;
+        HeapTuple    inhtup;
+
+        inhrel = heap_open(InheritsRelationId, AccessShareLock);
+        scan = systable_beginscan(inhrel, InvalidOid, false, NULL, 0, NULL);
+        while (HeapTupleIsValid(inhtup = systable_getnext(scan)))
+        {
+            Form_pg_inherits inh = (Form_pg_inherits) GETSTRUCT(inhtup);
+
+            if (inh->inhrelid == relOid && OidIsValid(inh->inhparent))
+                LockRelationOid(inh->inhparent, AccessExclusiveLock);
+        }
+        systable_endscan(scan);
+        heap_close(inhrel, AccessShareLock);
+    }}/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Michael Paquier
Date:
On Tue, Sep 19, 2017 at 3:04 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> By the way, I will take a look at your patch when I come back from the
>> vacation.  Meanwhile, I noticed that it needs another rebase after
>> 0a480502b092 [1].

Moved to CF 2018-01.
-- 
Michael


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Kyotaro HORIGUCHI
Date:
Hello,

At Wed, 29 Nov 2017 14:04:01 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTt-mj4S8qoO_C9CaFAcV0J3vhgg4_Kw2U1wXvyEYB0bw@mail.gmail.com>
> On Tue, Sep 19, 2017 at 3:04 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> By the way, I will take a look at your patch when I come back from the
> >> vacation.  Meanwhile, I noticed that it needs another rebase after
> >> 0a480502b092 [1].
> 
> Moved to CF 2018-01.

Thank you. (I'll do that by myself from the next CF)

This is rebased patch and additional patch of isolation test for
this problem.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 4336b15d2c0d95e7044746aa5c3ae622712e41b3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Tue, 12 Dec 2017 17:06:53 +0900
Subject: [PATCH 1/2] Add isolation test

---
 src/test/isolation/expected/select-noinherit.out |  9 +++++++++
 src/test/isolation/isolation_schedule            |  1 +
 src/test/isolation/specs/select-noinherit.spec   | 23 +++++++++++++++++++++++
 3 files changed, 33 insertions(+)
 create mode 100644 src/test/isolation/expected/select-noinherit.out
 create mode 100644 src/test/isolation/specs/select-noinherit.spec

diff --git a/src/test/isolation/expected/select-noinherit.out b/src/test/isolation/expected/select-noinherit.out
new file mode 100644
index 0000000..5885167
--- /dev/null
+++ b/src/test/isolation/expected/select-noinherit.out
@@ -0,0 +1,9 @@
+Parsed test spec with 2 sessions
+
+starting permutation: alt1 sel2 c1
+step alt1: ALTER TABLE c1 NO INHERIT p;
+step sel2: SELECT * FROM p; <waiting ...>
+step c1: COMMIT;
+step sel2: <... completed>
+a              
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index e41b916..6e04ea4 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -63,3 +63,4 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: select-noinherit
diff --git a/src/test/isolation/specs/select-noinherit.spec b/src/test/isolation/specs/select-noinherit.spec
new file mode 100644
index 0000000..31662a9
--- /dev/null
+++ b/src/test/isolation/specs/select-noinherit.spec
@@ -0,0 +1,23 @@
+# SELECT and ALTER TABLE NO INHERIT test
+#
+
+setup
+{
+ CREATE TABLE p (a integer);
+ CREATE TABLE c1 () INHERITS (p);
+}
+
+teardown
+{
+ DROP TABLE p CASCADE;
+}
+
+session "s1"
+setup        { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step "alt1"    { ALTER TABLE c1 NO INHERIT p; }
+step "c1"    { COMMIT; }
+
+session "s2"
+step "sel2"    { SELECT * FROM p; }
+
+permutation "alt1" "sel2" "c1"
-- 
2.9.2

From c2793d9200948d693150a5bbeb3815e3b5404be2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Tue, 12 Dec 2017 17:38:12 +0900
Subject: [PATCH 2/2] Lock parent on ALTER TABLE NO INHERIT

NO INHERIT doesn't modify the parent at all but lock is required to
avoid error caused when a concurrent access see a false child.
---
 src/backend/commands/tablecmds.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d979ce2..a8d119f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13246,7 +13246,28 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
         reltype = ((AlterObjectSchemaStmt *) stmt)->objectType;
 
     else if (IsA(stmt, AlterTableStmt))
-        reltype = ((AlterTableStmt *) stmt)->relkind;
+    {
+        AlterTableStmt *alterstmt = (AlterTableStmt *)stmt;
+        ListCell *lc;
+
+        reltype = alterstmt->relkind;
+
+        foreach (lc, alterstmt->cmds)
+        {
+            AlterTableCmd *cmd = lfirst_node(AlterTableCmd, lc);
+            Assert(IsA(cmd, AlterTableCmd));
+
+            /*
+             * Though NO INHERIT doesn't modify the parent, lock on the parent
+             * is necessary so that no concurrent expansion of inheritances
+             * sees a false child and ends with ERROR.  But no need to ascend
+             * further.
+             */
+            if (cmd->subtype == AT_DropInherit)
+                RangeVarGetRelid((RangeVar *)cmd->def,
+                                 AccessExclusiveLock, false);
+        }
+    }
     else
     {
         reltype = OBJECT_TABLE; /* placate compiler */
-- 
2.9.2


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> [ 0002-Lock-parent-on-ALTER-TABLE-NO-INHERIT.patch ]

I don't especially like any of the patches proposed on this thread.
The one with rechecking inheritance seems expensive, duplicative,
and complicated.  The approach of taking a lock on the parent will
create deadlocks in usage patterns that did not encounter such
deadlocks before --- in particular, in the scenarios in which this
behavior matters at all, the ALTER will deadlock against ordinary
queries that lock the parent before the child.  So I can't believe
that anyone who's hitting the problem in the field will think the
extra lock is an improvement.

I think that there might be a much simpler solution to this, which
is to just remove make_inh_translation_list's tests of attinhcount,
as per attached.  Those are really pretty redundant: since we are
matching by column name, the unique index on pg_attribute already
guarantees there is at most one matching column.  I have a feeling
that those tests are my fault and I put them in on the theory that
they could save a few strcmp executions --- but if they're causing
problems, we can certainly drop them.  In any case they'd only save
something meaningful if most of the child's columns are non-inherited,
which doesn't seem like the way to bet.

In this way, if the child gets past the initial check on whether it's
been dropped, we'll still succeed in matching its columns, even if
they are no longer marked as inherited.  For me, this works fine with
either the ALTER NO INHERIT case (c1 does get scanned, with no error)
or the DROP TABLE case (c1 doesn't get scanned).  Now, you can break
it if you really try hard: make the concurrent transaction do both
ALTER NO INHERIT and then DROP COLUMN.  But that's not a scenario
that's been complained of, so I don't want to add a ton of mechanism
to fix it.

            regards, tom lane

diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 95557d7..5c4d113 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1832,7 +1832,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
          */
         if (old_attno < newnatts &&
             (att = TupleDescAttr(new_tupdesc, old_attno)) != NULL &&
-            !att->attisdropped && att->attinhcount != 0 &&
+            !att->attisdropped &&
             strcmp(attname, NameStr(att->attname)) == 0)
             new_attno = old_attno;
         else
@@ -1840,7 +1840,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
             for (new_attno = 0; new_attno < newnatts; new_attno++)
             {
                 att = TupleDescAttr(new_tupdesc, new_attno);
-                if (!att->attisdropped && att->attinhcount != 0 &&
+                if (!att->attisdropped &&
                     strcmp(attname, NameStr(att->attname)) == 0)
                     break;
             }

Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Tom Lane
Date:
I wrote:
> I think that there might be a much simpler solution to this, which
> is to just remove make_inh_translation_list's tests of attinhcount,
> as per attached.

I went ahead and pushed that, with an isolation test based on the
one Horiguchi-san submitted but covering a few related cases.

            regards, tom lane


Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

From
Kyotaro HORIGUCHI
Date:
Hello, sorry for my late reply.

At Wed, 10 Jan 2018 14:56:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <26017.1515614209@sss.pgh.pa.us>
> I think that there might be a much simpler solution to this, which
> is to just remove make_inh_translation_list's tests of attinhcount,
> as per attached.  Those are really pretty redundant: since we are
> matching by column name, the unique index on pg_attribute already
> guarantees there is at most one matching column.  I have a feeling

My thought were restricted to the same behavior as the drop case,
but Tom's solution is also fine for me. I agree to the point that
the colums with the same name in a inheritance tree are safely
assumed to be in a inheritance relationship. (Assuming everything
is consistent.)

At Fri, 12 Jan 2018 15:52:08 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <15881.1515790328@sss.pgh.pa.us>
> I wrote:
> > I think that there might be a much simpler solution to this, which
> > is to just remove make_inh_translation_list's tests of attinhcount,
> > as per attached.
> 
> I went ahead and pushed that, with an isolation test based on the
> one Horiguchi-san submitted but covering a few related cases.

Thank you for commiting it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center