Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Date
Msg-id 20170828.182807.98097766.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
List pgsql-hackers
Hello,

I'll add this to CF2017-09.

At Tue, 27 Jun 2017 16:27:18 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in
<75fe42df-b1d8-89ff-596d-d9da0749e66d@lab.ntt.co.jp>
> On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
> > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
> >>
> >> I recall I had proposed a fix for the same thing some time ago [1].
> > 
> > Wow. About 1.5 years ago and stuck? I have a concrete case where
> > this harms  so I'd like to fix that this time. How can we move on?
> 
> Agreed that this should be fixed.
> 
> Your proposed approach #1 to recheck the inheritance after obtaining the
> lock on the child table seems to be a good way forward.
> 
> Approach #2 of reordering locking is a simpler solution, but does not
> completely prevent the problem, because DROP TABLE child can still cause
> it to occur, as you mentioned.
> 
> >>> The cause is that NO INHERIT doesn't take an exlusive lock on the
> >>> parent. This allows expand_inherited_rtentry to add the child
> >>> relation into appendrel after removal from the inheritance but
> >>> still exists.
> >>
> >> Right.
> >>
> >>> I see two ways to fix this.
> >>>
> >>> The first patch adds a recheck of inheritance relationship if the
> >>> corresponding attribute is missing in the child in
> >>> make_inh_translation_list(). The recheck is a bit complex but it
> >>> is not performed unless the sequence above is happen. It checks
> >>> duplication of relid (or cycles in inheritance) following
> >>> find_all_inheritors (but doing a bit different) but I'm not sure
> >>> it is really useful.
> >>
> >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
> >> I guess your hash table based solution will do the job as far as
> >> performance of this check is concerned, although I haven't checked the
> >> code closely.
> > 
> > The hash table is not crucial in the patch. Substantially it just
> > recurs down looking up pg_inherits for the child. I considered
> > the new index but abandoned because I thought that such case
> > won't occur so frequently.
> 
> Agreed.  BTW, the hash table in patch #1 does not seem to be really
> helpful.  In the while loop in is_descendant_of_internal(), does
> hash_search() ever return found = true?  AFAICS, it does not.
> 
> >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> >>> the parent first.
> >>>
> >>> Since the latter has a larger impact on the current behavior and
> >>> we already treat "DROP TABLE child" case in the similar way, I
> >>> suppose that the first approach would be preferable.
> >>
> >> That makes sense.

So, I attached only the first patch, rebased on the current
master (It actually failed to apply on it.) and fixed a typo in a
comment.

This still runs a closed-loop test using temporary hash but it
seems a bit paranoic. (this is the same check with what
find_all_inheritors is doing)


<< the following is another topic >>

> >> BTW, in the partitioned table case, the parent is always locked first
> >> using an AccessExclusiveLock.  There are other considerations in that case
> >> such as needing to recreate the partition descriptor upon termination of
> >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> > 
> > Apart from the degree of concurrency, if we keep parent->children
> > order of locking, such recreation does not seem to be
> > needed. Maybe I'm missing something.
> 
> Sorry to have introduced that topic in this thread, but I will try to
> explain anyway why things are the way they are currently:
> 
> Once a table is no longer a partition of the parent (detached or dropped),
> we must make sure that the next commands in the transaction don't see it
> as one.  That information is currently present in the relcache
> (rd_partdesc), which is used by a few callers, most notably the
> tuple-routing code.  Next commands must recreate the entry so that the
> correct thing happens based on the updated information.  More precisely,
> we must invalidate the current entry.  RelationClearRelation() will either
> delete the entry or rebuild it.  If it's being referenced somewhere, it
> will be rebuilt.  The place holding the reference may also be looking at
> the content of rd_partdesc, which we don't require them to make a copy of,
> so we must preserve its content while other fields of RelationData are
> being read anew from the catalog.  We don't have to preserve it if there
> has been any change (partition added/dropped), but to make such a change
> one would need to take a strong enough lock on the relation (parent).  We
> assume here that anyone who wants to reference rd_partdesc takes at least
> AccessShareLock lock on the relation, and anyone who wants to change its
> content must take a lock that will conflict with it, so
> AccessExclusiveLock.  Note that in all of this, we are only talking about
> one relation, that is the parent, so parent -> child ordering of taking
> locks may be irrelevant.

I think I understand this, anyway DropInherit and DropPartition
is different-but-at-the-same-level operations so surely needs
amendment for drop/detach cases. Is there already a solution? Or
reproducing steps?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/catalog/pg_inherits.c
--- b/src/backend/catalog/pg_inherits.c
***************
*** 42,47 **** typedef struct SeenRelsEntry
--- 42,49 ----     ListCell   *numparents_cell;    /* corresponding list cell */ } SeenRelsEntry; 
+ static bool is_descendent_of_internal(Oid parentId, Oid childId,
+                                       HTAB *seen_rels); /*  * find_inheritance_children  *
***************
*** 400,402 **** typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
--- 402,472 ----      return result; }
+ 
+ /*
+  * Check if the child is really a descendent of the parent
+  */
+ bool
+ is_descendent_of(Oid parentId, Oid childId)
+ {
+     HTAB       *seen_rels;
+     HASHCTL        ctl;
+     bool        ischild = false;
+ 
+     memset(&ctl, 0, sizeof(ctl));
+     ctl.keysize = sizeof(Oid);
+     ctl.entrysize = sizeof(Oid);
+     ctl.hcxt = CurrentMemoryContext;
+ 
+     seen_rels = hash_create("is_descendent_of temporary table",
+                             32, /* start small and extend */
+                             &ctl,
+                             HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ 
+     ischild = is_descendent_of_internal(parentId, childId, seen_rels);
+ 
+     hash_destroy(seen_rels);
+ 
+     return ischild;
+ }
+ 
+ static bool
+ is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels)
+ {
+     Relation    inhrel;
+     SysScanDesc scan;
+     ScanKeyData key[1];
+     bool        ischild = false;
+     HeapTuple    inheritsTuple;
+ 
+     inhrel = heap_open(InheritsRelationId, AccessShareLock);
+     ScanKeyInit(&key[0], Anum_pg_inherits_inhparent,
+                 BTEqualStrategyNumber, F_OIDEQ,    ObjectIdGetDatum(parentId));
+     scan = systable_beginscan(inhrel, InheritsParentIndexId, true,
+                               NULL, 1, key);
+ 
+     while ((inheritsTuple = systable_getnext(scan)) != NULL)
+     {
+         bool found;
+         Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
+ 
+         hash_search(seen_rels, &inhrelid, HASH_ENTER, &found);
+ 
+         /*
+          * Recursively check into children. Although there can't theoretically
+          * be any cycles in the inheritance graph, check the cycles following
+          * find_all_inheritors.
+          */
+         if (inhrelid == childId ||
+             (!found && is_descendent_of_internal(inhrelid, childId, seen_rels)))
+         {
+             ischild = true;
+             break;
+         }
+     }
+ 
+     systable_endscan(scan);
+     heap_close(inhrel, AccessShareLock);
+ 
+     return ischild;
+ }
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
***************
*** 100,109 **** static List *generate_append_tlist(List *colTypes, List *colCollations, static List
*generate_setop_grouplist(SetOperationStmt*op, List *targetlist); static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                          Index rti);
 
! static void make_inh_translation_list(Relation oldrelation,                           Relation newrelation,
!                           Index newvarno,
!                           List **translated_vars); static Bitmapset *translate_col_privs(const Bitmapset
*parent_privs,                    List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node,
 
--- 100,108 ---- static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); static void
expand_inherited_rtentry(PlannerInfo*root, RangeTblEntry *rte,                          Index rti);
 
! static List *make_inh_translation_list(Relation oldrelation,                           Relation newrelation,
!                           Index newvarno); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
             List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node,
 
***************
*** 1508,1513 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
--- 1507,1527 ----          */         if (childrte->relkind != RELKIND_PARTITIONED_TABLE)         {
+             List *translated_vars =
+                 make_inh_translation_list(oldrelation, newrelation,
+                                           childRTindex);
+ 
+             if (!translated_vars)
+             {
+                 /*
+                  * This childrel is no longer a child of the parent.
+                  * Close the child relation releasing locks.
+                  */
+                 if (childOID != parentOID)
+                     heap_close(newrelation, lockmode);
+                 continue;
+             }
+              /* Remember if we saw a real child. */             if (childOID != parentOID)                 has_child
=true;
 
***************
*** 1517,1524 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
appinfo->child_relid= childRTindex;             appinfo->parent_reltype = oldrelation->rd_rel->reltype;
appinfo->child_reltype= newrelation->rd_rel->reltype;
 
!             make_inh_translation_list(oldrelation, newrelation, childRTindex,
!                                       &appinfo->translated_vars);             appinfo->parent_reloid = parentOID;
       appinfos = lappend(appinfos, appinfo); 
 
--- 1531,1537 ----             appinfo->child_relid = childRTindex;             appinfo->parent_reltype =
oldrelation->rd_rel->reltype;            appinfo->child_reltype = newrelation->rd_rel->reltype;
 
!             appinfo->translated_vars = translated_vars;             appinfo->parent_reloid = parentOID;
appinfos= lappend(appinfos, appinfo); 
 
***************
*** 1623,1636 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) /*  *
make_inh_translation_list *      Build the list of translations from parent Vars to child Vars for
 
!  *      an inheritance child.  *  * For paranoia's sake, we match type/collation as well as attribute name.  */
! static void make_inh_translation_list(Relation oldrelation, Relation newrelation,
!                           Index newvarno,
!                           List **translated_vars) {     List       *vars = NIL;     TupleDesc    old_tupdesc =
RelationGetDescr(oldrelation);
--- 1636,1649 ---- /*  * make_inh_translation_list  *      Build the list of translations from parent Vars to child
Varsfor
 
!  *      an inheritance child. Returns NULL if the two relations are no longer in
!  *      a inheritance relationship.  *  * For paranoia's sake, we match type/collation as well as attribute name.
*/
! static List * make_inh_translation_list(Relation oldrelation, Relation newrelation,
!                           Index newvarno) {     List       *vars = NIL;     TupleDesc    old_tupdesc =
RelationGetDescr(oldrelation);
***************
*** 1700,1707 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
--- 1713,1730 ----                     break;             }             if (new_attno >= newnatts)
+             {
+                 /*
+                  * It's possible that newrelation is no longer a child of the
+                  * oldrelation. We just ingore it for the case.
+                  */
+                 if (!is_descendent_of(oldrelation->rd_id, newrelation->rd_id))
+                     return NULL;
+ 
+                 /* If still a child, something's wrong. */                 elog(ERROR, "could not find inherited
attribute\"%s\" of relation \"%s\"",                      attname, RelationGetRelationName(newrelation));
 
+             }         }          /* Found it, check type and collation match */
***************
*** 1720,1726 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
     0));     } 
 
!     *translated_vars = vars; }  /*
--- 1743,1749 ----                                      0));     } 
!     return vars; }  /*
*** a/src/include/catalog/pg_inherits_fn.h
--- b/src/include/catalog/pg_inherits_fn.h
***************
*** 23,27 **** extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
--- 23,28 ---- extern bool has_subclass(Oid relationId); extern bool has_superclass(Oid relationId); extern bool
typeInheritsFrom(OidsubclassTypeId, Oid superclassTypeId);
 
+ extern bool is_descendent_of(Oid parentId, Oid childId);  #endif                            /* PG_INHERITS_FN_H */

pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] show "aggressive" or not in autovacuum logs
Next
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] psql --batch