Thread: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Hello. I had a case of unexpected error caused by ALTER TABLE NO INHERIT. =# CREATE TABLE p (a int); =# CREATE TABLE c1 () INHERITS (p); session A=# BEGIN; session A=# ALTER TABLE c1 NO INHERIT p; session B=# EXPLAIN ANALYZE SELECT * FROM p; (blocked) session A=# COMMIT; session B: ERROR: could not find inherited attribute "a" of relation "c1" This happens at least back to 9.1 to master and doesn't seem to be a designed behavior. The cause is that NO INHERIT doesn't take an exlusive lock on the parent. This allows expand_inherited_rtentry to add the child relation into appendrel after removal from the inheritance but still exists. I see two ways to fix this. The first patch adds a recheck of inheritance relationship if the corresponding attribute is missing in the child in make_inh_translation_list(). The recheck is a bit complex but it is not performed unless the sequence above is happen. It checks duplication of relid (or cycles in inheritance) following find_all_inheritors (but doing a bit different) but I'm not sure it is really useful. The second patch lets ALTER TABLE NO INHERIT to acquire locks on the parent first. Since the latter has a larger impact on the current behavior and we already treat "DROP TABLE child" case in the similar way, I suppose that the first approach would be preferable. Any comments or thoughts? regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c index e5fb52c..1670d11 100644 --- a/src/backend/catalog/pg_inherits.c +++ b/src/backend/catalog/pg_inherits.c @@ -42,6 +42,8 @@ typedef struct SeenRelsEntry ListCell *numparents_cell; /* corresponding list cell */} SeenRelsEntry; +static bool is_descendent_of_internal(Oid parentId, Oid childId, + HTAB *seen_rels);/* * find_inheritance_children * @@ -376,3 +378,71 @@ typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId) return result;} + +/* + * Check if the child is really a descendent of the parent + */ +bool +is_descendent_of(Oid parentId, Oid childId) +{ + HTAB *seen_rels; + HASHCTL ctl; + bool ischild = false; + + memset(&ctl, 0, sizeof(ctl)); + ctl.keysize = sizeof(Oid); + ctl.entrysize = sizeof(Oid); + ctl.hcxt = CurrentMemoryContext; + + seen_rels = hash_create("is_descendent_of temporary table", + 32, /* start small and extend */ + &ctl, + HASH_ELEM | HASH_BLOBS | HASH_CONTEXT); + + ischild = is_descendent_of_internal(parentId, childId, seen_rels); + + hash_destroy(seen_rels); + + return ischild; +} + +static bool +is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels) +{ + Relation inhrel; + SysScanDesc scan; + ScanKeyData key[1]; + bool ischild = false; + HeapTuple inheritsTuple; + + inhrel = heap_open(InheritsRelationId, AccessShareLock); + ScanKeyInit(&key[0], Anum_pg_inherits_inhparent, + BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(parentId)); + scan = systable_beginscan(inhrel, InheritsParentIndexId, true, + NULL, 1, key); + + while ((inheritsTuple = systable_getnext(scan)) != NULL) + { + bool found; + Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid; + + hash_search(seen_rels, &inhrelid, HASH_ENTER, &found); + + /* + * Recursively check into children. Although there can't theoretically + * be any cycles in the inheritance graph, check the cycles following + * find_all_inheritors. + */ + if (inhrelid == childId || + (!found && is_descendent_of_internal(inhrelid, childId, seen_rels))) + { + ischild = true; + break; + } + } + + systable_endscan(scan); + heap_close(inhrel, AccessShareLock); + + return ischild; +} diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index cf46b74..f1c582a 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -99,10 +99,9 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,static List *generate_setop_grouplist(SetOperationStmt*op, List *targetlist);static void expand_inherited_rtentry(PlannerInfo *root,RangeTblEntry *rte, Index rti); -static void make_inh_translation_list(Relation oldrelation, +static List *make_inh_translation_list(Relation oldrelation, Relation newrelation, - Index newvarno, - List **translated_vars); + Index newvarno);static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars);static Node *adjust_appendrel_attrs_mutator(Node *node, @@ -1502,14 +1501,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) */ if (childrte->relkind!= RELKIND_PARTITIONED_TABLE) { + List *translated_vars = + make_inh_translation_list(oldrelation, newrelation, + childRTindex); + + if (!translated_vars) + { + /* + * This childrel is no longer a child of the parent. + * Close the child relation releasing locks. + */ + if (childOID != parentOID) + heap_close(newrelation, lockmode); + continue; + } + need_append = true; appinfo = makeNode(AppendRelInfo); appinfo->parent_relid = rti; appinfo->child_relid = childRTindex; appinfo->parent_reltype = oldrelation->rd_rel->reltype; appinfo->child_reltype = newrelation->rd_rel->reltype; - make_inh_translation_list(oldrelation, newrelation, childRTindex, - &appinfo->translated_vars); + appinfo->translated_vars = translated_vars; appinfo->parent_reloid = parentOID; appinfos= lappend(appinfos, appinfo); @@ -1614,14 +1627,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)/* * make_inh_translation_list* Build the list of translations from parent Vars to child Vars for - * an inheritance child. + * an inheritance child. Returns NULL if the two relations are no longer in + * a inheritance relationship. * * For paranoia's sake, we match type/collation as well as attribute name. */ -static void +static List *make_inh_translation_list(Relation oldrelation, Relation newrelation, - Index newvarno, - List **translated_vars) + Index newvarno){ List *vars = NIL; TupleDesc old_tupdesc = RelationGetDescr(oldrelation); @@ -1691,8 +1704,18 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, break; } if (new_attno >= newnatts) + { + /* + * It's possible that newrelation is no longer a child of the + * newrelation. We just ingore it for the case. + */ + if (!is_descendent_of(oldrelation->rd_id, newrelation->rd_id)) + return NULL; + + /* If still a child, something's wrong. */ elog(ERROR, "could not find inherited attribute\"%s\" of relation \"%s\"", attname, RelationGetRelationName(newrelation)); + } } /* Found it, check type and collation match */ @@ -1711,7 +1734,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, 0)); } - *translated_vars = vars; + return vars;}/* diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h index abfa476..3affe40 100644 --- a/src/include/catalog/pg_inherits_fn.h +++ b/src/include/catalog/pg_inherits_fn.h @@ -22,5 +22,6 @@ extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **parents);externbool has_subclass(Oid relationId);extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId); +extern bool is_descendent_of(Oid parentId, Oid childId);#endif /* PG_INHERITS_FN_H */ diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 7d9c769..f69631f 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -3397,6 +3397,30 @@ AlterTableGetLockLevel(List *cmds)}/* + * Some AT commands require to take a lock on the parent prior to the target. + */ +void +AlterTableLockInhParent(List *cmds, LOCKMODE lockmode) +{ + ListCell *lcmd; + + foreach(lcmd, cmds) + { + AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd); + + switch (cmd->subtype) + { + case AT_DropInherit: + RangeVarGetRelid((RangeVar *)cmd->def, lockmode, false); + break; + + default: + break; + } + } +} + +/* * ATController provides top level control over the phases. * * parsetree is passed in to allow it to be passed to eventtriggers @@ -11446,12 +11470,8 @@ ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode) (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot change inheritance of a partition"))); - /* - * AccessShareLock on the parent is probably enough, seeing that DROP - * TABLE doesn't lock parent tables at all. We need some lock since we'll - * be inspecting the parent's schema. - */ - parent_rel = heap_openrv(parent, AccessShareLock); + /* Requreid lock is already held */ + parent_rel = heap_openrv(parent, NoLock); /* * We don't bother to check ownership of the parent table --- ownershipof diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index ddacac8..ce3406b 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -1102,6 +1102,9 @@ ProcessUtilitySlow(ParseState *pstate, * permissions. */ lockmode = AlterTableGetLockLevel(atstmt->cmds); + + /* Lock the parent first if required */ + AlterTableLockInhParent(atstmt->cmds, lockmode); relid = AlterTableLookupRelation(atstmt,lockmode); if (OidIsValid(relid)) diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h index abd31b6..c643326 100644 --- a/src/include/commands/tablecmds.h +++ b/src/include/commands/tablecmds.h @@ -27,6 +27,7 @@ extern ObjectAddress DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,extern void RemoveRelations(DropStmt*drop); +void AlterTableLockInhParent(List *cmds, LOCKMODE lockmode);extern Oid AlterTableLookupRelation(AlterTableStmt *stmt,LOCKMODE lockmode);extern void AlterTable(Oid relid, LOCKMODE lockmode, AlterTableStmt *stmt);
Hello, On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote: > Hello. > > I had a case of unexpected error caused by ALTER TABLE NO > INHERIT. > > =# CREATE TABLE p (a int); > =# CREATE TABLE c1 () INHERITS (p); > > session A=# BEGIN; > session A=# ALTER TABLE c1 NO INHERIT p; > > session B=# EXPLAIN ANALYZE SELECT * FROM p; > (blocked) > > session A=# COMMIT; > > session B: ERROR: could not find inherited attribute "a" of relation "c1" > > This happens at least back to 9.1 to master and doesn't seem to > be a designed behavior. I recall I had proposed a fix for the same thing some time ago [1]. > The cause is that NO INHERIT doesn't take an exlusive lock on the > parent. This allows expand_inherited_rtentry to add the child > relation into appendrel after removal from the inheritance but > still exists. Right. > I see two ways to fix this. > > The first patch adds a recheck of inheritance relationship if the > corresponding attribute is missing in the child in > make_inh_translation_list(). The recheck is a bit complex but it > is not performed unless the sequence above is happen. It checks > duplication of relid (or cycles in inheritance) following > find_all_inheritors (but doing a bit different) but I'm not sure > it is really useful. I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but I guess your hash table based solution will do the job as far as performance of this check is concerned, although I haven't checked the code closely. > The second patch lets ALTER TABLE NO INHERIT to acquire locks on > the parent first. > > Since the latter has a larger impact on the current behavior and > we already treat "DROP TABLE child" case in the similar way, I > suppose that the first approach would be preferable. That makes sense. BTW, in the partitioned table case, the parent is always locked first using an AccessExclusiveLock. There are other considerations in that case such as needing to recreate the partition descriptor upon termination of inheritance (both the DETACH PARTITION and also DROP TABLE child cases). Thanks, Amit [1] https://www.postgresql.org/message-id/565EB1F2.7000201%40lab.ntt.co.jp
Hello, At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <7f5fe522-f328-3c40-565f-5e1ce37455d1@lab.ntt.co.jp> > Hello, > > On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote: > > Hello. > > > > I had a case of unexpected error caused by ALTER TABLE NO > > INHERIT. > > > > =# CREATE TABLE p (a int); > > =# CREATE TABLE c1 () INHERITS (p); > > > > session A=# BEGIN; > > session A=# ALTER TABLE c1 NO INHERIT p; > > > > session B=# EXPLAIN ANALYZE SELECT * FROM p; > > (blocked) > > > > session A=# COMMIT; > > > > session B: ERROR: could not find inherited attribute "a" of relation "c1" > > > > This happens at least back to 9.1 to master and doesn't seem to > > be a designed behavior. > > I recall I had proposed a fix for the same thing some time ago [1]. Wow. About 1.5 years ago and stuck? I have a concrete case where this harms so I'd like to fix that this time. How can we move on? > > The cause is that NO INHERIT doesn't take an exlusive lock on the > > parent. This allows expand_inherited_rtentry to add the child > > relation into appendrel after removal from the inheritance but > > still exists. > > Right. > > > I see two ways to fix this. > > > > The first patch adds a recheck of inheritance relationship if the > > corresponding attribute is missing in the child in > > make_inh_translation_list(). The recheck is a bit complex but it > > is not performed unless the sequence above is happen. It checks > > duplication of relid (or cycles in inheritance) following > > find_all_inheritors (but doing a bit different) but I'm not sure > > it is really useful. > > I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but > I guess your hash table based solution will do the job as far as > performance of this check is concerned, although I haven't checked the > code closely. The hash table is not crucial in the patch. Substantially it just recurs down looking up pg_inherits for the child. I considered the new index but abandoned because I thought that such case won't occur so frequently. > > The second patch lets ALTER TABLE NO INHERIT to acquire locks on > > the parent first. > > > > Since the latter has a larger impact on the current behavior and > > we already treat "DROP TABLE child" case in the similar way, I > > suppose that the first approach would be preferable. > > That makes sense. > > BTW, in the partitioned table case, the parent is always locked first > using an AccessExclusiveLock. There are other considerations in that case > such as needing to recreate the partition descriptor upon termination of > inheritance (both the DETACH PARTITION and also DROP TABLE child cases). Apart from the degree of concurrency, if we keep parent->children order of locking, such recreation does not seem to be needed. Maybe I'm missing something. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote: > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote: >> >> I recall I had proposed a fix for the same thing some time ago [1]. > > Wow. About 1.5 years ago and stuck? I have a concrete case where > this harms so I'd like to fix that this time. How can we move on? Agreed that this should be fixed. Your proposed approach #1 to recheck the inheritance after obtaining the lock on the child table seems to be a good way forward. Approach #2 of reordering locking is a simpler solution, but does not completely prevent the problem, because DROP TABLE child can still cause it to occur, as you mentioned. >>> The cause is that NO INHERIT doesn't take an exlusive lock on the >>> parent. This allows expand_inherited_rtentry to add the child >>> relation into appendrel after removal from the inheritance but >>> still exists. >> >> Right. >> >>> I see two ways to fix this. >>> >>> The first patch adds a recheck of inheritance relationship if the >>> corresponding attribute is missing in the child in >>> make_inh_translation_list(). The recheck is a bit complex but it >>> is not performed unless the sequence above is happen. It checks >>> duplication of relid (or cycles in inheritance) following >>> find_all_inheritors (but doing a bit different) but I'm not sure >>> it is really useful. >> >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but >> I guess your hash table based solution will do the job as far as >> performance of this check is concerned, although I haven't checked the >> code closely. > > The hash table is not crucial in the patch. Substantially it just > recurs down looking up pg_inherits for the child. I considered > the new index but abandoned because I thought that such case > won't occur so frequently. Agreed. BTW, the hash table in patch #1 does not seem to be really helpful. In the while loop in is_descendant_of_internal(), does hash_search() ever return found = true? AFAICS, it does not. >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on >>> the parent first. >>> >>> Since the latter has a larger impact on the current behavior and >>> we already treat "DROP TABLE child" case in the similar way, I >>> suppose that the first approach would be preferable. >> >> That makes sense. >> >> BTW, in the partitioned table case, the parent is always locked first >> using an AccessExclusiveLock. There are other considerations in that case >> such as needing to recreate the partition descriptor upon termination of >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > > Apart from the degree of concurrency, if we keep parent->children > order of locking, such recreation does not seem to be > needed. Maybe I'm missing something. Sorry to have introduced that topic in this thread, but I will try to explain anyway why things are the way they are currently: Once a table is no longer a partition of the parent (detached or dropped), we must make sure that the next commands in the transaction don't see it as one. That information is currently present in the relcache (rd_partdesc), which is used by a few callers, most notably the tuple-routing code. Next commands must recreate the entry so that the correct thing happens based on the updated information. More precisely, we must invalidate the current entry. RelationClearRelation() will either delete the entry or rebuild it. If it's being referenced somewhere, it will be rebuilt. The place holding the reference may also be looking at the content of rd_partdesc, which we don't require them to make a copy of, so we must preserve its content while other fields of RelationData are being read anew from the catalog. We don't have to preserve it if there has been any change (partition added/dropped), but to make such a change one would need to take a strong enough lock on the relation (parent). We assume here that anyone who wants to reference rd_partdesc takes at least AccessShareLock lock on the relation, and anyone who wants to change its content must take a lock that will conflict with it, so AccessExclusiveLock. Note that in all of this, we are only talking about one relation, that is the parent, so parent -> child ordering of taking locks may be irrelevant. Thanks, Amit
Hello, I'll add this to CF2017-09. At Tue, 27 Jun 2017 16:27:18 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <75fe42df-b1d8-89ff-596d-d9da0749e66d@lab.ntt.co.jp> > On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote: > > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote: > >> > >> I recall I had proposed a fix for the same thing some time ago [1]. > > > > Wow. About 1.5 years ago and stuck? I have a concrete case where > > this harms so I'd like to fix that this time. How can we move on? > > Agreed that this should be fixed. > > Your proposed approach #1 to recheck the inheritance after obtaining the > lock on the child table seems to be a good way forward. > > Approach #2 of reordering locking is a simpler solution, but does not > completely prevent the problem, because DROP TABLE child can still cause > it to occur, as you mentioned. > > >>> The cause is that NO INHERIT doesn't take an exlusive lock on the > >>> parent. This allows expand_inherited_rtentry to add the child > >>> relation into appendrel after removal from the inheritance but > >>> still exists. > >> > >> Right. > >> > >>> I see two ways to fix this. > >>> > >>> The first patch adds a recheck of inheritance relationship if the > >>> corresponding attribute is missing in the child in > >>> make_inh_translation_list(). The recheck is a bit complex but it > >>> is not performed unless the sequence above is happen. It checks > >>> duplication of relid (or cycles in inheritance) following > >>> find_all_inheritors (but doing a bit different) but I'm not sure > >>> it is really useful. > >> > >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but > >> I guess your hash table based solution will do the job as far as > >> performance of this check is concerned, although I haven't checked the > >> code closely. > > > > The hash table is not crucial in the patch. Substantially it just > > recurs down looking up pg_inherits for the child. I considered > > the new index but abandoned because I thought that such case > > won't occur so frequently. > > Agreed. BTW, the hash table in patch #1 does not seem to be really > helpful. In the while loop in is_descendant_of_internal(), does > hash_search() ever return found = true? AFAICS, it does not. > > >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on > >>> the parent first. > >>> > >>> Since the latter has a larger impact on the current behavior and > >>> we already treat "DROP TABLE child" case in the similar way, I > >>> suppose that the first approach would be preferable. > >> > >> That makes sense. So, I attached only the first patch, rebased on the current master (It actually failed to apply on it.) and fixed a typo in a comment. This still runs a closed-loop test using temporary hash but it seems a bit paranoic. (this is the same check with what find_all_inheritors is doing) << the following is another topic >> > >> BTW, in the partitioned table case, the parent is always locked first > >> using an AccessExclusiveLock. There are other considerations in that case > >> such as needing to recreate the partition descriptor upon termination of > >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > > > > Apart from the degree of concurrency, if we keep parent->children > > order of locking, such recreation does not seem to be > > needed. Maybe I'm missing something. > > Sorry to have introduced that topic in this thread, but I will try to > explain anyway why things are the way they are currently: > > Once a table is no longer a partition of the parent (detached or dropped), > we must make sure that the next commands in the transaction don't see it > as one. That information is currently present in the relcache > (rd_partdesc), which is used by a few callers, most notably the > tuple-routing code. Next commands must recreate the entry so that the > correct thing happens based on the updated information. More precisely, > we must invalidate the current entry. RelationClearRelation() will either > delete the entry or rebuild it. If it's being referenced somewhere, it > will be rebuilt. The place holding the reference may also be looking at > the content of rd_partdesc, which we don't require them to make a copy of, > so we must preserve its content while other fields of RelationData are > being read anew from the catalog. We don't have to preserve it if there > has been any change (partition added/dropped), but to make such a change > one would need to take a strong enough lock on the relation (parent). We > assume here that anyone who wants to reference rd_partdesc takes at least > AccessShareLock lock on the relation, and anyone who wants to change its > content must take a lock that will conflict with it, so > AccessExclusiveLock. Note that in all of this, we are only talking about > one relation, that is the parent, so parent -> child ordering of taking > locks may be irrelevant. I think I understand this, anyway DropInherit and DropPartition is different-but-at-the-same-level operations so surely needs amendment for drop/detach cases. Is there already a solution? Or reproducing steps? regards, -- Kyotaro Horiguchi NTT Open Source Software Center *** a/src/backend/catalog/pg_inherits.c --- b/src/backend/catalog/pg_inherits.c *************** *** 42,47 **** typedef struct SeenRelsEntry --- 42,49 ---- ListCell *numparents_cell; /* corresponding list cell */ } SeenRelsEntry; + static bool is_descendent_of_internal(Oid parentId, Oid childId, + HTAB *seen_rels); /* * find_inheritance_children * *************** *** 400,402 **** typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId) --- 402,472 ---- return result; } + + /* + * Check if the child is really a descendent of the parent + */ + bool + is_descendent_of(Oid parentId, Oid childId) + { + HTAB *seen_rels; + HASHCTL ctl; + bool ischild = false; + + memset(&ctl, 0, sizeof(ctl)); + ctl.keysize = sizeof(Oid); + ctl.entrysize = sizeof(Oid); + ctl.hcxt = CurrentMemoryContext; + + seen_rels = hash_create("is_descendent_of temporary table", + 32, /* start small and extend */ + &ctl, + HASH_ELEM | HASH_BLOBS | HASH_CONTEXT); + + ischild = is_descendent_of_internal(parentId, childId, seen_rels); + + hash_destroy(seen_rels); + + return ischild; + } + + static bool + is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels) + { + Relation inhrel; + SysScanDesc scan; + ScanKeyData key[1]; + bool ischild = false; + HeapTuple inheritsTuple; + + inhrel = heap_open(InheritsRelationId, AccessShareLock); + ScanKeyInit(&key[0], Anum_pg_inherits_inhparent, + BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(parentId)); + scan = systable_beginscan(inhrel, InheritsParentIndexId, true, + NULL, 1, key); + + while ((inheritsTuple = systable_getnext(scan)) != NULL) + { + bool found; + Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid; + + hash_search(seen_rels, &inhrelid, HASH_ENTER, &found); + + /* + * Recursively check into children. Although there can't theoretically + * be any cycles in the inheritance graph, check the cycles following + * find_all_inheritors. + */ + if (inhrelid == childId || + (!found && is_descendent_of_internal(inhrelid, childId, seen_rels))) + { + ischild = true; + break; + } + } + + systable_endscan(scan); + heap_close(inhrel, AccessShareLock); + + return ischild; + } *** a/src/backend/optimizer/prep/prepunion.c --- b/src/backend/optimizer/prep/prepunion.c *************** *** 100,109 **** static List *generate_append_tlist(List *colTypes, List *colCollations, static List *generate_setop_grouplist(SetOperationStmt*op, List *targetlist); static void expand_inherited_rtentry(PlannerInfo *root,RangeTblEntry *rte, Index rti); ! static void make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno, ! List **translated_vars); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, --- 100,108 ---- static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); static void expand_inherited_rtentry(PlannerInfo*root, RangeTblEntry *rte, Index rti); ! static List *make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, *************** *** 1508,1513 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) --- 1507,1527 ---- */ if (childrte->relkind != RELKIND_PARTITIONED_TABLE) { + List *translated_vars = + make_inh_translation_list(oldrelation, newrelation, + childRTindex); + + if (!translated_vars) + { + /* + * This childrel is no longer a child of the parent. + * Close the child relation releasing locks. + */ + if (childOID != parentOID) + heap_close(newrelation, lockmode); + continue; + } + /* Remember if we saw a real child. */ if (childOID != parentOID) has_child =true; *************** *** 1517,1524 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) appinfo->child_relid= childRTindex; appinfo->parent_reltype = oldrelation->rd_rel->reltype; appinfo->child_reltype= newrelation->rd_rel->reltype; ! make_inh_translation_list(oldrelation, newrelation, childRTindex, ! &appinfo->translated_vars); appinfo->parent_reloid = parentOID; appinfos = lappend(appinfos, appinfo); --- 1531,1537 ---- appinfo->child_relid = childRTindex; appinfo->parent_reltype = oldrelation->rd_rel->reltype; appinfo->child_reltype = newrelation->rd_rel->reltype; ! appinfo->translated_vars = translated_vars; appinfo->parent_reloid = parentOID; appinfos= lappend(appinfos, appinfo); *************** *** 1623,1636 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) /* * make_inh_translation_list * Build the list of translations from parent Vars to child Vars for ! * an inheritance child. * * For paranoia's sake, we match type/collation as well as attribute name. */ ! static void make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno, ! List **translated_vars) { List *vars = NIL; TupleDesc old_tupdesc = RelationGetDescr(oldrelation); --- 1636,1649 ---- /* * make_inh_translation_list * Build the list of translations from parent Vars to child Varsfor ! * an inheritance child. Returns NULL if the two relations are no longer in ! * a inheritance relationship. * * For paranoia's sake, we match type/collation as well as attribute name. */ ! static List * make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno) { List *vars = NIL; TupleDesc old_tupdesc = RelationGetDescr(oldrelation); *************** *** 1700,1707 **** make_inh_translation_list(Relation oldrelation, Relation newrelation, --- 1713,1730 ---- break; } if (new_attno >= newnatts) + { + /* + * It's possible that newrelation is no longer a child of the + * oldrelation. We just ingore it for the case. + */ + if (!is_descendent_of(oldrelation->rd_id, newrelation->rd_id)) + return NULL; + + /* If still a child, something's wrong. */ elog(ERROR, "could not find inherited attribute\"%s\" of relation \"%s\"", attname, RelationGetRelationName(newrelation)); + } } /* Found it, check type and collation match */ *************** *** 1720,1726 **** make_inh_translation_list(Relation oldrelation, Relation newrelation, 0)); } ! *translated_vars = vars; } /* --- 1743,1749 ---- 0)); } ! return vars; } /* *** a/src/include/catalog/pg_inherits_fn.h --- b/src/include/catalog/pg_inherits_fn.h *************** *** 23,27 **** extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, --- 23,28 ---- extern bool has_subclass(Oid relationId); extern bool has_superclass(Oid relationId); extern bool typeInheritsFrom(OidsubclassTypeId, Oid superclassTypeId); + extern bool is_descendent_of(Oid parentId, Oid childId); #endif /* PG_INHERITS_FN_H */
At Mon, 28 Aug 2017 18:28:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170828.182807.98097766.horiguchi.kyotaro@lab.ntt.co.jp> > I'll add this to CF2017-09. This patch got deadly atack from the commit 30833ba. I changed the signature of expand_single_inheritance_child in addition to make_inh_translation_list to notify that the specified child is no longer a child of the parent. This passes regular regression test and fixed the the problem. regards, -- Kyotaro Horiguchi NTT Open Source Software Center *** a/src/backend/catalog/pg_inherits.c --- b/src/backend/catalog/pg_inherits.c *************** *** 42,47 **** typedef struct SeenRelsEntry --- 42,49 ---- ListCell *numparents_cell; /* corresponding list cell */ } SeenRelsEntry; + static bool is_descendent_of_internal(Oid parentId, Oid childId, + HTAB *seen_rels); /* * find_inheritance_children * *************** *** 400,402 **** typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId) --- 402,472 ---- return result; } + + /* + * Check if the child is really a descendent of the parent + */ + bool + is_descendent_of(Oid parentId, Oid childId) + { + HTAB *seen_rels; + HASHCTL ctl; + bool ischild = false; + + memset(&ctl, 0, sizeof(ctl)); + ctl.keysize = sizeof(Oid); + ctl.entrysize = sizeof(Oid); + ctl.hcxt = CurrentMemoryContext; + + seen_rels = hash_create("is_descendent_of temporary table", + 32, /* start small and extend */ + &ctl, + HASH_ELEM | HASH_BLOBS | HASH_CONTEXT); + + ischild = is_descendent_of_internal(parentId, childId, seen_rels); + + hash_destroy(seen_rels); + + return ischild; + } + + static bool + is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels) + { + Relation inhrel; + SysScanDesc scan; + ScanKeyData key[1]; + bool ischild = false; + HeapTuple inheritsTuple; + + inhrel = heap_open(InheritsRelationId, AccessShareLock); + ScanKeyInit(&key[0], Anum_pg_inherits_inhparent, + BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(parentId)); + scan = systable_beginscan(inhrel, InheritsParentIndexId, true, + NULL, 1, key); + + while ((inheritsTuple = systable_getnext(scan)) != NULL) + { + bool found; + Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid; + + hash_search(seen_rels, &inhrelid, HASH_ENTER, &found); + + /* + * Recursively check into children. Although there can't theoretically + * be any cycles in the inheritance graph, check the cycles following + * find_all_inheritors. + */ + if (inhrelid == childId || + (!found && is_descendent_of_internal(inhrelid, childId, seen_rels))) + { + ischild = true; + break; + } + } + + systable_endscan(scan); + heap_close(inhrel, AccessShareLock); + + return ischild; + } *** a/src/backend/optimizer/prep/prepunion.c --- b/src/backend/optimizer/prep/prepunion.c *************** *** 108,123 **** static void expand_partitioned_rtentry(PlannerInfo *root, LOCKMODE lockmode, bool *has_child, List **appinfos, List **partitioned_child_rels); ! static void expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, Index parentRTindex, Relation parentrel, PlanRowMark*parentrc, Relation childrel, bool *has_child, List **appinfos, List **partitioned_child_rels); ! static void make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno, ! List **translated_vars); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, --- 108,122 ---- LOCKMODE lockmode, bool *has_child, List **appinfos, List **partitioned_child_rels); ! static bool expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, Index parentRTindex, Relation parentrel, PlanRowMark*parentrc, Relation childrel, bool *has_child, List **appinfos, List **partitioned_child_rels); ! static List *make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, *************** *** 1476,1481 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) --- 1475,1482 ---- * in which they appear in the PartitionDesc. But first, expand the * parent itself. */ + + /* ignore the return value since this doesn't exclude the parent */ expand_single_inheritance_child(root,rte, rti, oldrelation, oldrc, oldrelation, &has_child, &appinfos, *************** *** 1497,1502 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) --- 1498,1504 ---- { Oid childOID = lfirst_oid(l); Relation newrelation; + bool expanded = false; /* Open rel if needed; we already have required locks */ if (childOID != parentOID) *************** *** 1516,1529 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) continue; } ! expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc, newrelation, &has_child, &appinfos, &partitioned_child_rels); ! /* Close child relations, but keep locks */ if (childOID != parentOID) ! heap_close(newrelation, NoLock); } } --- 1518,1532 ---- continue; } ! expanded = expand_single_inheritance_child(root, ! rte, rti, oldrelation, oldrc, newrelation, &has_child, &appinfos, &partitioned_child_rels); ! /* Close child relations, but keep locks if expanded */ if (childOID != parentOID) ! heap_close(newrelation, expanded ? NoLock : lockmode); } } *************** *** 1581,1586 **** expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, --- 1584,1590 ---- { Oid childOID = partdesc->oids[i]; Relation childrel; + bool expanded = false; /* Open rel; we already have required locks */ childrel = heap_open(childOID,NoLock); *************** *** 1592,1598 **** expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, continue; } ! expand_single_inheritance_child(root, parentrte, parentRTindex, parentrel,parentrc, childrel, has_child, appinfos, partitioned_child_rels); --- 1596,1603 ---- continue; } ! expanded = expand_single_inheritance_child(root, ! parentrte, parentRTindex, parentrel, parentrc,childrel, has_child, appinfos, partitioned_child_rels); *************** *** 1606,1613 **** expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, has_child, appinfos, partitioned_child_rels); ! /* Close child relation, but keep locks */ ! heap_close(childrel, NoLock); } } --- 1611,1618 ---- has_child, appinfos, partitioned_child_rels); ! /* Close child relation, but keep locks if expanded */ ! heap_close(childrel, expanded ? NoLock : lockmode); } } *************** *** 1619,1626 **** expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, * anything at all. Otherwise,we'll set "has_child" to true, build a * RangeTblEntry and either a PartitionedChildRelInfo or AppendRelInfo as * appropriate, plus maybe a PlanRowMark. */ ! static void expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, Index parentRTindex, Relation parentrel, PlanRowMark *parentrc, Relation childrel, --- 1624,1634 ---- * anything at all. Otherwise, we'll set "has_child" to true, build a * RangeTblEntry and either a PartitionedChildRelInfoor AppendRelInfo as * appropriate, plus maybe a PlanRowMark. + * + * Returns false if the child has been found that no longer a child of the + * parent. */ ! static bool expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, Index parentRTindex, Relation parentrel, PlanRowMark *parentrc, Relation childrel, *************** *** 1661,1666 **** expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, --- 1669,1692 ---- */ if (childrte->relkind != RELKIND_PARTITIONED_TABLE) { + List *translated_vars = + make_inh_translation_list(parentrel, childrel, childRTindex); + + /* + * It may happen that the childOID is no longer a child of the + * parent. + */ + if (!translated_vars) + { + /* + * This childrel is no longer a child of the parent. Return doing + * nothing. Halfway built childrte is abandoned but this happens + * really rarely. + */ + Assert (childOID != parentOID); + return false; + } + /* Remember if we saw a real child. */ if (childOID != parentOID) *has_child = true; *************** *** 1670,1677 **** expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, appinfo->child_relid= childRTindex; appinfo->parent_reltype = parentrel->rd_rel->reltype; appinfo->child_reltype= childrel->rd_rel->reltype; ! make_inh_translation_list(parentrel, childrel, childRTindex, ! &appinfo->translated_vars); appinfo->parent_reloid = parentOID; *appinfos= lappend(*appinfos, appinfo); --- 1696,1702 ---- appinfo->child_relid = childRTindex; appinfo->parent_reltype = parentrel->rd_rel->reltype; appinfo->child_reltype = childrel->rd_rel->reltype; ! appinfo->translated_vars = translated_vars; appinfo->parent_reloid = parentOID; *appinfos = lappend(*appinfos,appinfo); *************** *** 1726,1744 **** expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, root->rowMarks =lappend(root->rowMarks, childrc); } } /* * make_inh_translation_list * Build the list of translations from parentVars to child Vars for ! * an inheritance child. * * For paranoia's sake, we match type/collation as well as attribute name. */ ! static void make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno, ! List **translated_vars) { List *vars = NIL; TupleDesc old_tupdesc = RelationGetDescr(oldrelation); --- 1751,1771 ---- root->rowMarks = lappend(root->rowMarks, childrc); } + + return true; } /* * make_inh_translation_list * Build the list of translations from parent Vars to child Varsfor ! * an inheritance child. Returns NULL if the two relations are no longer in ! * a inheritance relationship. * * For paranoia's sake, we match type/collation as well as attribute name. */ ! static List * make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno) { List *vars = NIL; TupleDesc old_tupdesc = RelationGetDescr(oldrelation); *************** *** 1808,1815 **** make_inh_translation_list(Relation oldrelation, Relation newrelation, --- 1835,1852 ---- break; } if (new_attno >= newnatts) + { + /* + * It's possible that newrelation is no longer a child of the + * oldrelation. We just ingore it for the case. + */ + if (!is_descendent_of(oldrelation->rd_id, newrelation->rd_id)) + return NULL; + + /* If still a child, something's wrong. */ elog(ERROR, "could not find inherited attribute\"%s\" of relation \"%s\"", attname, RelationGetRelationName(newrelation)); + } } /* Found it, check type and collation match */ *************** *** 1828,1834 **** make_inh_translation_list(Relation oldrelation, Relation newrelation, 0)); } ! *translated_vars = vars; } /* --- 1865,1871 ---- 0)); } ! return vars; } /* *** a/src/include/catalog/pg_inherits_fn.h --- b/src/include/catalog/pg_inherits_fn.h *************** *** 23,27 **** extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, --- 23,28 ---- extern bool has_subclass(Oid relationId); extern bool has_superclass(Oid relationId); extern bool typeInheritsFrom(OidsubclassTypeId, Oid superclassTypeId); + extern bool is_descendent_of(Oid parentId, Oid childId); #endif /* PG_INHERITS_FN_H */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 26 June 2017 at 10:16, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > BTW, in the partitioned table case, the parent is always locked first > using an AccessExclusiveLock. There are other considerations in that case > such as needing to recreate the partition descriptor upon termination of > inheritance (both the DETACH PARTITION and also DROP TABLE child cases). Is this requirement documented or in comments anywhere? I can't see anything about that, which is a fairly major usage point. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/13 12:05, Simon Riggs wrote: > On 26 June 2017 at 10:16, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > >> BTW, in the partitioned table case, the parent is always locked first >> using an AccessExclusiveLock. There are other considerations in that case >> such as needing to recreate the partition descriptor upon termination of >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > > Is this requirement documented or in comments anywhere? Yes. See the last sentence in the description of PARTITION OF clause in CREATE TABLE: https://www.postgresql.org/docs/devel/static/sql-createtable.html#sql-createtable-partition And, the 4th point in the list of differences between declarative partitioning and inheritance: https://www.postgresql.org/docs/devel/static/ddl-partitioning.html#ddl-partitioning-implementation-inheritance Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > The cause is that NO INHERIT doesn't take an exlusive lock on the > parent. This allows expand_inherited_rtentry to add the child > relation into appendrel after removal from the inheritance but > still exists. > > I see two ways to fix this. > > The first patch adds a recheck of inheritance relationship if the > corresponding attribute is missing in the child in > make_inh_translation_list(). The recheck is a bit complex but it > is not performed unless the sequence above is happen. It checks > duplication of relid (or cycles in inheritance) following > find_all_inheritors (but doing a bit different) but I'm not sure > it is really useful. > > > The second patch lets ALTER TABLE NO INHERIT to acquire locks on > the parent first. > > Since the latter has a larger impact on the current behavior and > we already treat "DROP TABLE child" case in the similar way, I > suppose that the first approach would be preferable. > > > Any comments or thoughts? I agree that the second has less user impact, but I wonder if we can think that will really fix the bug completely, or more generally if it will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not locking the parent. I have a sneaking suspicion that may be wishful thinking. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
At Wed, 13 Sep 2017 20:20:48 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZnKgN1dCQaBwHn1aDVPyAfQ8PGRmyUH22hxy39+Aqawg@mail.gmail.com> > On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > The cause is that NO INHERIT doesn't take an exlusive lock on the > > parent. This allows expand_inherited_rtentry to add the child > > relation into appendrel after removal from the inheritance but > > still exists. > > > > I see two ways to fix this. > > > > The first patch adds a recheck of inheritance relationship if the > > corresponding attribute is missing in the child in > > make_inh_translation_list(). The recheck is a bit complex but it > > is not performed unless the sequence above is happen. It checks > > duplication of relid (or cycles in inheritance) following > > find_all_inheritors (but doing a bit different) but I'm not sure > > it is really useful. > > > > > > The second patch lets ALTER TABLE NO INHERIT to acquire locks on > > the parent first. > > > > Since the latter has a larger impact on the current behavior and > > we already treat "DROP TABLE child" case in the similar way, I > > suppose that the first approach would be preferable. > > > > > > Any comments or thoughts? > > I agree that the second has less user impact, but I wonder if we can > think that will really fix the bug completely, or more generally if it > will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not > locking the parent. I have a sneaking suspicion that may be wishful > thinking. Thanks for the comment. The recheck patch prevent planner from involving just-detached children while inheritance expansion. No simultaneous detatching of children doesn't affect the planning before the time. Once planner (the select side) gets lock on the child, the alter side cannot do anything until the select finishes. If the alter side won, the select side detects detaching immediately after the lock is released then excludes the children. No problem will occur ever after. Even in the case a child is replaced with another table, it is nothing different from simple detaching. As the result, I think that the recheck patch saves all possible problem caused by simultaneously detached children. However, the parent-locking patch is far smaller and it doesn't need such an explanation on its perfectness. If another problem occurs by simlultaneous detaching, it must haven't taken necessary locks in the right order. The most significant reason for my decision on this ptach was the fact that the DROP child case have been resolved by rechecking without parent locks, which is a kind of waste of resources if it could be resolved without it. (And I saw that the same solution is taken at least several places) I don't think there's noticeable difference of behavior (excluding performance) for users between the two solutions. Is there anyone who has an opinion on the point? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hi. On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote: > << the following is another topic >> > >>>> BTW, in the partitioned table case, the parent is always locked first >>>> using an AccessExclusiveLock. There are other considerations in that case >>>> such as needing to recreate the partition descriptor upon termination of >>>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). >>> >>> Apart from the degree of concurrency, if we keep parent->children >>> order of locking, such recreation does not seem to be >>> needed. Maybe I'm missing something. >> >> Sorry to have introduced that topic in this thread, but I will try to >> explain anyway why things are the way they are currently: >> >> Once a table is no longer a partition of the parent (detached or dropped), >> we must make sure that the next commands in the transaction don't see it >> as one. That information is currently present in the relcache >> (rd_partdesc), which is used by a few callers, most notably the >> tuple-routing code. Next commands must recreate the entry so that the >> correct thing happens based on the updated information. More precisely, >> we must invalidate the current entry. RelationClearRelation() will either >> delete the entry or rebuild it. If it's being referenced somewhere, it >> will be rebuilt. The place holding the reference may also be looking at >> the content of rd_partdesc, which we don't require them to make a copy of, >> so we must preserve its content while other fields of RelationData are >> being read anew from the catalog. We don't have to preserve it if there >> has been any change (partition added/dropped), but to make such a change >> one would need to take a strong enough lock on the relation (parent). We >> assume here that anyone who wants to reference rd_partdesc takes at least >> AccessShareLock lock on the relation, and anyone who wants to change its >> content must take a lock that will conflict with it, so >> AccessExclusiveLock. Note that in all of this, we are only talking about >> one relation, that is the parent, so parent -> child ordering of taking >> locks may be irrelevant. > > I think I understand this, anyway DropInherit and DropPartition > is different-but-at-the-same-level operations so surely needs > amendment for drop/detach cases. Is there already a solution? Or > reproducing steps? Sorry, I think I forgot to reply to this. Since you seem to have chosen the other solution (checking that child is still a child), maybe this reply is a bit too late, but anyway. DropInherit or NO INHERIT is seen primarily as changing a child table's (which is the target table of the command) property that it is no longer a child of the parent, so we lock the child table to block concurrent operations from considering it a child of parent anymore. The fact that parent is locked after the child and with ShareUpdateExclusiveLock instead of AccessExclusiveLock, we observe this race condition when SELECTing from the parent. DropPartition or DETACH PARTITION is seen primarily as changing the parent table's (which is the target table of the command) property that one of the partitions is removed, so we lock the parent. Any concurrent operations that rely on the parent's relcache to get the partition list will wait for the session that is dropping the partition to finish, so that they get the fresh information from the relcache (or more importantly do not end up with information obtained from the relcache going invalid under them without notice). Note that the lock on the partition/child is also present and it plays more or less the the same role as it does in the DropInherit case, but due to different order of locking, reported race condition does not occur between SELECT on partitioned table and DROP/DETACH PARTITION. By the way, I will take a look at your patch when I come back from the vacation. Meanwhile, I noticed that it needs another rebase after 0a480502b092 [1]. Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/15 15:36, Amit Langote wrote: > The fact that > parent is locked after the child and with ShareUpdateExclusiveLock instead > of AccessExclusiveLock, we observe this race condition when SELECTing from > the parent. Oops, I meant "parent table is locked with AccessShareLock instead of AccessExclusiveLock..." Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
At Fri, 15 Sep 2017 15:36:26 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <d88a9a64-e307-59b5-b4c3-8d7fc11cb59d@lab.ntt.co.jp> > Hi. > > On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote: > > << the following is another topic >> > > > >>>> BTW, in the partitioned table case, the parent is always locked first > >>>> using an AccessExclusiveLock. There are other considerations in that case > >>>> such as needing to recreate the partition descriptor upon termination of > >>>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > >>> > >>> Apart from the degree of concurrency, if we keep parent->children > >>> order of locking, such recreation does not seem to be > >>> needed. Maybe I'm missing something. > >> > >> Sorry to have introduced that topic in this thread, but I will try to > >> explain anyway why things are the way they are currently: > >> > >> Once a table is no longer a partition of the parent (detached or dropped), > >> we must make sure that the next commands in the transaction don't see it > >> as one. That information is currently present in the relcache > >> (rd_partdesc), which is used by a few callers, most notably the > >> tuple-routing code. Next commands must recreate the entry so that the > >> correct thing happens based on the updated information. More precisely, > >> we must invalidate the current entry. RelationClearRelation() will either > >> delete the entry or rebuild it. If it's being referenced somewhere, it > >> will be rebuilt. The place holding the reference may also be looking at > >> the content of rd_partdesc, which we don't require them to make a copy of, > >> so we must preserve its content while other fields of RelationData are > >> being read anew from the catalog. We don't have to preserve it if there > >> has been any change (partition added/dropped), but to make such a change > >> one would need to take a strong enough lock on the relation (parent). We > >> assume here that anyone who wants to reference rd_partdesc takes at least > >> AccessShareLock lock on the relation, and anyone who wants to change its > >> content must take a lock that will conflict with it, so > >> AccessExclusiveLock. Note that in all of this, we are only talking about > >> one relation, that is the parent, so parent -> child ordering of taking > >> locks may be irrelevant. > > > > I think I understand this, anyway DropInherit and DropPartition > > is different-but-at-the-same-level operations so surely needs > > amendment for drop/detach cases. Is there already a solution? Or > > reproducing steps? > > Sorry, I think I forgot to reply to this. Since you seem to have chosen > the other solution (checking that child is still a child), maybe this > reply is a bit too late, but anyway. I choosed it at that time for the reason mentioned upthread, but haven't decided which is better. > DropInherit or NO INHERIT is seen primarily as changing a child table's > (which is the target table of the command) property that it is no longer a > child of the parent, so we lock the child table to block concurrent > operations from considering it a child of parent anymore. The fact that > parent is locked after the child and with ShareUpdateExclusiveLock instead > of AccessExclusiveLock, we observe this race condition when SELECTing from > the parent. > > DropPartition or DETACH PARTITION is seen primarily as changing the parent > table's (which is the target table of the command) property that one of > the partitions is removed, so we lock the parent. Any concurrent > operations that rely on the parent's relcache to get the partition list > will wait for the session that is dropping the partition to finish, so > that they get the fresh information from the relcache (or more importantly > do not end up with information obtained from the relcache going invalid > under them without notice). Note that the lock on the partition/child is > also present and it plays more or less the the same role as it does in the > DropInherit case, but due to different order of locking, reported race > condition does not occur between SELECT on partitioned table and > DROP/DETACH PARTITION. Thank you for the explanation. I understand that the difference comes from which of parent and children has the information about inheritance/partitioning. DROP child and ALTER child NO INHERITS are (I think) the only two operations that intiated from children side. The parent-locking patch results in different solutions for similar problems. However, parent locking on DROPped child requires a new index InheritsRelidIndex to avoid full scan on pg_inherits, but the NO INHERITS case doesn't. This might be a good reason for the difference. I noticed that DROP TABLE partition acquires lock on the parent in a callback for RangeVarGetRelid(Extended). The same way seems reasonable for the NO INHERIT case. As the result the patch becomes far small and less invasive than the previous one. (dropinh_lock_parent_v2.patch) As a negative PoC, attacched dropchild_lock_parent_PoC.patch only to show how the same method on DROP TABLE case looks. > By the way, I will take a look at your patch when I come back from the > vacation. Meanwhile, I noticed that it needs another rebase after > 0a480502b092 [1]. Great. Thanks. > Thanks, > Amit > > [1] > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092 regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 563bcda..b7c87b9 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -13175,7 +13175,28 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, reltype = ((AlterObjectSchemaStmt*) stmt)->objectType; else if (IsA(stmt, AlterTableStmt)) - reltype = ((AlterTableStmt *) stmt)->relkind; + { + AlterTableStmt *alterstmt = (AlterTableStmt *)stmt; + ListCell *lc; + + reltype = alterstmt->relkind; + + foreach (lc, alterstmt->cmds) + { + AlterTableCmd *cmd = lfirst_node(AlterTableCmd, lc); + Assert(IsA(cmd, AlterTableCmd)); + + /* + * Though NO INHERIT doesn't modify the parent, concurrent + * expansion of inheritances will see a false child. We must + * acquire lock on the parent when dropping inheritance, but no + * need to ascend further. + */ + if (cmd->subtype == AT_DropInherit) + RangeVarGetRelid((RangeVar *)cmd->def, + AccessExclusiveLock, false); + } + } else { reltype = OBJECT_TABLE; /* placate compiler */ diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c index 245a374..b83e248 100644 --- a/src/backend/catalog/pg_inherits.c +++ b/src/backend/catalog/pg_inherits.c @@ -126,19 +126,6 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) { /* Get the lock tosynchronize against concurrent drop */ LockRelationOid(inhrelid, lockmode); - - /* - * Now that we have the lock, double-check to see if the relation - * really exists or not. If not, assume it was dropped while we - * waited to acquire lock, and ignore it. - */ - if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(inhrelid))) - { - /* Release useless lock */ - UnlockRelationOid(inhrelid, lockmode); - /* And ignore this relation */ - continue; - } } list = lappend_oid(list, inhrelid); diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index b7c87b9..62c8860 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -1120,10 +1120,10 @@ RemoveRelations(DropStmt *drop)}/* - * Before acquiring a table lock, check whether we have sufficient rights. - * In the case of DROP INDEX, also try to lock the table before the index. - * Also, if the table to be dropped is a partition, we try to lock the parent - * first. + * Before acquiring a table lock, check whether we have sufficient rights. In + * the case of DROP INDEX, also try to lock the table before the index. Also, + * if the table to be dropped is a partition or inheritance children, we try + * to lock the parent first. */static voidRangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid, @@ -1229,6 +1229,26 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid, if (OidIsValid(state->partParentOid)) LockRelationOid(state->partParentOid, AccessExclusiveLock); } + + /* the same for all inheritance parents */ + if (relOid != oldRelOid) + { + Relation inhrel; + SysScanDesc scan; + HeapTuple inhtup; + + inhrel = heap_open(InheritsRelationId, AccessShareLock); + scan = systable_beginscan(inhrel, InvalidOid, false, NULL, 0, NULL); + while (HeapTupleIsValid(inhtup = systable_getnext(scan))) + { + Form_pg_inherits inh = (Form_pg_inherits) GETSTRUCT(inhtup); + + if (inh->inhrelid == relOid && OidIsValid(inh->inhparent)) + LockRelationOid(inh->inhparent, AccessExclusiveLock); + } + systable_endscan(scan); + heap_close(inhrel, AccessShareLock); + }}/* -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 19, 2017 at 3:04 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> By the way, I will take a look at your patch when I come back from the >> vacation. Meanwhile, I noticed that it needs another rebase after >> 0a480502b092 [1]. Moved to CF 2018-01. -- Michael
Hello, At Wed, 29 Nov 2017 14:04:01 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqTt-mj4S8qoO_C9CaFAcV0J3vhgg4_Kw2U1wXvyEYB0bw@mail.gmail.com> > On Tue, Sep 19, 2017 at 3:04 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > >> By the way, I will take a look at your patch when I come back from the > >> vacation. Meanwhile, I noticed that it needs another rebase after > >> 0a480502b092 [1]. > > Moved to CF 2018-01. Thank you. (I'll do that by myself from the next CF) This is rebased patch and additional patch of isolation test for this problem. regards, -- Kyotaro Horiguchi NTT Open Source Software Center From 4336b15d2c0d95e7044746aa5c3ae622712e41b3 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Tue, 12 Dec 2017 17:06:53 +0900 Subject: [PATCH 1/2] Add isolation test --- src/test/isolation/expected/select-noinherit.out | 9 +++++++++ src/test/isolation/isolation_schedule | 1 + src/test/isolation/specs/select-noinherit.spec | 23 +++++++++++++++++++++++ 3 files changed, 33 insertions(+) create mode 100644 src/test/isolation/expected/select-noinherit.out create mode 100644 src/test/isolation/specs/select-noinherit.spec diff --git a/src/test/isolation/expected/select-noinherit.out b/src/test/isolation/expected/select-noinherit.out new file mode 100644 index 0000000..5885167 --- /dev/null +++ b/src/test/isolation/expected/select-noinherit.out @@ -0,0 +1,9 @@ +Parsed test spec with 2 sessions + +starting permutation: alt1 sel2 c1 +step alt1: ALTER TABLE c1 NO INHERIT p; +step sel2: SELECT * FROM p; <waiting ...> +step c1: COMMIT; +step sel2: <... completed> +a + diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule index e41b916..6e04ea4 100644 --- a/src/test/isolation/isolation_schedule +++ b/src/test/isolation/isolation_schedule @@ -63,3 +63,4 @@ test: async-notify test: vacuum-reltuples test: timeouts test: vacuum-concurrent-drop +test: select-noinherit diff --git a/src/test/isolation/specs/select-noinherit.spec b/src/test/isolation/specs/select-noinherit.spec new file mode 100644 index 0000000..31662a9 --- /dev/null +++ b/src/test/isolation/specs/select-noinherit.spec @@ -0,0 +1,23 @@ +# SELECT and ALTER TABLE NO INHERIT test +# + +setup +{ + CREATE TABLE p (a integer); + CREATE TABLE c1 () INHERITS (p); +} + +teardown +{ + DROP TABLE p CASCADE; +} + +session "s1" +setup { BEGIN ISOLATION LEVEL READ COMMITTED; } +step "alt1" { ALTER TABLE c1 NO INHERIT p; } +step "c1" { COMMIT; } + +session "s2" +step "sel2" { SELECT * FROM p; } + +permutation "alt1" "sel2" "c1" -- 2.9.2 From c2793d9200948d693150a5bbeb3815e3b5404be2 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp> Date: Tue, 12 Dec 2017 17:38:12 +0900 Subject: [PATCH 2/2] Lock parent on ALTER TABLE NO INHERIT NO INHERIT doesn't modify the parent at all but lock is required to avoid error caused when a concurrent access see a false child. --- src/backend/commands/tablecmds.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index d979ce2..a8d119f 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -13246,7 +13246,28 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, reltype = ((AlterObjectSchemaStmt *) stmt)->objectType; else if (IsA(stmt, AlterTableStmt)) - reltype = ((AlterTableStmt *) stmt)->relkind; + { + AlterTableStmt *alterstmt = (AlterTableStmt *)stmt; + ListCell *lc; + + reltype = alterstmt->relkind; + + foreach (lc, alterstmt->cmds) + { + AlterTableCmd *cmd = lfirst_node(AlterTableCmd, lc); + Assert(IsA(cmd, AlterTableCmd)); + + /* + * Though NO INHERIT doesn't modify the parent, lock on the parent + * is necessary so that no concurrent expansion of inheritances + * sees a false child and ends with ERROR. But no need to ascend + * further. + */ + if (cmd->subtype == AT_DropInherit) + RangeVarGetRelid((RangeVar *)cmd->def, + AccessExclusiveLock, false); + } + } else { reltype = OBJECT_TABLE; /* placate compiler */ -- 2.9.2
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > [ 0002-Lock-parent-on-ALTER-TABLE-NO-INHERIT.patch ] I don't especially like any of the patches proposed on this thread. The one with rechecking inheritance seems expensive, duplicative, and complicated. The approach of taking a lock on the parent will create deadlocks in usage patterns that did not encounter such deadlocks before --- in particular, in the scenarios in which this behavior matters at all, the ALTER will deadlock against ordinary queries that lock the parent before the child. So I can't believe that anyone who's hitting the problem in the field will think the extra lock is an improvement. I think that there might be a much simpler solution to this, which is to just remove make_inh_translation_list's tests of attinhcount, as per attached. Those are really pretty redundant: since we are matching by column name, the unique index on pg_attribute already guarantees there is at most one matching column. I have a feeling that those tests are my fault and I put them in on the theory that they could save a few strcmp executions --- but if they're causing problems, we can certainly drop them. In any case they'd only save something meaningful if most of the child's columns are non-inherited, which doesn't seem like the way to bet. In this way, if the child gets past the initial check on whether it's been dropped, we'll still succeed in matching its columns, even if they are no longer marked as inherited. For me, this works fine with either the ALTER NO INHERIT case (c1 does get scanned, with no error) or the DROP TABLE case (c1 doesn't get scanned). Now, you can break it if you really try hard: make the concurrent transaction do both ALTER NO INHERIT and then DROP COLUMN. But that's not a scenario that's been complained of, so I don't want to add a ton of mechanism to fix it. regards, tom lane diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 95557d7..5c4d113 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -1832,7 +1832,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, */ if (old_attno < newnatts && (att = TupleDescAttr(new_tupdesc, old_attno)) != NULL && - !att->attisdropped && att->attinhcount != 0 && + !att->attisdropped && strcmp(attname, NameStr(att->attname)) == 0) new_attno = old_attno; else @@ -1840,7 +1840,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, for (new_attno = 0; new_attno < newnatts; new_attno++) { att = TupleDescAttr(new_tupdesc, new_attno); - if (!att->attisdropped && att->attinhcount != 0 && + if (!att->attisdropped && strcmp(attname, NameStr(att->attname)) == 0) break; }
I wrote: > I think that there might be a much simpler solution to this, which > is to just remove make_inh_translation_list's tests of attinhcount, > as per attached. I went ahead and pushed that, with an isolation test based on the one Horiguchi-san submitted but covering a few related cases. regards, tom lane
Hello, sorry for my late reply. At Wed, 10 Jan 2018 14:56:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <26017.1515614209@sss.pgh.pa.us> > I think that there might be a much simpler solution to this, which > is to just remove make_inh_translation_list's tests of attinhcount, > as per attached. Those are really pretty redundant: since we are > matching by column name, the unique index on pg_attribute already > guarantees there is at most one matching column. I have a feeling My thought were restricted to the same behavior as the drop case, but Tom's solution is also fine for me. I agree to the point that the colums with the same name in a inheritance tree are safely assumed to be in a inheritance relationship. (Assuming everything is consistent.) At Fri, 12 Jan 2018 15:52:08 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <15881.1515790328@sss.pgh.pa.us> > I wrote: > > I think that there might be a much simpler solution to this, which > > is to just remove make_inh_translation_list's tests of attinhcount, > > as per attached. > > I went ahead and pushed that, with an isolation test based on the > one Horiguchi-san submitted but covering a few related cases. Thank you for commiting it. regards, -- Kyotaro Horiguchi NTT Open Source Software Center