Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT |
Date | |
Msg-id | 20170919.150430.252858141.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
|
List | pgsql-hackers |
At Fri, 15 Sep 2017 15:36:26 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <d88a9a64-e307-59b5-b4c3-8d7fc11cb59d@lab.ntt.co.jp> > Hi. > > On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote: > > << the following is another topic >> > > > >>>> BTW, in the partitioned table case, the parent is always locked first > >>>> using an AccessExclusiveLock. There are other considerations in that case > >>>> such as needing to recreate the partition descriptor upon termination of > >>>> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > >>> > >>> Apart from the degree of concurrency, if we keep parent->children > >>> order of locking, such recreation does not seem to be > >>> needed. Maybe I'm missing something. > >> > >> Sorry to have introduced that topic in this thread, but I will try to > >> explain anyway why things are the way they are currently: > >> > >> Once a table is no longer a partition of the parent (detached or dropped), > >> we must make sure that the next commands in the transaction don't see it > >> as one. That information is currently present in the relcache > >> (rd_partdesc), which is used by a few callers, most notably the > >> tuple-routing code. Next commands must recreate the entry so that the > >> correct thing happens based on the updated information. More precisely, > >> we must invalidate the current entry. RelationClearRelation() will either > >> delete the entry or rebuild it. If it's being referenced somewhere, it > >> will be rebuilt. The place holding the reference may also be looking at > >> the content of rd_partdesc, which we don't require them to make a copy of, > >> so we must preserve its content while other fields of RelationData are > >> being read anew from the catalog. We don't have to preserve it if there > >> has been any change (partition added/dropped), but to make such a change > >> one would need to take a strong enough lock on the relation (parent). We > >> assume here that anyone who wants to reference rd_partdesc takes at least > >> AccessShareLock lock on the relation, and anyone who wants to change its > >> content must take a lock that will conflict with it, so > >> AccessExclusiveLock. Note that in all of this, we are only talking about > >> one relation, that is the parent, so parent -> child ordering of taking > >> locks may be irrelevant. > > > > I think I understand this, anyway DropInherit and DropPartition > > is different-but-at-the-same-level operations so surely needs > > amendment for drop/detach cases. Is there already a solution? Or > > reproducing steps? > > Sorry, I think I forgot to reply to this. Since you seem to have chosen > the other solution (checking that child is still a child), maybe this > reply is a bit too late, but anyway. I choosed it at that time for the reason mentioned upthread, but haven't decided which is better. > DropInherit or NO INHERIT is seen primarily as changing a child table's > (which is the target table of the command) property that it is no longer a > child of the parent, so we lock the child table to block concurrent > operations from considering it a child of parent anymore. The fact that > parent is locked after the child and with ShareUpdateExclusiveLock instead > of AccessExclusiveLock, we observe this race condition when SELECTing from > the parent. > > DropPartition or DETACH PARTITION is seen primarily as changing the parent > table's (which is the target table of the command) property that one of > the partitions is removed, so we lock the parent. Any concurrent > operations that rely on the parent's relcache to get the partition list > will wait for the session that is dropping the partition to finish, so > that they get the fresh information from the relcache (or more importantly > do not end up with information obtained from the relcache going invalid > under them without notice). Note that the lock on the partition/child is > also present and it plays more or less the the same role as it does in the > DropInherit case, but due to different order of locking, reported race > condition does not occur between SELECT on partitioned table and > DROP/DETACH PARTITION. Thank you for the explanation. I understand that the difference comes from which of parent and children has the information about inheritance/partitioning. DROP child and ALTER child NO INHERITS are (I think) the only two operations that intiated from children side. The parent-locking patch results in different solutions for similar problems. However, parent locking on DROPped child requires a new index InheritsRelidIndex to avoid full scan on pg_inherits, but the NO INHERITS case doesn't. This might be a good reason for the difference. I noticed that DROP TABLE partition acquires lock on the parent in a callback for RangeVarGetRelid(Extended). The same way seems reasonable for the NO INHERIT case. As the result the patch becomes far small and less invasive than the previous one. (dropinh_lock_parent_v2.patch) As a negative PoC, attacched dropchild_lock_parent_PoC.patch only to show how the same method on DROP TABLE case looks. > By the way, I will take a look at your patch when I come back from the > vacation. Meanwhile, I noticed that it needs another rebase after > 0a480502b092 [1]. Great. Thanks. > Thanks, > Amit > > [1] > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092 regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 563bcda..b7c87b9 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -13175,7 +13175,28 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, reltype = ((AlterObjectSchemaStmt*) stmt)->objectType; else if (IsA(stmt, AlterTableStmt)) - reltype = ((AlterTableStmt *) stmt)->relkind; + { + AlterTableStmt *alterstmt = (AlterTableStmt *)stmt; + ListCell *lc; + + reltype = alterstmt->relkind; + + foreach (lc, alterstmt->cmds) + { + AlterTableCmd *cmd = lfirst_node(AlterTableCmd, lc); + Assert(IsA(cmd, AlterTableCmd)); + + /* + * Though NO INHERIT doesn't modify the parent, concurrent + * expansion of inheritances will see a false child. We must + * acquire lock on the parent when dropping inheritance, but no + * need to ascend further. + */ + if (cmd->subtype == AT_DropInherit) + RangeVarGetRelid((RangeVar *)cmd->def, + AccessExclusiveLock, false); + } + } else { reltype = OBJECT_TABLE; /* placate compiler */ diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c index 245a374..b83e248 100644 --- a/src/backend/catalog/pg_inherits.c +++ b/src/backend/catalog/pg_inherits.c @@ -126,19 +126,6 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) { /* Get the lock tosynchronize against concurrent drop */ LockRelationOid(inhrelid, lockmode); - - /* - * Now that we have the lock, double-check to see if the relation - * really exists or not. If not, assume it was dropped while we - * waited to acquire lock, and ignore it. - */ - if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(inhrelid))) - { - /* Release useless lock */ - UnlockRelationOid(inhrelid, lockmode); - /* And ignore this relation */ - continue; - } } list = lappend_oid(list, inhrelid); diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index b7c87b9..62c8860 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -1120,10 +1120,10 @@ RemoveRelations(DropStmt *drop)}/* - * Before acquiring a table lock, check whether we have sufficient rights. - * In the case of DROP INDEX, also try to lock the table before the index. - * Also, if the table to be dropped is a partition, we try to lock the parent - * first. + * Before acquiring a table lock, check whether we have sufficient rights. In + * the case of DROP INDEX, also try to lock the table before the index. Also, + * if the table to be dropped is a partition or inheritance children, we try + * to lock the parent first. */static voidRangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid, @@ -1229,6 +1229,26 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid, if (OidIsValid(state->partParentOid)) LockRelationOid(state->partParentOid, AccessExclusiveLock); } + + /* the same for all inheritance parents */ + if (relOid != oldRelOid) + { + Relation inhrel; + SysScanDesc scan; + HeapTuple inhtup; + + inhrel = heap_open(InheritsRelationId, AccessShareLock); + scan = systable_beginscan(inhrel, InvalidOid, false, NULL, 0, NULL); + while (HeapTupleIsValid(inhtup = systable_getnext(scan))) + { + Form_pg_inherits inh = (Form_pg_inherits) GETSTRUCT(inhtup); + + if (inh->inhrelid == relOid && OidIsValid(inh->inhparent)) + LockRelationOid(inh->inhparent, AccessExclusiveLock); + } + systable_endscan(scan); + heap_close(inhrel, AccessShareLock); + }}/* -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: