Re: [HACKERS] Partitioned tables and relfilenode - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: [HACKERS] Partitioned tables and relfilenode |
Date | |
Msg-id | a4e2be66-67c6-c8c0-080b-1af22a14b566@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] Partitioned tables and relfilenode (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] Partitioned tables and relfilenode
|
List | pgsql-hackers |
On 2017/03/13 19:24, Amit Langote wrote: > On 2017/03/10 17:57, Amit Langote wrote: >> On 2017/03/08 22:36, Robert Haas wrote: >>> On Wed, Mar 8, 2017 at 5:36 AM, Amit Langote wrote: >>>>> - rel = mtstate->resultRelInfo->ri_RelationDesc; >>>>> + nominalRTE = rt_fetch(node->nominalRelation, estate->es_range_table); >>>>> + nominalRel = heap_open(nominalRTE->relid, NoLock); >>>>> >>>>> No lock? >>>> >>>> Hmm, I think I missed that a partitioned parent table would not be locked >>>> by the *executor* at all after applying this patch. As of now, InitPlan() >>>> takes care of locking all the result relations in the >>>> PlannedStmt.resultRelations list. This patch removes partitioned >>>> parent(s) from appearing in this list. But that makes me wonder - aren't >>>> the locks taken by expand_inherited_rtentry() kept long enough? Why does >>>> InitPlan need to acquire them again? I see this comment in >>>> expand_inherited_rtentry: >>> >>> Parse-time locks, plan-time locks, and execution-time locks are all >>> separate. See the header comments for AcquireRewriteLocks in >>> rewriteHandler.c; think about a view, where the parsed query has been >>> stored and is later rewritten and planned. Similarly, a plan can be >>> reused even after the transaction that created that plan has >>> committed; see AcquireExecutorLocks in plancache.c. >> >> Thanks for those references. >> >> I took a step back here and thought a bit more about the implications this >> patch. It occurred to me that the complete absence of partitioned table >> RT entries in the plan-tree has more consequences than I originally >> imagined. I will post an updated patch by Monday latest. > > Here is the updated patch. > > Since this patch proposes to avoid creating scan nodes for non-leaf tables > in a partition tree, they won't be referenced anywhere in the resulting > plan tree. So the executor will not lock those tables in the > select/update/delete cases. Insert is different since we lock all tables > in the partition tree when setting up tuple-routing in > ExecInitModifyTable. Not taking executor locks on the tables means that > the cached plans that should be invalidated upon adding/removing a > partition somewhere in the partition tree won't be. > > First I thought that we could remember just the root table RT index using > a new Index field partitionRoot in Append, MergeAppend, and ModifyTable > nodes and use it to locate and lock the root table during executor node > initialization. But soon realized that that's not enough, because it > ignores the fact that adding/removing partitions at lower levels does not > require taking a lock on the root table; only the immediate parent. So > any cached select/update/delete plans won't be invalidated when a new > lower-level partition is added/removed, because the immediate parent would > not have been added to the query's range table and hence the > PlannedStmt.relationOids list. Remember that the latter is used by > plancache.c to remember the relations that a given cached plan depends on > remaining unchanged. So the patch now adds a list member called > partitioned_rels to Append, MergeAppend, and ModifyTable nodes and stores > the RT indexes of all the non-leaf tables in a partition tree with root > table RT index at the head (note that these RT indexes are of the RTEs > added by expand_inherited_rtenrty; also see below). Since the > partitioned_rels list is constructed when building paths and must be > propagated to the plan nodes, the same field is also present in the > corresponding Path nodes. ExecInit* routines for the aforementioned nodes > now locate and lock the non-leaf tables using the RT indexes in > partitioned_rels. Leaf tables are locked, as before, either by InitPlan > (update/delete result relations case) or by ExecInitAppend or > ExecInitMergeAppend when initializing the appendplans/mergeplans (select > case). > > The previous proposal was for expand_inherited_rtentry to not create RT > entries and AppendRelInfo's for the non-leaf tables, but I think that > doesn't work, as I tried to explain above. We need RTEs because that > seems to be the only way currently for informing the executor of the > non-leaf tables. We need AppendRelInfo's to store the RT indexes of those > RTEs for the latter planning steps to collect them in partitioned_rels > mentioned above. So with the latest patch, we do create the RT entry and > AppendRelInfo for non-leaf tables. AppendRelInfo created in this case is > a minimal one; only parent_relid and child_relid are valid. To make the > latter planning steps ignore these minimal AppendRelInfo's, every > AppendRelInfo is now marked with child_relkind. Only > set_append_rel_pathlist() and inheritance_planner() process them to > collect the child_relid into the partitioned_rels list to be stored in > AppendPath/MergeAppendPath and ModifyTablePath, respectively. Sorry, forgot to attach the patches themselves. Attached this time. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: