Re: [HACKERS] Partitioned tables and relfilenode - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: [HACKERS] Partitioned tables and relfilenode |
Date | |
Msg-id | a361f6c9-7262-1ad7-02a8-81014e36dde6@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] Partitioned tables and relfilenode (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] Partitioned tables and relfilenode
Re: [HACKERS] Partitioned tables and relfilenode |
List | pgsql-hackers |
On 2017/03/10 17:57, Amit Langote wrote: > On 2017/03/08 22:36, Robert Haas wrote: >> On Wed, Mar 8, 2017 at 5:36 AM, Amit Langote wrote: >>>> - rel = mtstate->resultRelInfo->ri_RelationDesc; >>>> + nominalRTE = rt_fetch(node->nominalRelation, estate->es_range_table); >>>> + nominalRel = heap_open(nominalRTE->relid, NoLock); >>>> >>>> No lock? >>> >>> Hmm, I think I missed that a partitioned parent table would not be locked >>> by the *executor* at all after applying this patch. As of now, InitPlan() >>> takes care of locking all the result relations in the >>> PlannedStmt.resultRelations list. This patch removes partitioned >>> parent(s) from appearing in this list. But that makes me wonder - aren't >>> the locks taken by expand_inherited_rtentry() kept long enough? Why does >>> InitPlan need to acquire them again? I see this comment in >>> expand_inherited_rtentry: >> >> Parse-time locks, plan-time locks, and execution-time locks are all >> separate. See the header comments for AcquireRewriteLocks in >> rewriteHandler.c; think about a view, where the parsed query has been >> stored and is later rewritten and planned. Similarly, a plan can be >> reused even after the transaction that created that plan has >> committed; see AcquireExecutorLocks in plancache.c. > > Thanks for those references. > > I took a step back here and thought a bit more about the implications this > patch. It occurred to me that the complete absence of partitioned table > RT entries in the plan-tree has more consequences than I originally > imagined. I will post an updated patch by Monday latest. Here is the updated patch. Since this patch proposes to avoid creating scan nodes for non-leaf tables in a partition tree, they won't be referenced anywhere in the resulting plan tree. So the executor will not lock those tables in the select/update/delete cases. Insert is different since we lock all tables in the partition tree when setting up tuple-routing in ExecInitModifyTable. Not taking executor locks on the tables means that the cached plans that should be invalidated upon adding/removing a partition somewhere in the partition tree won't be. First I thought that we could remember just the root table RT index using a new Index field partitionRoot in Append, MergeAppend, and ModifyTable nodes and use it to locate and lock the root table during executor node initialization. But soon realized that that's not enough, because it ignores the fact that adding/removing partitions at lower levels does not require taking a lock on the root table; only the immediate parent. So any cached select/update/delete plans won't be invalidated when a new lower-level partition is added/removed, because the immediate parent would not have been added to the query's range table and hence the PlannedStmt.relationOids list. Remember that the latter is used by plancache.c to remember the relations that a given cached plan depends on remaining unchanged. So the patch now adds a list member called partitioned_rels to Append, MergeAppend, and ModifyTable nodes and stores the RT indexes of all the non-leaf tables in a partition tree with root table RT index at the head (note that these RT indexes are of the RTEs added by expand_inherited_rtenrty; also see below). Since the partitioned_rels list is constructed when building paths and must be propagated to the plan nodes, the same field is also present in the corresponding Path nodes. ExecInit* routines for the aforementioned nodes now locate and lock the non-leaf tables using the RT indexes in partitioned_rels. Leaf tables are locked, as before, either by InitPlan (update/delete result relations case) or by ExecInitAppend or ExecInitMergeAppend when initializing the appendplans/mergeplans (select case). The previous proposal was for expand_inherited_rtentry to not create RT entries and AppendRelInfo's for the non-leaf tables, but I think that doesn't work, as I tried to explain above. We need RTEs because that seems to be the only way currently for informing the executor of the non-leaf tables. We need AppendRelInfo's to store the RT indexes of those RTEs for the latter planning steps to collect them in partitioned_rels mentioned above. So with the latest patch, we do create the RT entry and AppendRelInfo for non-leaf tables. AppendRelInfo created in this case is a minimal one; only parent_relid and child_relid are valid. To make the latter planning steps ignore these minimal AppendRelInfo's, every AppendRelInfo is now marked with child_relkind. Only set_append_rel_pathlist() and inheritance_planner() process them to collect the child_relid into the partitioned_rels list to be stored in AppendPath/MergeAppendPath and ModifyTablePath, respectively. Thoughts? Thanks, Amit
pgsql-hackers by date: