Re: partition routing layering in nodeModifyTable.c - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: partition routing layering in nodeModifyTable.c |
Date | |
Msg-id | CA+HiwqENbJMzoD1VAoNWFLcnam0zJS34Vy0o5AuhW-UVq1LV6A@mail.gmail.com Whole thread Raw |
In response to | Re: partition routing layering in nodeModifyTable.c (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: partition routing layering in nodeModifyTable.c
|
List | pgsql-hackers |
On Tue, Oct 13, 2020 at 1:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 12/10/2020 16:47, Amit Langote wrote: > > On Mon, Oct 12, 2020 at 8:12 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> 1. We have many different cleanup/close routines now: > >> ExecCloseResultRelations, ExecCloseRangeTableRelations, and > >> ExecCleanUpTriggerState. Do we need them all? It seems to me that we > >> could merge ExecCloseRangeTableRelations() and > >> ExecCleanUpTriggerState(), they seem to do roughly the same thing: close > >> relations that were opened for ResultRelInfos. They are always called > >> together, except in afterTriggerInvokeEvents(). And in > >> afterTriggerInvokeEvents() too, there would be no harm in doing both, > >> even though we know there aren't any entries in the es_result_relations > >> array at that point. > > > > Hmm, I find trigger result relations to behave differently enough to > > deserve a separate function. For example, unlike plan-specified > > result relations, they don't point to range table relations and don't > > open indices. Maybe the name could be revisited, say, > > ExecCloseTriggerResultRelations(). > > Matter of perception I guess. I still prefer to club them together into > one Close call. It's true that they're slightly different, but they're > also pretty similar. And IMHO they're more similar than different. Okay, fine with me. > > Also, maybe call the other functions: > > > > ExecInitPlanResultRelationsArray() > > ExecInitPlanResultRelation() > > ExecClosePlanResultRelations() > > > > Thoughts? > > Hmm. How about initializing the array lazily, on the first > ExecInitPlanResultRelation() call? It's not performance critical, and > that way there's one fewer initialization function that you need to > remember to call. Agree that's better. > It occurred to me that if we do that (initialize the array lazily), > there's very little need for the PlannedStmt->resultRelations list > anymore. It's only used in ExecRelationIsTargetRelation(), but if we > assume that ExecRelationIsTargetRelation() is only called after InitPlan > has initialized the result relation for the relation, it can easily > check es_result_relations instead. I think that's a safe assumption. > ExecRelationIsTargetRelation() is only used in FDWs, and I believe the > FDWs initialization routine can only be called after ExecInitModifyTable > has been called on the relation. > > The PlannedStmt->rootResultRelations field is even more useless. I am very much tempted to remove those fields from PlannedStmt, although I am concerned that the following now assumes that *all* result relations are initialized in the executor initialization phase: bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid) { if (!estate->es_result_relations) return false; return estate->es_result_relations[scanrelid - 1] != NULL; } In the other thread [1], I am proposing that we initialize result relations lazily, but the above will be a blocker to that. > > Actually, maybe we don't need to be so paranoid about setting up > > es_result_relations in worker.c, because none of the downstream > > functionality invoked seems to rely on it, that is, no need to call > > ExecInitResultRelationsArray() and ExecInitResultRelation(). > > ExecSimpleRelation* and downstream functionality assume a > > single-relation operation and the ResultRelInfo is explicitly passed. > > Hmm, yeah, I like that. Similarly in ExecuteTruncateGuts(), there isn't > actually any need to put the ResultRelInfos in the es_result_relations > array. > > Putting all this together, I ended up with the attached. It doesn't > include the subsequent commits in this patch set yet, for removal of > es_result_relation_info et al. Thanks. + * We put the ResultRelInfos in the es_opened_result_relations list, even + * though we don't have a range table and don't populate the + * es_result_relations array. That's a big bogus, but it's enough to make + * ExecGetTriggerResultRel() find them. */ estate = CreateExecutorState(); resultRelInfos = (ResultRelInfo *) palloc(list_length(rels) * sizeof(ResultRelInfo)); resultRelInfo = resultRelInfos; + estate->es_result_relations = (ResultRelInfo **) + palloc(list_length(rels) * sizeof(ResultRelInfo *)); Maybe don't allocate es_result_relations here? +/* + * Close all relations opened by ExecGetRangeTableRelation() + */ +void +ExecCloseRangeTableRelations(EState *estate) +{ + int i; + + for (i = 0; i < estate->es_range_table_size; i++) { if (estate->es_relations[i]) table_close(estate->es_relations[i], NoLock); } I think we have an optimization opportunity here (maybe as a separate patch). Why don't we introduce es_opened_relations? That way, if only a single or few of potentially 1000s relations in the range table is/are opened, we don't needlessly loop over *all* relations here. That can happen, for example, with a query where no partitions could be pruned at planning time, so the range table contains all partitions, but only one or few are accessed during execution and the rest run-time pruned. Although, in the workloads where it would matter, other overheads easily mask the overhead of this loop; see the first message at the linked thread [1], so it is hard to show an immediate benefit from this. Anyway, other than my concern about ExecRelationIsTargetRelation() mentioned above, I think the patch looks good. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://commitfest.postgresql.org/30/2621/
pgsql-hackers by date: