Re: partition routing layering in nodeModifyTable.c - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: partition routing layering in nodeModifyTable.c |
Date | |
Msg-id | CA+HiwqFDGhGT8AXUQ4S5ytzuLNm18Qbu1js7XG3eYk5h-zhxmw@mail.gmail.com Whole thread Raw |
In response to | Re: partition routing layering in nodeModifyTable.c (Andres Freund <andres@anarazel.de>) |
Responses |
Re: partition routing layering in nodeModifyTable.c
|
List | pgsql-hackers |
Hi Andres, Sorry about the delay in replying as I was on vacation for the last few days. On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote: > > The first one (0001) deals with reducing the core executor's reliance > > on es_result_relation_info to access the currently active result > > relation, in favor of receiving it from the caller as a function > > argument. So no piece of core code relies on it being correctly set > > anymore. It still needs to be set correctly for the third-party code > > such as FDWs. > > I'm inclined to just remove it. There's not much code out there relying > on it, as far as I can tell. Most FDWs don't support the direct modify > API, and that's afaict the case where we one needs to use > es_result_relation_info? Right, only the directly modify API uses it. > In fact, I searched through alllFDWs listed on https://wiki.postgresql.org/wiki/Foreign_data_wrappers > that are on github and in first few categories (up and including to > "file wrappers"), and there was only one reference to > es_result_relation_info, and that just in comments in a test: > https://github.com/pgspider/griddb_fdw/search?utf8=%E2%9C%93&q=es_result_relation_info&type= > which I think was just copied from our source code. > > IOW, we should just change the direct modify calls to get the relevant > ResultRelationInfo or something in that vein (perhaps just the relevant > RT index?). It seems easy to make one of the two functions that constitute the direct modify API, IterateDirectModify(), access the result relation from ForeignScanState by saving either the result relation RT index or ResultRelInfo pointer itself into the ForeignScanState's FDW-private area. For example, for postgres_fdw, one would simply add a new member to PgFdwDirectModifyState struct. Doing that for the other function BeginDirectModify() seems a bit more involved. We could add a new field to ForeignScan, say resultRelation, that's set by either PlanDirectModify() (the FDW code) or make_modifytable() (the core code) if the ForeignScan node contains the command for direct modification. BeginDirectModify() can then use that value instead of relying on es_result_relation_info being set. Thoughts? Fujita-san, do you have any opinion on whether that would be a good idea? > pglogical also references it, but just because it creates its own > EState afaict. That sounds easily manageable. > > @@ -334,32 +335,50 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot) > > * ExecInsert > > * > > * For INSERT, we have to insert the tuple into the target relation > > - * and insert appropriate tuples into the index relations. > > + * (or partition thereof) and insert appropriate tuples into the index > > + * relations. > > * > > * Returns RETURNING result if any, otherwise NULL. > > + * > > + * This may change the currently active tuple conversion map in > > + * mtstate->mt_transition_capture, so the callers must take care to > > + * save the previous value to avoid losing track of it. > > * ---------------------------------------------------------------- > > */ > > static TupleTableSlot * > > ExecInsert(ModifyTableState *mtstate, > > + ResultRelInfo *resultRelInfo, > > TupleTableSlot *slot, > > TupleTableSlot *planSlot, > > EState *estate, > > bool canSetTag) > > { > > - ResultRelInfo *resultRelInfo; > > Relation resultRelationDesc; > > List *recheckIndexes = NIL; > > TupleTableSlot *result = NULL; > > TransitionCaptureState *ar_insert_trig_tcs; > > ModifyTable *node = (ModifyTable *) mtstate->ps.plan; > > OnConflictAction onconflict = node->onConflictAction; > > + PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing; > > + > > + /* > > + * If the input result relation is a partitioned table, find the leaf > > + * partition to insert the tuple into. > > + */ > > + if (proute) > > + { > > + ResultRelInfo *partRelInfo; > > + > > + slot = ExecPrepareTupleRouting(mtstate, estate, proute, > > + resultRelInfo, slot, > > + &partRelInfo); > > + resultRelInfo = partRelInfo; > > + /* Result relation has changed, so update EState reference too. */ > > + estate->es_result_relation_info = resultRelInfo; > > + } > > I think by removing es_result_relation entirely, this would look > cleaner. I agree. Maybe, setting es_result_relation_info here isn't really needed, because the ResultRelInfo is directly passed through ExecForeignInsert. Still, some FDWs may be relying on es_result_relation_info being correctly set despite the aforementioned. Again, the only way to get them to stop doing so may be to remove it. > > @@ -1271,18 +1274,18 @@ lreplace:; > > mtstate->mt_root_tuple_slot); > > > > /* > > - * Prepare for tuple routing, making it look like we're inserting > > - * into the root. > > + * ExecInsert() may scribble on mtstate->mt_transition_capture, > > + * so save the currently active map. > > */ > > + if (mtstate->mt_transition_capture) > > + saved_tcs_map = mtstate->mt_transition_capture->tcs_map; > > Wonder if we could remove the need for this somehow, it's still pretty > darn ugly. Thomas, perhaps you have some insights? > > To me the need to modify these ModifyTable wide state on a per-subplan > and even per-partition basis indicates that the datastructures are in > the wrong place. I agree that having to ensure tcs_map is set correctly is cumbersome, because it has to be reset every time the currently active result relation changes. I think a better place for the map to be is ResultRelInfo itself. The trigger code can just get the correct map from the ResultRelInfo of the result relation it's processing. Regarding that idea, the necessary map is already present in the tuple-routing state struct that's embedded in the partition's ResultRelInfo. But the UPDATE result relations that are never processed as tuple routing targets don't have routing info initialized (also think non-partition inheritance children), so we could add another TupleConversionMap * field in ResultRelInfo. Attached patch 0003 implements that. With this change, we no longer need to track the map in a global variable, that is, TransitionCaptureState no longer needs tcs_map. We still have tcs_original_insert_tuple though, which must be set during ExecInsert and reset after it's read by AfterTriggerSaveEvent. I have moved the resetting of its value to right after where the originally set value is read to make it clear that the value must be read only once. > > @@ -2212,23 +2207,17 @@ ExecModifyTable(PlanState *pstate) > > switch (operation) > > { > > case CMD_INSERT: > > - /* Prepare for tuple routing if needed. */ > > - if (proute) > > - slot = ExecPrepareTupleRouting(node, estate, proute, > > - resultRelInfo, slot); > > - slot = ExecInsert(node, slot, planSlot, > > + slot = ExecInsert(node, resultRelInfo, slot, planSlot, > > estate, node->canSetTag); > > - /* Revert ExecPrepareTupleRouting's state change. */ > > - if (proute) > > - estate->es_result_relation_info = resultRelInfo; > > break; > > case CMD_UPDATE: > > - slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot, > > - &node->mt_epqstate, estate, node->canSetTag); > > + slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot, > > + planSlot, &node->mt_epqstate, estate, > > + node->canSetTag); > > break; > > case CMD_DELETE: > > - slot = ExecDelete(node, tupleid, oldtuple, planSlot, > > - &node->mt_epqstate, estate, > > + slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple, > > + planSlot, &node->mt_epqstate, estate, > > true, node->canSetTag, > > false /* changingPart */ , NULL, NULL); > > break; > > This reminds me of another complaint: ExecDelete and ExecInsert() have > gotten more boolean parameters for partition moving, but only one of > them is explained with a comment (/* changingPart */) - think we should > do that for all. Agree about the confusing state of ExecDelete call sites. I've reformatted the calls to properly label the arguments (the changes are contained in the revised 0001). I don't see many partitioning-specific boolean parameters in ExecInsert though. > > diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c > > index 43edfef089..7df3e78b22 100644 > > --- a/src/backend/replication/logical/worker.c > > +++ b/src/backend/replication/logical/worker.c > > @@ -173,10 +173,10 @@ ensure_transaction(void) > > * This is based on similar code in copy.c > > */ > > static EState * > > -create_estate_for_relation(LogicalRepRelMapEntry *rel) > > +create_estate_for_relation(LogicalRepRelMapEntry *rel, > > + ResultRelInfo **resultRelInfo) > > { > > EState *estate; > > - ResultRelInfo *resultRelInfo; > > RangeTblEntry *rte; > > > > estate = CreateExecutorState(); > > @@ -188,12 +188,11 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel) > > rte->rellockmode = AccessShareLock; > > ExecInitRangeTable(estate, list_make1(rte)); > > > > - resultRelInfo = makeNode(ResultRelInfo); > > - InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0); > > + *resultRelInfo = makeNode(ResultRelInfo); > > + InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0); > > > > - estate->es_result_relations = resultRelInfo; > > + estate->es_result_relations = *resultRelInfo; > > estate->es_num_result_relations = 1; > > - estate->es_result_relation_info = resultRelInfo; > > > > estate->es_output_cid = GetCurrentCommandId(true); > > > > @@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel) > > static void > > apply_handle_insert(StringInfo s) > > { > > + ResultRelInfo *resultRelInfo; > > LogicalRepRelMapEntry *rel; > > LogicalRepTupleData newtup; > > LogicalRepRelId relid; > > @@ -589,7 +589,7 @@ apply_handle_insert(StringInfo s) > > } > > > > /* Initialize the executor state. */ > > - estate = create_estate_for_relation(rel); > > + estate = create_estate_for_relation(rel, &resultRelInfo); > > Hm. It kinda seems cleaner if we were to instead return the relevant > index, rather than the entire ResultRelInfo, as an output from > create_estate_for_relation(). Makes it clearer that it's still in the > EState. For now, I've reverted these changes in favor of just doing this: /* Initialize the executor state. */ estate = create_estate_for_relation(rel); + resultRelInfo = &estate->es_result_relations[0]; This seems OK as we know for sure that there is only one target relation. > Or perhaps we ought to compute it in a separate step? Then that'd be > more amenable to support replcating into partition roots. If we think of create_estate_for_relation() being like InitPlan(), then perhaps it makes sense to leave it as is. Any setup needed for replicating into partition roots will have to be in a separate function anyway. > > +/* > > + * ExecMove > > + * Move an updated tuple from the input result relation to the > > + * new partition of its root parent table > > + * > > + * This works by first deleting the tuple from the input result relation > > + * followed by inserting it into the root parent table, that is, > > + * mtstate->rootResultRelInfo. > > + * > > + * Returns true if it's detected that the tuple we're trying to move has > > + * been concurrently updated. > > + */ > > +static bool > > +ExecMove(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, > > + ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *planSlot, > > + EPQState *epqstate, bool canSetTag, TupleTableSlot **slot, > > + TupleTableSlot **inserted_tuple) > > +{ > > I know that it was one of the names I proposed, but now that I'm > thinking about it again, it sounds too generic. Perhaps > ExecCrossPartitionUpdate() wouldn't be a quite so generic name? Since > there's only one reference the longer name wouldn't be painful. OK, I've renamed ExecMove to ExecCrossPartitionUpdate. > > + /* > > + * Row movement, part 1. Delete the tuple, but skip RETURNING > > + * processing. We want to return rows from INSERT. > > + */ > > + ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot, > > + epqstate, estate, false, false /* canSetTag */ , > > + true /* changingPart */ , &tuple_deleted, &epqslot); > > Here again it'd be nice if all the booleans would be explained with a > comment. Done too. Attached updated 0001, 0002, and the new 0003 for transition tuple conversion map related refactoring as explained above. Thanks, Amit
Attachment
pgsql-hackers by date: