RE: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Hou, Zhijie |
---|---|
Subject | RE: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | c9d78965120340d3a45df299d5b46915@G08CNEXMBPEKD05.g08.fujitsu.local Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
|
List | pgsql-hackers |
> > > > > > else if (IsA(node, Query)) > > > { > > > Query *query = (Query *) node; > > > > > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ > > > if (query->rowMarks != NULL) > > > { > > > context->max_hazard = PROPARALLEL_UNSAFE; > > > return true; > > > } > > > > In my understanding, the max_parallel_hazard() query tree walk is to > > find constructs that are parallel unsafe in that they call code that > > can't run in parallel mode. For example, FOR UPDATE/SHARE on > > traditional heap AM tuples calls AssignTransactionId() which doesn't > > support being called in parallel mode. Likewise ON CONFLICT ... DO > > UPDATE calls heap_update() which doesn't support parallelism. I'm not > > aware of any such hazards downstream of ExecValuesScan(). > > > > > >You're trying to > > > > add something that bails based on second-guessing that a parallel > > > >path would not be chosen, which I find somewhat objectionable. > > > > > > > > If the main goal of bailing out is to avoid doing the potentially > > > > expensive modification safety check on the target relation, maybe > > > > we should try to somehow make the check less expensive. I > > > > remember reading somewhere in the thread about caching the result > > > > of this check in relcache, but haven't closely studied the feasibility > of doing so. > > > > > > > > > > There's no "second-guessing" involved here. > > > There is no underlying way of dividing up the VALUES data of > > > "INSERT...VALUES" amongst the parallel workers, even if the planner > > > was updated to produce a parallel-plan for the "INSERT...VALUES" > > > case (apart from the fact that spawning off parallel workers to > > > insert that data would almost always result in worse performance > > > than a non-parallel plan...) The division of work for parallel > > > workers is part of the table AM > > > (scan) implementation, which is not invoked for "INSERT...VALUES". > > > > I don't disagree that the planner would not normally assign a parallel > > path simply to pull values out of a VALUES list mentioned in the > > INSERT command, but deciding something based on the certainty of it in > > an earlier planning phase seems odd to me. Maybe that's just me > > though. > > > > I think it is more of a case where neither is a need for parallelism nor > we want to support parallelism of it. The other possibility for such a check > could be at some earlier phase say in standard_planner [1] where we are > doing checks for other constructs where we don't want parallelism (I think > the check for 'parse->hasModifyingCTE' is quite similar). If you see in > that check as well we just assume other operations to be in the category > of parallel-unsafe. I think we should rule out such cases earlier than later. > Do you have better ideas than what Greg has done to avoid parallelism for > such cases? > > [1] - > standard_planner() > { > .. > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster && > parse->commandType == CMD_SELECT && > !parse->hasModifyingCTE && > max_parallel_workers_per_gather > 0 && > !IsParallelWorker()) > { > /* all the cheap tests pass, so scan the query tree */ > glob->maxParallelHazard = max_parallel_hazard(parse); parallelModeOK = > glob->(glob->maxParallelHazard != PROPARALLEL_UNSAFE); > } > else > { > /* skip the query tree scan, just assume it's unsafe */ > glob->maxParallelHazard = PROPARALLEL_UNSAFE; parallelModeOK = false; > } +1. In the current parallel_dml option patch. I put this check and some high-level check in a separate function called is_parallel_possible_for_modify. - * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE + * Check at a high-level if parallel mode is able to be used for the specified + * table-modification statement. + * It's not possible in the following cases: + * + * 1) enable_parallel_dml is off + * 2) UPDATE or DELETE command + * 3) INSERT...ON CONFLICT...DO UPDATE + * 4) INSERT without SELECT on a relation + * 5) the reloption parallel_dml_enabled is not set for the target table + * + * (Note: we don't do in-depth parallel-safety checks here, we do only the + * cheaper tests that can quickly exclude obvious cases for which + * parallelism isn't supported, to avoid having to do further parallel-safety + * checks for these) */ +bool +is_parallel_possible_for_modify(Query *parse) And I put the function at earlier place like the following: if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster && (parse->commandType == CMD_SELECT || - (enable_parallel_dml && - IsModifySupportedInParallelMode(parse->commandType))) && + is_parallel_possible_for_modify(parse)) && !parse->hasModifyingCTE && max_parallel_workers_per_gather > 0 && !IsParallelWorker()) If this looks good, maybe we can merge this change. Best regards, houzj
pgsql-hackers by date: