Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Greg Nancarrow |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAJcOf-eiQ6zBF_pZ9tmieXvv_aK5QN4z=Cg5mSDpUA-W0FYUnQ@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
|
List | pgsql-hackers |
On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09@gmail.com> wrote: > > Greg, all > > Thanks a lot for your work on this. > > On Mon, Feb 8, 2021 at 3:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > Posting an updated set of patches. > > I've been looking at these patches, initially with an intention to > review mainly any partitioning-related concerns, but have some general > thoughts as well concerning mostly the patches 0001 and 0002. > > * I've seen review comments on this thread where I think it's been > suggested that whatever max_parallel_hazard_for_modify() does had > better have been integrated into max_parallel_hazard() such that > there's no particular need for that function to exist. For example, > the following: > > + /* > + * UPDATE is not currently supported in parallel-mode, so prohibit > + * INSERT...ON CONFLICT...DO UPDATE... > + * In order to support update, even if only in the leader, some > + * further work would need to be done. A mechanism would be needed > + * for sharing combo-cids between leader and workers during > + * parallel-mode, since for example, the leader might generate a > + * combo-cid and it needs to be propagated to the workers. > + */ > + if (parse->onConflict != NULL && parse->onConflict->action == > ONCONFLICT_UPDATE) > + return PROPARALLEL_UNSAFE; > > could be placed in the following block in max_parallel_hazard(): > > /* > * When we're first invoked on a completely unplanned tree, we must > * recurse into subqueries so to as to locate parallel-unsafe constructs > * anywhere in the tree. > */ > else if (IsA(node, Query)) > { > Query *query = (Query *) node; > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */ > if (query->rowMarks != NULL) > { > context->max_hazard = PROPARALLEL_UNSAFE; > return true; > } > > Furthermore, the following: > > + rte = rt_fetch(parse->resultRelation, parse->rtable); > + > + /* > + * The target table is already locked by the caller (this is done in the > + * parse/analyze phase). > + */ > + rel = table_open(rte->relid, NoLock); > + (void) rel_max_parallel_hazard_for_modify(rel, parse->commandType, > &context); > + table_close(rel, NoLock); > > can itself be wrapped in a function that's called from > max_parallel_hazard() by adding a new block for RangeTblEntry nodes > and passing QTW_EXAMINE_RTES_BEFORE to query_tree_walker(). > Thanks, I think those suggestions look good to me. > That brings me to to this part of the hunk: > > + /* > + * If there is no underlying SELECT, a parallel table-modification > + * operation is not possible (nor desirable). > + */ > + hasSubQuery = false; > + foreach(lc, parse->rtable) > + { > + rte = lfirst_node(RangeTblEntry, lc); > + if (rte->rtekind == RTE_SUBQUERY) > + { > + hasSubQuery = true; > + break; > + } > + } > + if (!hasSubQuery) > + return PROPARALLEL_UNSAFE; > > The justification for this given in: > > https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com > > seems to be that the failure of a test case in > partition-concurrent-attach isolation suite is prevented if finding no > subquery RTEs in the query is flagged as parallel unsafe, which in > turn stops max_parallel_hazard_modify() from locking partitions for > safety checks in such cases. But it feels unprincipled to have this > code to work around a specific test case that's failing. I'd rather > edit the failing test case to disable parallel execution as > Tsunakawa-san suggested. > The code was not changed because of the test case (though it was fortunate that the test case worked after the change). The code check that you have identified above ensures that the INSERT has an underlying SELECT, because the planner won't (and shouldn't anyway) generate a parallel plan for INSERT...VALUES, so there is no point doing any parallel-safety checks in this case. It just so happens that the problem test case uses INSERT...VALUES - and it shouldn't have triggered the parallel-safety checks for parallel INSERT for this case anyway, because INSERT...VALUES can't (and shouldn't) be parallelized. So I will need to keep that check in the code somewhere, to avoid overhead of parallel-safety checks in the case of INSERT...VALUES. > * Regarding function names: > > +static bool trigger_max_parallel_hazard_for_modify(TriggerDesc *trigdesc, > + > max_parallel_hazard_context *context); > +static bool index_expr_max_parallel_hazard_for_modify(Relation rel, > + > max_parallel_hazard_context *context); > +static bool domain_max_parallel_hazard_for_modify(Oid typid, > max_parallel_hazard_context *context); > +static bool rel_max_parallel_hazard_for_modify(Relation rel, > + CmdType command_type, > + > max_parallel_hazard_context *context) > > IMO, it would be better to name these > target_rel_trigger_max_parallel_hazard(), > target_rel_index_max_parallel_hazard(), etc. rather than have > _for_modify at the end of these names to better connote that they > check the parallel safety of applying the modify operation to a given > target relation. Also, put these prototypes just below that of > max_parallel_hazard() to have related things close by. > > Attached please see v15_delta.diff showing the changes suggested above. > OK, sounds reasonable. Thanks for the patch! > * I suspect that the following is broken in light of concurrent > attachment of partitions. > > + > + /* Recursively check each partition ... */ > + pdesc = RelationGetPartitionDesc(rel); > > I think we'd need to use CreatePartitionDirectory() and retrieve the > PartitionDesc using PartitionDirectoryLookup(). Something we already > do when opening partitions for SELECT planning. > > * I think that the concerns raised by Tsunakawa-san in: > > https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com > > regarding how this interacts with plancache.c deserve a look. > Specifically, a plan that uses parallel insert may fail to be > invalidated when partitions are altered directly (that is without > altering their root parent). That would be because we are not adding > partition OIDs to PlannerGlobal.invalItems despite making a plan > that's based on checking their properties. See this (tested with all > patches applied!): > > create table rp (a int) partition by range (a); > create table rp1 partition of rp for values from (minvalue) to (0); > create table rp2 partition of rp for values from (0) to (maxvalue); > create table foo (a) as select generate_series(1, 1000000); > prepare q as insert into rp select * from foo where a%2 = 0; > explain execute q; > QUERY PLAN > ------------------------------------------------------------------------------- > Gather (cost=1000.00..13041.54 rows=5642 width=0) > Workers Planned: 2 > -> Insert on rp (cost=0.00..11477.34 rows=0 width=0) > -> Parallel Seq Scan on foo (cost=0.00..11477.34 rows=2351 width=4) > Filter: ((a % 2) = 0) > (5 rows) > > -- create a parallel unsafe trigger (that's actually marked so) > directly on a partition > create or replace function make_table () returns trigger language > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > unsafe; > create trigger ai_rp2 after insert on rp2 for each row execute > function make_table();CREATE TRIGGER > > -- plan still parallel > explain execute q; > QUERY PLAN > ------------------------------------------------------------------------------- > Gather (cost=1000.00..13041.54 rows=5642 width=0) > Workers Planned: 2 > -> Insert on rp (cost=0.00..11477.34 rows=0 width=0) > -> Parallel Seq Scan on foo (cost=0.00..11477.34 rows=2351 width=4) > Filter: ((a % 2) = 0) > (5 rows) > > -- and because it is > execute q; > ERROR: cannot start commands during a parallel operation > CONTEXT: SQL statement "create table bar()" > PL/pgSQL function make_table() line 1 at SQL statement > > -- OTOH, altering parent correctly discards the parallel plan > create trigger ai_rp after insert on rp for each row execute function > make_table(); > explain execute q; > QUERY PLAN > ---------------------------------------------------------------- > Insert on rp (cost=0.00..19425.00 rows=0 width=0) > -> Seq Scan on foo (cost=0.00..19425.00 rows=5000 width=4) > Filter: ((a % 2) = 0) > (3 rows) > > It's fair to argue that it would rarely make sense to use PREPARE for > bulk loads, but we need to tighten things up a bit here regardless. > > Thanks, looks like you've identified some definite issues in the partition support and some missing test cases to help detect them. I'll look into it. Thanks very much for your review of this and suggestions. Regards, Greg Nancarrow Fujitsu Australia
pgsql-hackers by date: