Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CALDaNm3SG=D=2UHtOtzAG8i+AbUxsX6Z371c-bJ-X9h7xw5CYg@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel INSERT (INTO ... SELECT ...) (Greg Nancarrow <gregn4422@gmail.com>) |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
Re: Parallel INSERT (INTO ... SELECT ...) |
List | pgsql-hackers |
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > Posting an updated set of patches to address recent feedback: > > - Removed conditional-locking code used in parallel-safety checking > code (Tsunakawa-san feedback). It turns out that for the problem test > case, no parallel-safety checking should be occurring that locks > relations because those inserts are specifying VALUES, not an > underlying SELECT, so the parallel-safety checking code was updated to > bail out early if no underlying SELECT is specified for the INSERT. No > change to the test code was required. > - Added comment to better explain the reason for treating "INSERT ... > ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip) > - Added assertion to heap_prepare_insert() (Amit) > - Updated error handling for NULL index_expr_item case (Vignesh) Thanks Greg for fixing and posting a new patch. Few comments: +-- Test INSERT with underlying query. +-- (should create plan with parallel SELECT, Gather parent node) +-- +explain(costs off) insert into para_insert_p1 select unique1, stringu1 from tenk1; + QUERY PLAN +---------------------------------------- + Insert on para_insert_p1 + -> Gather + Workers Planned: 4 + -> Parallel Seq Scan on tenk1 +(4 rows) + +insert into para_insert_p1 select unique1, stringu1 from tenk1; +select count(*), sum(unique1) from para_insert_p1; + count | sum +-------+---------- + 10000 | 49995000 +(1 row) + For one of the test you can validate that the same transaction has been used by all the parallel workers, you could use something like below to validate: SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM para_insert_p1) as dt; Few includes are not required: #include "executor/nodeGather.h" +#include "executor/nodeModifyTable.h" #include "executor/nodeSubplan.h" #include "executor/tqueue.h" #include "miscadmin.h" @@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags) GatherState *gatherstate; Plan *outerNode; TupleDesc tupDesc; + Index varno; This include is not required in nodeModifyTable.c +#include "catalog/index.h" +#include "catalog/indexing.h" @@ -43,7 +49,11 @@ #include "parser/parse_agg.h" #include "parser/parse_coerce.h" #include "parser/parse_func.h" +#include "parser/parsetree.h" +#include "partitioning/partdesc.h" +#include "rewrite/rewriteHandler.h" #include "rewrite/rewriteManip.h" +#include "storage/lmgr.h" #include "tcop/tcopprot.h" The includes indexing.h, rewriteHandler.h & lmgr.h is not required in clauses.c There are few typos: + table and populate it can use a parallel plan. Another exeption is the command + <literal>INSERT INTO ... SELECT ...</literal> which can use a parallel plan for + the underlying <literal>SELECT</literal> part of the query. exeption should be exception + /* + * For the number of workers to use for a parallel + * INSERT/UPDATE/DELETE, it seems resonable to use the same number + * of workers as estimated for the underlying query. + */ + parallelModifyWorkers = path->parallel_workers; resonable should be reasonable Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: