Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Parallel Inserts in CREATE TABLE AS |
Date | |
Msg-id | CALDaNm0ppjsA_EefjK3VFX6DsfJf9r-6_Z+QbS3jgViHp4CQjg@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Parallel Inserts in CREATE TABLE AS
|
List | pgsql-hackers |
On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21@gmail.com> wrote: > > Few comments: > > - /* > > - * To allow parallel inserts, we need to ensure that they are safe to be > > - * performed in workers. We have the infrastructure to allow parallel > > - * inserts in general except for the cases where inserts generate a new > > - * CommandId (eg. inserts into a table having a foreign key column). > > - */ > > - if (IsParallelWorker()) > > - ereport(ERROR, > > - (errcode(ERRCODE_INVALID_TRANSACTION_STATE), > > - errmsg("cannot insert tuples in a > > parallel worker"))); > > > > Is it possible to add a check if it is a CTAS insert here as we do not > > support insert in parallel workers from others as of now. > > Currently, there's no global variable in which we can selectively skip > this in case of parallel insertion in CTAS. How about having a > variable in any of the worker global contexts, set that when parallel > insertion is chosen for CTAS and use that in heap_prepare_insert() to > skip the above error? Eventually, we can remove this restriction > entirely in case we fully allow parallelism for INSERT INTO SELECT, > CTAS, and COPY. > > Thoughts? Yes, I felt that the leader can store the command as CTAS and the leader/worker can use it to check and throw an error. The similar change can be used for the parallel insert patches and once all the patches are committed, we can remove it eventually. > > > + Oid objectid; /* workers to > > open relation/table. */ > > + /* Number of tuples inserted by all the workers. */ > > + pg_atomic_uint64 processed; > > > > We can just mention relation instead of relation/table. > > I will modify it in the next patch set. > > > +select explain_pictas( > > +'create table parallel_write as select length(stringu1) from tenk1;'); > > + explain_pictas > > +---------------------------------------------------------- > > + Gather (actual rows=N loops=N) > > + Workers Planned: 4 > > + Workers Launched: N > > + -> Create parallel_write > > + -> Parallel Seq Scan on tenk1 (actual rows=N loops=N) > > +(5 rows) > > + > > +select count(*) from parallel_write; > > > > Can we include selection of cmin, xmin for one of the test to verify > > that it uses the same transaction id in the parallel workers > > something like: > > select distinct(cmin,xmin) from parallel_write; > > This is not possible since cmin and xmin are dynamic, we can not use > them in test cases. I think it's not necessary to check whether the > leader and workers are in the same txn or not, since we are not > creating a new txn. All the txn state from the leader is serialized in > SerializeTransactionState and restored in > StartParallelWorkerTransaction. > I had seen in your patch that you serialize and use the same transaction, but it will be good if you can have at least one test case to validate that the leader and worker both use the same transaction. To solve the problem that you are facing where cmin and xmin are dynamic, you can check the distinct count by using something like below: SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM t1) as dt; Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: